One of my recent little toys -- the NVIDIA Jetson TK1: https://developer.nvidia.com/jetson-tk1
Why? To play with CUDA 6.0 and the OpenCV image processing library.
Some initial benchmarking with the CUDA nbody sample: about 157 single-precision GFLOP/s. Not bad.
    ubuntu@tegra-ubuntu:/usr/local/cuda/samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=65536
    Run "nbody -benchmark [-numbodies=]" to measure perfomance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=N      (number of bodies (>= 1) to run in simulation)
        -device=d         (where d=0,1,2.... for the CUDA device to use)
        -numdevices=i     (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=           (load a tipsy model file for simulation)

    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    GPU Device 0: "GK20A" with compute capability 3.2

    > Compute 3.2 CUDA device: [GK20A]
    number of bodies = 65536
    65536 bodies, total time for 10 iterations: 5471.183 ms
    = 7.850 billion interactions per second
    = 157.003 single-precision GFLOP/s at 20 flops per interaction
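The GFLOP/s figure can be checked by hand from the numbers in the output. The sketch below assumes the benchmark counts every body interacting with every body (N x N interactions per iteration), which is consistent with the reported totals. The function name `nbody_throughput` is made up for illustration and is not part of the CUDA sample.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Recompute the nbody benchmark figures from its printed output.

    Assumes an all-pairs simulation: num_bodies * num_bodies interactions
    per iteration. Returns (interactions per second, GFLOP/s).
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    interactions_per_second = interactions / seconds
    gflops = interactions_per_second * flops_per_interaction / 1e9
    return interactions_per_second, gflops


# Values taken from the benchmark run above.
ips, gflops = nbody_throughput(65536, 10, 5471.183)
print(f"{ips / 1e9:.3f} billion interactions per second")  # 7.850
print(f"{gflops:.3f} single-precision GFLOP/s")            # 157.003
```

Note that the 20 flops per interaction is the sample's own accounting convention, so the result measures this particular kernel and is not a peak figure for the GK20A.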