transpose demo: gpu vs cpu

The transpose example shows how you can move data around, but it’s not exactly a good example of how to use the full power of a GPU. Launching a kernel on the GPU requires compiling the shader, starting up the GPU, executing the kernel, and then copying the data back from the GPU to the CPU. That is a lot of overhead for just reading and writing a bunch of data.

The last copy from GPU to CPU, by itself, is probably costlier than doing the transpose on the CPU!

If you want to compare performance between CPU and GPU, you should look at algorithms that require a lot of floating point calculations per element (e.g. the BlackScholes example).
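
To make the difference in work per element concrete, here is a rough CPU-only sketch in plain Python. It is only an illustration, not the code from either demo: the function names are made up for this post, and the pricing function is the textbook Black-Scholes formula for a European call, which I am assuming is roughly what the BlackScholes example computes. The transpose does no arithmetic at all, while each option price needs a log, an exp, a square root, and two error-function evaluations.

```python
import math

def transpose(matrix):
    # Pure data movement: each element is read once and written once,
    # with no floating point arithmetic at all.
    return [list(row) for row in zip(*matrix)]

def norm_cdf(x):
    # Standard normal cumulative distribution, via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, volatility, maturity):
    # Textbook Black-Scholes price of a European call option.
    # Every output value costs a log, an exp, a sqrt and two erf calls.
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * maturity) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

print(transpose([[1, 2, 3], [4, 5, 6]]))   # [[1, 4], [2, 5], [3, 6]]
print(round(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0), 4))   # 10.4506
```

With the transpose, the GPU spends nearly all its time on memory traffic and transfers. With something like the option pricing, the same transfer cost is spread over far more arithmetic per element, which is where a GPU has a chance to pay off.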

Tom