Suppose I want to compute the SVD of N 4x4 matrices. In general, which would you expect to be faster: computing the SVDs on the CPU (dual-core, 3 GHz) or on the GPU? Assume the matrices are already in GPU memory, so the GPU computation needs no memory transfer.
Basically, would it be worth invoking the kernel N times, once per matrix?
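
To give a sense of how little work each matrix involves, here is a minimal sketch of a one-sided Jacobi iteration that returns only the singular values of a 4x4 matrix. It is plain Python for illustration, not a tuned CPU or GPU implementation, and the names `singular_values_4x4` and `batch_singular_values` are made up for this example. The per-matrix loop is roughly what a single GPU thread could run if all N matrices were handled in one launch, one thread per matrix, rather than one launch per matrix.

```python
import math


def singular_values_4x4(a, sweeps=30, tol=1e-12):
    """Singular values of a 4x4 matrix (list of 4 rows), largest first.

    Uses one-sided (Hestenes) Jacobi: rotate pairs of columns until all
    columns are mutually orthogonal; the column norms are then the
    singular values. `sweeps` and `tol` are illustrative defaults.
    """
    n = 4
    # Store the matrix column by column, since the method rotates column pairs.
    cols = [[float(a[r][c]) for r in range(n)] for c in range(n)]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # this pair is already orthogonal
                rotated = True
                # Rotation angle that zeroes the dot product of the two columns.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for r in range(n):
                    xp, xq = cols[p][r], cols[q][r]
                    cols[p][r] = c * xp - s * xq
                    cols[q][r] = s * xp + c * xq
        if not rotated:
            break  # a full sweep without rotations means convergence
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


def batch_singular_values(matrices):
    """Apply the 4x4 routine to each of the N matrices independently."""
    return [singular_values_4x4(m) for m in matrices]
```

Each matrix needs only a handful of sweeps over six column pairs, and the matrices are independent of each other, which is why the cost of N separate launches matters in the comparison.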