It seems that CUDA only makes sense for really large arrays of data.
Any idea of an approximate minimum number of elements at which CUDA starts to make sense?
Say, a fit to a circle for 20 points?
Or a Gaussian fit for 5 points?
(For example, a Core 2 CPU vs. a 9600GT GPU.)
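
To give a sense of scale, the 20-point circle fit I mean is roughly the following. This is only a sketch, assuming an algebraic (Kasa-style) least-squares fit; the names `det3` and `fit_circle` and the sample points are made up for illustration. The whole job is a handful of sums plus a 3x3 solve, which is why I wonder whether sending it to the GPU could ever pay off.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_circle(points):
    """Algebraic least-squares circle fit (assumed Kasa-style formulation).

    Fits x^2 + y^2 + D*x + E*y + F = 0 and returns (cx, cy, r).
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # z = x^2 + y^2 for each point
    sz = sum(x * x + y * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)

    # Normal equations A * [D, E, F]^T = rhs
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [-sxz, -syz, -sz]

    d = det3(A)
    if d == 0:
        raise ValueError("degenerate point set (e.g. collinear points)")

    def replaced(col):
        # Cramer's rule: swap one column of A for the right-hand side
        return [[rhs[r] if c == col else A[r][c] for c in range(3)]
                for r in range(3)]

    D = det3(replaced(0)) / d
    E = det3(replaced(1)) / d
    F = det3(replaced(2)) / d

    cx, cy = -D / 2.0, -E / 2.0
    r = math.sqrt(cx * cx + cy * cy - F)
    return cx, cy, r


# 20 sample points lying exactly on a circle with centre (3, -1) and radius 2
points = [(3 + 2 * math.cos(2 * math.pi * k / 20),
           -1 + 2 * math.sin(2 * math.pi * k / 20)) for k in range(20)]
cx, cy, r = fit_circle(points)
print(cx, cy, r)
```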