Hi,
What is the best-performing algorithm for summing N bytes (N = 400) on a card that supports compute capability 1.3?
Thanks
Miki
This is a linear programming problem :D
Check the reduction sample in the SDK.
I would have thought the best performing algorithm was probably a memcpy back to the host followed by a serial loop on the CPU. Who would go to the trouble of implementing a parallel reduction for 400 bytes?
The guy who needs the result in the next iteration of the algorithm's innermost loop.
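For that case, here is a rough sketch of a single-block shared-memory reduction that leaves the sum in device memory, so a following kernel can read it without a round trip to the host. It is not the SDK sample itself, just a simplified version of the same idea. The names `sumBytesKernel`, `sumBytesOnDevice` and `BLOCK_THREADS` are made up for illustration, and it assumes N fits in one block (N <= 512, which holds for N = 400). It avoids warp shuffles, which compute capability 1.3 does not have.

```cuda
#include <cuda_runtime.h>

// Hypothetical constant: a power of two, and 512 is the per-block thread
// limit on compute capability 1.x devices.
#define BLOCK_THREADS 512

// Sums n bytes (assumes n <= BLOCK_THREADS) using one thread block.
// The result is written to *d_sum and stays in device memory.
__global__ void sumBytesKernel(const unsigned char *d_in, int n,
                               unsigned int *d_sum)
{
    __shared__ unsigned int partial[BLOCK_THREADS];
    int tid = threadIdx.x;

    // Each thread loads one byte, widened to 32 bits; threads past n load 0.
    partial[tid] = (tid < n) ? (unsigned int)d_in[tid] : 0u;
    __syncthreads();

    // Tree reduction in shared memory: halve the active range each step.
    for (int stride = BLOCK_THREADS / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        *d_sum = partial[0];
}

// Hypothetical host helper, only for checking the kernel. In the inner-loop
// scenario you would keep d_sum on the device and skip the final copy back.
// Error checking is omitted for brevity.
unsigned int sumBytesOnDevice(const unsigned char *h_in, int n)
{
    unsigned char *d_in = NULL;
    unsigned int *d_sum = NULL;
    unsigned int h_sum = 0;

    cudaMalloc((void **)&d_in, n);
    cudaMalloc((void **)&d_sum, sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, n, cudaMemcpyHostToDevice);

    sumBytesKernel<<<1, BLOCK_THREADS>>>(d_in, n, d_sum);

    cudaMemcpy(&h_sum, d_sum, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sum);
    return h_sum;
}
```

If the 400 bytes already live on the device and the sum is consumed by the next kernel, only the `sumBytesKernel` launch is needed. If the data starts on the host anyway, the serial CPU loop suggested above is likely still the simpler choice.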