I have a fairly simple question, but I can't find an answer anywhere: if I have an array with more elements than the number of threads allowed on my GPU, which approach is better:
a) splitting the array into two arrays and calling the kernel twice
b) having each thread handle two elements of the array
Thanks for your help.
It really depends on the application. Look at your memory access pattern and see which option gives you the most coalesced reads.
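For option (b), one common pattern is a grid-stride loop, where each thread steps through the array by the total number of launched threads. The sketch below is only an illustration: the kernel name `scale`, the array size, and the 256 x 256 launch configuration are assumptions, not something from the question. On each pass, neighbouring threads touch neighbouring addresses, which is the access pattern that tends to coalesce well.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by `factor`.
// Thread t handles elements t, t + stride, t + 2*stride, ...
// so adjacent threads read adjacent addresses on every pass.
__global__ void scale(float *data, size_t n, float factor)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;  // total threads in the grid
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

int main()
{
    const size_t n = 1 << 24;  // assumed size: 16,777,216 elements
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));

    // Only 256 * 256 = 65,536 threads are launched; each one loops
    // over 256 elements, so a single kernel call covers the whole array.
    scale<<<256, 256>>>(d, n, 2.0f);
    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

The same kernel works for any array length and any launch size, so you would not need to split the array or hard-code "two elements per thread".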
Ok, I will do that, thanks for the answer.
I think you will run into GPU memory limits long before you run into the thread limits:
(65 535 * 65 535 * 512 * 4) / (1 024^3) ≈ 8 191.75 GiB
Assuming each thread works on one 4-byte element, a maximum-size launch (a 65 535 x 65 535 grid of 512-thread blocks) could process about 8 TiB of data, far more than will fit on current GPUs.
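The limits used in that estimate vary between GPUs, so it may be worth checking them on your own card. A minimal sketch, assuming device 0 is the GPU you care about, that queries the runtime with `cudaGetDeviceProperties`:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    // Device index 0 is an assumption; change it if you have several GPUs.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    // totalGlobalMem is reported in bytes; convert to GiB for readability.
    printf("global memory: %.2f GiB\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```

Comparing the reported global memory with the product of the grid and block limits shows which of the two you would hit first.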