Hi,
I would like to use several blocks (in one dimension only), but I don’t know how to do this. Can someone help me?
To explain: I have to process 1000 data elements, but until now I have worked with only 512 (the number of threads per block). I still want to use only 512 threads per block, so I need 2 blocks per grid. Could you please explain how to do this?
I think you’ll have to use a for loop in your kernel, or split the calculation somehow and write the intermediate result out to global memory. If you have enough data, the two-step read/write latency may get hidden. The for loop is probably fine though.
But I don’t see where I need a loop… and I don’t know what I would loop over?
Old code, if I understand you correctly:
output_data1000[threadArrayPosition] = doCalculation(inputData1000[threadArrayPosition]);
… where threadArrayPosition is valid for the first 512 values of the arrays.
New code?
[code]
output_data1000[threadArrayPosition] = doCalculation(inputData1000[threadArrayPosition]);
if (threadArrayPosition + 512 < 1000)
{
    output_data1000[threadArrayPosition + 512] = doCalculation(inputData1000[threadArrayPosition + 512]);
}
[/code]
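The two-step version above can also be written as the for loop mentioned earlier, so the same kernel works for any array length. This is only a minimal sketch: the names processWithLoop and runWithLoop are made up for illustration, float data is an assumption, and doCalculation is a placeholder that doubles its input because the real calculation is not shown in this thread.

```cuda
#include <cuda_runtime.h>

// Placeholder for the real doCalculation (assumption: it maps one float to one float).
__device__ float doCalculation(float x) { return 2.0f * x; }

// Single-block version: each thread starts at its own index and then
// advances by the block size until it runs past the end of the array.
__global__ void processWithLoop(const float *in, float *out, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        out[i] = doCalculation(in[i]);
    }
}

// Hypothetical host helper: copies n floats to the device, runs one block
// of 512 threads, and copies the results back.
cudaError_t runWithLoop(const float *hostIn, float *hostOut, int n)
{
    const size_t bytes = n * sizeof(float);
    float *devIn = nullptr, *devOut = nullptr;

    cudaError_t err = cudaMalloc((void **)&devIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&devOut, bytes);
    if (err != cudaSuccess) { cudaFree(devIn); return err; }

    cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);
    processWithLoop<<<1, 512>>>(devIn, devOut, n);
    // The blocking copy back runs after the kernel has finished.
    err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devIn);
    cudaFree(devOut);
    return err;
}
```

With n = 1000, threads 0 to 487 handle two elements each and threads 488 to 511 handle one, which matches the if test in the hand-written version.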
See section 4.2.3 of the programming guide. Just specify the grid size in the kernel launch:
kernel<<<2, 512>>>(args);
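For the launch with 2 blocks to cover all 1000 elements, the kernel has to build its array index from both the block index and the thread index, and it needs a bounds check because 2 x 512 = 1024 threads are started. A rough sketch of what that could look like follows. The names processWithBlocks and runWithBlocks are hypothetical, float data is an assumption, and doCalculation is a placeholder that doubles its input since the real calculation is not shown in this thread.

```cuda
#include <cuda_runtime.h>

// Placeholder for the real doCalculation (assumption: it maps one float to one float).
__device__ float doCalculation(float x) { return 2.0f * x; }

// Multi-block version: one element per thread.
__global__ void processWithBlocks(const float *in, float *out, int n)
{
    // Global index: block 0 covers 0..511, block 1 covers 512..1023.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // The last block has more threads than remaining elements, so guard the access.
    if (i < n) {
        out[i] = doCalculation(in[i]);
    }
}

// Hypothetical host helper: picks enough 512-thread blocks to cover n elements.
cudaError_t runWithBlocks(const float *hostIn, float *hostOut, int n)
{
    const int threadsPerBlock = 512;
    // Round up, so n = 1000 gives 2 blocks.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    const size_t bytes = n * sizeof(float);
    float *devIn = nullptr, *devOut = nullptr;

    cudaError_t err = cudaMalloc((void **)&devIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&devOut, bytes);
    if (err != cudaSuccess) { cudaFree(devIn); return err; }

    cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);
    processWithBlocks<<<blocks, threadsPerBlock>>>(devIn, devOut, n);
    // The blocking copy back runs after the kernel has finished.
    err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devIn);
    cudaFree(devOut);
    return err;
}
```

Computing the block count from n instead of hard-coding 2 keeps the launch correct if the data size changes later.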