How to use multiple blocks


I would like to use many blocks (in just one dimension), but I don’t know how to do this. Can someone help me?

Let me explain: I have to process 1000 data elements, but until now I have worked with only 512 (the number of threads per block). I still want to use only 512 threads per block, so I need 2 blocks per grid. Can you please explain how I can do that?

I think you’ll have to use a for loop in your kernel, or split the calculation somehow and write the intermediate result out to global memory. If you have enough data, the two-step read/write latency may get hidden. The for loop is probably fine though.

But I don’t see where I need a loop… and I don’t know what it should loop over?

Old code, if I understand you correctly:

output_data1000[threadArrayPosition] = doCalculation(inputData1000[threadArrayPosition]);

… where threadArrayPosition is valid for the first 512 values of the arrays.

New code?:

output_data1000[threadArrayPosition] = doCalculation(inputData1000[threadArrayPosition]);

if (threadArrayPosition + 512 < 1000)
    output_data1000[threadArrayPosition + 512] = doCalculation(inputData1000[threadArrayPosition + 512]);
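
For what it's worth, the for-loop version of the same idea might look roughly like the sketch below. It keeps a single block of 512 threads and lets each thread step through the array in strides of blockDim.x. The names processStrided and countMismatchesStrided are made up for illustration, doCalculation is only a placeholder for the real calculation (here it just doubles the input), and CUDA error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for the real doCalculation (assumption: it just doubles the input).
__host__ __device__ float doCalculation(float x) { return 2.0f * x; }

// Single block: thread t handles elements t, t + blockDim.x, t + 2 * blockDim.x, ...
__global__ void processStrided(const float *in, float *out, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        out[i] = doCalculation(in[i]);
    }
}

// Runs the kernel on n elements and returns how many results differ from the CPU result.
int countMismatchesStrided(int n)
{
    std::vector<float> hIn(n), hOut(n, 0.0f);
    for (int i = 0; i < n; ++i) hIn[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    // One block of 512 threads, as in the original setup.
    processStrided<<<1, 512>>>(dIn, dOut, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hOut[i] != doCalculation(hIn[i])) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int mismatches = countMismatchesStrided(1000);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The loop bound replaces the explicit "+ 512 < 1000" test, so the same kernel should work for any array length without changing the launch.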



See section 4.2.3 of the programming guide. Just specify the grid dimensions in the kernel launch.
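
To make that concrete, here is a minimal sketch of a launch with 2 blocks of 512 threads for 1000 elements. Each thread builds its global index from blockIdx.x, blockDim.x and threadIdx.x, and a bounds check stops the last 24 threads of the second block from writing past the end of the array. The names processBlocks and countMismatchesGrid are hypothetical, doCalculation is again only a placeholder that doubles its input, and error checking is omitted to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for the real doCalculation (assumption: it just doubles the input).
__host__ __device__ float doCalculation(float x) { return 2.0f * x; }

// One element per thread; the global index combines block and thread indices.
__global__ void processBlocks(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // threads past the end of the array do nothing
        out[i] = doCalculation(in[i]);
    }
}

// Runs the kernel on n elements and returns how many results differ from the CPU result.
int countMismatchesGrid(int n)
{
    std::vector<float> hIn(n), hOut(n, 0.0f);
    for (int i = 0; i < n; ++i) hIn[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 512;
    // Round up so every element is covered: 2 blocks for n = 1000.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    processBlocks<<<blocks, threadsPerBlock>>>(dIn, dOut, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hOut[i] != doCalculation(hIn[i])) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int mismatches = countMismatchesGrid(1000);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Compared with a loop inside one block, this spreads the work over more than one multiprocessor, which is usually what you want once the array gets larger than a single block.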