What is the practical difference between these two?

raghibrm · August 10, 2016, 1:53pm

kernel<<<1,e>>>(a,b)
kernel<<<e,1>>>(a,b)

a and b are both arrays.

Robert_Crovella · August 10, 2016, 2:00pm

One launches one block of e threads, and the other launches e blocks of 1 thread. The total thread count is the same, but the hardware utilization on the device will be different. Neither approach is good for performance on the GPU.
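To make the difference concrete, here is a minimal sketch, not the original poster's code. It assumes a hypothetical kernel that takes an extra length argument n and simply records which block and which thread handled each element; the value of e, the helper warpsPerBlock, and the use of managed memory are likewise assumptions for illustration. Threads are scheduled in warps of 32, so a block of 1 thread still occupies a whole warp with a single active lane, while a single block can only ever run on one multiprocessor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: records which block and which thread handled each element.
__global__ void kernel(int *a, int *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index, valid for either launch shape
    if (i < n) {
        a[i] = blockIdx.x;   // block that processed element i
        b[i] = threadIdx.x;  // thread within that block
    }
}

// Warps occupied by one block: threads are scheduled in groups of 32.
int warpsPerBlock(int threadsPerBlock)
{
    return (threadsPerBlock + 31) / 32;
}

int main()
{
    const int e = 64;  // assumed value; must not exceed the per-block limit (1024 on current GPUs)
    int *a, *b;
    // Managed memory keeps the sketch short; error checking is omitted for brevity.
    cudaMallocManaged(&a, e * sizeof(int));
    cudaMallocManaged(&b, e * sizeof(int));

    kernel<<<1, e>>>(a, b, e);  // one block of e threads: confined to a single multiprocessor
    cudaDeviceSynchronize();
    printf("<<<1,e>>>: last element -> block %d, thread %d, warps used: %d\n",
           a[e - 1], b[e - 1], 1 * warpsPerBlock(e));

    kernel<<<e, 1>>>(a, b, e);  // e blocks of 1 thread: one active lane per warp
    cudaDeviceSynchronize();
    printf("<<<e,1>>>: last element -> block %d, thread %d, warps used: %d\n",
           a[e - 1], b[e - 1], e * warpsPerBlock(1));

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Both launches run e threads in total, but the second one spends e warps to do it, with 31 of every 32 lanes idle.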

raghibrm · August 10, 2016, 2:12pm

Why, and which one is better for GPU utilisation?

njuffa · August 10, 2016, 2:16pm

The total thread count should ideally be on the order of 10,000 or more, and you would typically want around 128 to 256 threads per thread block (the per-block count should definitely be a multiple of 32, the warp size). The exact approach will differ based on the use case; we don’t know anything about yours.
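As a rough sketch of what such a launch configuration could look like, assuming a simple element-wise kernel (the kernel body, the array size n, the helper blocksFor, and the choice of 256 threads per block are all illustrative assumptions, not something taken from the question):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: a[i] += b[i].
__global__ void kernel(float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += b[i];  // guard: the last block may be only partially filled
}

// Number of blocks needed to cover n elements (ceiling division).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int n = 1 << 20;            // assumed problem size, well above 10,000 threads
    const int threadsPerBlock = 256;  // a multiple of 32, within the 128 to 256 range
    const int blocks = blocksFor(n, threadsPerBlock);

    float *a, *b;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    kernel<<<blocks, threadsPerBlock>>>(a, b, n);
    cudaError_t err = cudaDeviceSynchronize();  // also surfaces launch/runtime errors

    printf("%d blocks x %d threads, status: %s\n",
           blocks, threadsPerBlock, cudaGetErrorString(err));

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

The bounds check inside the kernel is what allows n to be any size, not just a multiple of the block size. Whether 128, 256, or some other multiple of 32 works best depends on the kernel and the GPU, so treat the numbers here as a starting point to measure from.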