Optimization problem: how many blocks/threads...

Hi everyone, I am a little bit confused…

I'd like to process an array of WIDTH * HEIGHT = 8192 * 8000 elements.
To do some calculations on the GPU, I need to launch a kernel, but I don't know how many blocks and threads
to use for the best performance. Do you see what I mean?

512 is the max number of threads per block
65535 is the max number of blocks in each grid dimension

I think I will need a 2D block and 256 threads per block, but I am not sure.
Please help me.

Try to keep your threadsPerBlock as high as possible; it might perform better that way. You could set blockDim.x to 512, gridDim.x to 16 and gridDim.y to 8000. Then blockDim.x * gridDim.x equals WIDTH (512 * 16 = 8192) and gridDim.y equals HEIGHT, so there is one thread per array element.
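
In case it helps, here is a rough sketch of what that launch configuration could look like in code. The kernel `scaleKernel`, the helpers `makeGrid` and `launchScale`, and the float element type are all made up for illustration (the original post does not say what the calculation is), so treat this as a starting point rather than the right answer for your hardware.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel standing in for the real calculation:
// each thread handles one element and doubles it.
__global__ void scaleKernel(float *data, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int y = blockIdx.y;                             // row index (blockDim.y is 1)
    if (x < width && y < height) {                  // guard for widths not divisible by the block size
        data[(size_t)y * width + x] *= 2.0f;
    }
}

// Hypothetical helper: enough blocks in x to cover one row, one grid row per array row.
dim3 makeGrid(int width, int height, int threadsPerBlock)
{
    unsigned int blocksX = (width + threadsPerBlock - 1) / threadsPerBlock;  // round up
    return dim3(blocksX, height, 1);
}

// Hypothetical host wrapper: d_data is assumed to already point to
// width * height floats in device memory.
cudaError_t launchScale(float *d_data, int width, int height)
{
    const int threadsPerBlock = 512;                       // block size suggested above
    dim3 block(threadsPerBlock, 1, 1);
    dim3 grid = makeGrid(width, height, threadsPerBlock);  // 16 x 8000 for 8192 x 8000

    scaleKernel<<<grid, block>>>(d_data, width, height);

    cudaError_t err = cudaGetLastError();                  // catches invalid launch configurations
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceSynchronize();                        // waits for the kernel and reports runtime errors
}
```

You would call `launchScale(d_data, 8192, 8000)` after allocating and filling `d_data` with `cudaMalloc` and `cudaMemcpy`. If you want to try 256 threads per block instead, change `threadsPerBlock` and the grid becomes 32 x 8000; timing both on your own card is the only reliable way to tell which is faster.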