I'd like to process an array of WIDTH * HEIGHT = 8192 * 8000 elements.
To do some calculations on the GPU, I need to launch a kernel, but I don't know how many blocks and how many threads per block to use for the best performance. Do you see what I mean?
512 is the maximum number of threads per block.
65535 is the maximum number of blocks in each grid dimension.
I think I'll need 2D blocks with 256 threads per block, but I'm not sure.
Any help would be appreciated. :)