Setting 2D thread dimensions on block/grid

So far I have used one-dimensional block/grid threads, and that works fine for a graphics rendering project in which all device memory is passed to the device as one-dimensional arrays. Now I am trying to use 2D threads to improve it. When I set the y dimension, either on the block only or on both the block and the grid, without yet doing anything with the y dimension in the kernel code, the existing behavior of the x-only rendering changes a lot, mostly with unexpected things plotted on the screen. I have not debugged further; I just wonder whether there is anything I need to know when assigning both x and y thread dimensions in this form:

m_cuda_threads = dim3(32, 32, 1);
m_cuda_grids = dim3((500 + m_cuda_threads.x -1)/m_cuda_threads.x, (400 + m_cuda_threads.y -1)/m_cuda_threads.y, 1);

This assumes 32 threads per dimension, so the total of 1024 fits within the maximum threads per block specified for my GPU, a GTX680Ti.

Before, the setting that works fine with only the x dimension was:

m_cuda_threads = dim3(32, 1, 1);
m_cuda_grids = dim3((500 + m_cuda_threads.x -1)/m_cuda_threads.x, 1, 1);
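
For scale, it may help to count how many threads each of the two configurations launches. The following is a minimal host-side C++ sketch using the 500 and 400 extents from the post; `Dim3`, `ceil_div`, and `total_threads` are illustrative names made up for this example, not part of the CUDA API.

```cpp
#include <cstdint>

// Hypothetical host-only stand-in for CUDA's dim3.
struct Dim3 { unsigned x, y, z; };

// Round-up integer division, the same as (n + b - 1) / b in the post.
constexpr unsigned ceil_div(unsigned n, unsigned b) { return (n + b - 1) / b; }

// Total number of threads a launch <<<grid, block>>> would create.
constexpr std::uint64_t total_threads(Dim3 grid, Dim3 block) {
    return std::uint64_t(grid.x) * grid.y * grid.z *
           block.x * block.y * block.z;
}

// 1D configuration from the post: 16 blocks of 32 threads.
constexpr Dim3 block1d{32, 1, 1};
constexpr Dim3 grid1d{ceil_div(500, block1d.x), 1, 1};

// 2D configuration from the post: 16 x 13 blocks of 32 x 32 threads.
constexpr Dim3 block2d{32, 32, 1};
constexpr Dim3 grid2d{ceil_div(500, block2d.x), ceil_div(400, block2d.y), 1};
```

With these numbers the 1D launch creates 512 threads and the 2D launch creates 212,992, so a kernel that computes only an x index would execute its body 416 times for every x value.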


From what I can gather, you changed the kernel launch parameters, and now your kernel still runs, but the output is incorrect – is this right? Does your kernel use the y block or thread dimensions or indices?

We would have to see how your kernel uses the thread indices before any good recommendations can be made.


There is no improvement in performance when using 2D grids as opposed to 1D grids. I only use 2D grids when the nature of the problem requires it, or when I reach the limit of roughly 65,000 blocks; in that case I keep the threads per block the same but make a 2D grid of blocks. Please post the part of the kernel where you calculate the array indexes from the threadIdx and blockIdx variables.
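
The "2D grid of blocks with unchanged threads per block" approach can be sketched on the host as follows. This is only an illustration under assumptions: `Dim3`, `make_grid`, and `linear_index` are hypothetical names, and 65535 is used as the per-dimension grid limit (it applies to the y and z grid dimensions, and also to x on older GPUs).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical host-only stand-in for CUDA's dim3.
struct Dim3 { unsigned x, y, z; };

// Assumed per-dimension grid limit (see the note above).
constexpr unsigned kMaxGridDim = 65535;

// Spread num_blocks over a 2D grid once one dimension is not enough.
// The grid may contain a few more blocks than requested, so the kernel
// still needs a bounds check on the final index.
Dim3 make_grid(std::uint64_t num_blocks) {
    if (num_blocks == 0) num_blocks = 1;  // avoid an empty launch
    unsigned gx = static_cast<unsigned>(
        std::min<std::uint64_t>(num_blocks, kMaxGridDim));
    unsigned gy = static_cast<unsigned>((num_blocks + gx - 1) / gx);
    return {gx, gy, 1};
}

// Flattened thread index for 1D blocks in a 2D grid. In a kernel this is
// (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x.
std::uint64_t linear_index(Dim3 grid, unsigned bx, unsigned by,
                           unsigned block_size, unsigned tx) {
    return (std::uint64_t(by) * grid.x + bx) * block_size + tx;
}
```

For example, 100,000 blocks would become a 65535 x 2 grid, and the kernel would compare the flattened index against the array length before touching memory.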

Thanks for all your answers. Let me try to explain more clearly.

My kernel doesn’t use the y dimension or its blocks yet. The problem happens as soon as I add one more dimension to the block and grid configuration (dim(…)) and launch the same kernel without implementing anything for the y dimension: it shows different results, or damages the existing x-dimension data from the kernel.
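
One likely explanation, offered as a sketch rather than a diagnosis since the kernel was not posted: threads that differ only in their y index still compute the same x index, so every one of them repeats the same work on the same data. The host-side C++ emulation below assumes a kernel that does a read-modify-write on `out[x]`; `Dim3` and `run_x_only` are hypothetical names for this illustration.

```cpp
#include <vector>

// Hypothetical host-only stand-in for CUDA's dim3.
struct Dim3 { unsigned x, y, z; };

// Emulates, one thread at a time, a kernel that indexes its data by x only.
// In CUDA the indices would be
//   x = blockIdx.x * blockDim.x + threadIdx.x
//   y = blockIdx.y * blockDim.y + threadIdx.y
// With guard_y set, all threads except the row y == 0 return early.
std::vector<int> run_x_only(Dim3 grid, Dim3 block, unsigned width,
                            bool guard_y) {
    std::vector<int> out(width, 0);
    for (unsigned by = 0; by < grid.y; ++by)
    for (unsigned bx = 0; bx < grid.x; ++bx)
    for (unsigned ty = 0; ty < block.y; ++ty)
    for (unsigned tx = 0; tx < block.x; ++tx) {
        unsigned x = bx * block.x + tx;
        unsigned y = by * block.y + ty;
        if (x >= width) continue;         // bounds check for the padded grid
        if (guard_y && y != 0) continue;  // only one row of threads does work
        out[x] += 1;                      // read-modify-write on shared data
    }
    return out;
}
```

In this serial emulation the 1D launch leaves every element at 1, the unguarded 2D launch leaves 416, and the guarded 2D launch is back to 1. On a real GPU the duplicate threads run concurrently with no ordering, so a non-atomic update like this would give unpredictable values rather than a clean 416. The guard is only a stopgap; once the kernel really uses the y index (for instance `out[y * width + x]` with a check that `y` is below the image height), each thread gets its own element.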