So far I have used 1-D blocks and grids, and that works fine for a graphics rendering project in which all the device memory is passed to the kernel as 1-D arrays. Now I am trying to use 2-D threads to improve it. When I set the y dimension, either on the block only or on both the block and the grid, but do not use the y dimension anywhere in the kernel code yet, the performance is the same as before, but the rendering along x changes a lot: mostly unexpected things are plotted on the screen. I have not debugged further yet; I just wonder whether there is anything I need to know when setting both x and y thread dimensions in this form:
m_cuda_threads = dim3(32, 32, 1);
m_cuda_grids = dim3((500 + m_cuda_threads.x -1)/m_cuda_threads.x, (400 + m_cuda_threads.y -1)/m_cuda_threads.y, 1);
I am using 32 threads per dimension, so each block has 32 × 32 = 1024 threads, which fits within the maximum threads per block specified for my GPU (GTX680Ti).
Before that, the 1-D setting that works fine (x dimension only) was:
m_cuda_threads = dim3(32, 1, 1);
m_cuda_grids = dim3((500 + m_cuda_threads.x -1)/m_cuda_threads.x, 1, 1);