I’m trying to run a simulation and I have to manipulate the number of blocks and the thread sizing. For threads I think there is a maximum of 512x512? And blocks have a 65535x65535 maximum. Anyway, I set up my grid to be 4,4,1 and my threads to be 2,2,1, but it only runs 10 of the blocks, both in simulation and on the card.
I also thought that the grid is 2D and the thread blocks can be 3D, but I’m not sure. The maximum number of thread blocks in a grid is 65535 in each direction. You can find all this information in the programming guide (some appendix).
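If it helps, here is a rough sketch of how you could check both things yourself instead of relying on remembered numbers. It asks the runtime for the limits of device 0 and then launches a 4,4,1 grid of 2,2,1 blocks in which every block marks its own slot in an array, so the host can count how many blocks actually ran. The names `markBlocks` and `countBlocksThatRan` are made up for this example, and it assumes a 2D grid (`grid.z == 1`) and that device 0 is the card you care about. Note that `maxThreadsPerBlock` is a limit on the total number of threads in a block, separate from the per-dimension limits in `maxThreadsDim`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each block writes a 1 into its own slot so the host can count the blocks that ran.
__global__ void markBlocks(int *ran)
{
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        int blockId = blockIdx.y * gridDim.x + blockIdx.x;  // linear index in a 2D grid
        ran[blockId] = 1;
    }
}

// Hypothetical helper: returns how many blocks executed, or -1 on any CUDA error.
// Assumes grid.z == 1.
int countBlocksThatRan(dim3 grid, dim3 block)
{
    int numBlocks = grid.x * grid.y;
    size_t bytes = numBlocks * sizeof(int);

    int *dRan = nullptr;
    if (cudaMalloc(&dRan, bytes) != cudaSuccess) return -1;
    cudaMemset(dRan, 0, bytes);

    markBlocks<<<grid, block>>>(dRan);
    cudaError_t err = cudaGetLastError();               // catches invalid launch configurations
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // catches errors during execution

    int count = -1;
    if (err == cudaSuccess) {
        std::vector<int> hRan(numBlocks, 0);
        if (cudaMemcpy(hRan.data(), dRan, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            count = 0;
            for (int i = 0; i < numBlocks; ++i) count += hRan[i];
        }
    } else {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(dRan);
    return count;
}

int main()
{
    // Query the limits of device 0 (assumed to be the card in question).
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("No usable CUDA device found\n");
        return 1;
    }
    printf("max threads per block (total): %d\n", prop.maxThreadsPerBlock);
    printf("max block dimensions: %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("max grid dimensions: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    dim3 grid(4, 4, 1);   // the configuration from the question
    dim3 block(2, 2, 1);
    int ran = countBlocksThatRan(grid, block);
    printf("blocks that ran: %d of %d\n", ran, grid.x * grid.y);
    return 0;
}
```

A 4,4,1 grid has 4 x 4 = 16 blocks of 2 x 2 = 4 threads each, which is far below any of the limits. So if this reports fewer than 16 blocks, or prints an error, the problem is probably in the launch or the indexing rather than in the grid and block sizes.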