Limit of blocks per grid and threads per block for the GTX 650

Does anyone know the limit of blocks per grid and the limit of threads per block for the GTX 650 (2 GB RAM) board? I am trying to write code for multiplication and inversion of matrices, but when I try to multiply large matrices the program does not work. I'm setting the kernel launch parameters as follows:
dim3 dimGrid (83, 83, 1);
dim3 dimBlock (blocksize, blocksize, 1);
where blocksize is 32.
But it seems that a limit is being exceeded. However, from what I have read on the forum, this card supports a higher limit of blocks. Or am I mistaken? I'm using Windows 10 with Microsoft Visual Studio 2015 and the CUDA 8.0 runtime.
Thank you!!!

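None of those would exceed any limits. The GTX 650 is a compute capability 3.0 device, which allows up to 1024 threads per block (a 32 x 32 block is exactly 1024) and far more than 83 blocks in each grid dimension. The limits are observable either via the deviceQuery sample app or in the CUDA C Programming Guide, around Table 13 or 14.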
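
If you would rather check this from your own program than from deviceQuery, something along these lines should work. It is only a sketch: `launchFits` and `printLaunchLimits` are names I made up for illustration, and the code assumes the card is device 0. The runtime calls and `cudaDeviceProp` fields are the standard ones.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the launch shape fits the limits in `prop`.
bool launchFits(const cudaDeviceProp& prop, dim3 grid, dim3 block) {
    // Every dimension must be at least 1.
    if (grid.x == 0 || grid.y == 0 || grid.z == 0 ||
        block.x == 0 || block.y == 0 || block.z == 0) {
        return false;
    }

    // The total thread count per block has its own limit,
    // separate from the per-dimension limits checked below.
    unsigned long long threads =
        (unsigned long long)block.x * block.y * block.z;
    if (threads > (unsigned long long)prop.maxThreadsPerBlock) {
        return false;
    }

    const unsigned int b[3] = {block.x, block.y, block.z};
    const unsigned int g[3] = {grid.x, grid.y, grid.z};
    for (int i = 0; i < 3; ++i) {
        if (b[i] > (unsigned int)prop.maxThreadsDim[i]) return false;
        if (g[i] > (unsigned int)prop.maxGridSize[i]) return false;
    }
    return true;
}

// Hypothetical helper: print the launch limits of `device` and test the
// configuration from the question. Returns the CUDA status code.
cudaError_t printLaunchLimits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    printf("%s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max block dims: %d x %d x %d\n", prop.maxThreadsDim[0],
           prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("max grid dims:  %d x %d x %d\n", prop.maxGridSize[0],
           prop.maxGridSize[1], prop.maxGridSize[2]);

    // The configuration from the question: 83 x 83 blocks of 32 x 32 threads.
    bool fits = launchFits(prop, dim3(83, 83, 1), dim3(32, 32, 1));
    printf("83x83 grid of 32x32 blocks fits: %s\n", fits ? "yes" : "no");
    return cudaSuccess;
}
```

Calling `printLaunchLimits(0)` from `main` prints the same numbers deviceQuery reports for the first device.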

The problem you are experiencing is coming from some other source. Be sure to use proper CUDA error checking: check the return value of every runtime API call, and check for errors after every kernel launch.
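
A minimal sketch of what that checking could look like is below. `cudaOk`, `fillKernel` and `runCheckedLaunch` are hypothetical names, and the kernel is only a placeholder standing in for your multiplication kernel. The launch shape is the one from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a CUDA error, return true on success.
bool cudaOk(cudaError_t err, const char* what) {
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Placeholder kernel (not the poster's code): writes `value` into every
// element of an n x n matrix stored in row-major order.
__global__ void fillKernel(float* m, int n, float value) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        m[(size_t)row * n + col] = value;
    }
}

// Launch the placeholder kernel with the configuration from the question
// and check every step. Returns true only if everything succeeded.
bool runCheckedLaunch() {
    const int blocksize = 32;
    const int n = 83 * blocksize;  // 2656 x 2656 matrix

    float* d_m = nullptr;
    if (!cudaOk(cudaMalloc(&d_m, (size_t)n * n * sizeof(float)),
                "cudaMalloc")) {
        return false;
    }

    dim3 dimGrid(83, 83, 1);
    dim3 dimBlock(blocksize, blocksize, 1);
    fillKernel<<<dimGrid, dimBlock>>>(d_m, n, 1.0f);

    // cudaGetLastError reports problems with the launch itself
    // (for example an invalid grid or block size).
    bool ok = cudaOk(cudaGetLastError(), "kernel launch");
    // cudaDeviceSynchronize reports errors that occur while the kernel
    // runs (for example an out-of-bounds access or a timeout).
    ok = ok && cudaOk(cudaDeviceSynchronize(), "kernel execution");

    cudaFree(d_m);
    return ok;
}
```

If a limit really were exceeded, the "kernel launch" check would print an invalid configuration error. An error that only shows up at the "kernel execution" check points to a problem inside the kernel or to the time limit.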

On Windows, your card will have a kernel execution time limit (the WDDM display watchdog, about 2 seconds by default), so you may be hitting that as the problem gets larger.
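
To see whether that time limit applies to your card, you can query the corresponding device attribute. This is a rough sketch assuming device 0. `isWatchdogTimeout` and `reportWatchdog` are hypothetical helper names, while the attribute and error code are the standard runtime ones.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if `err` is the error the runtime returns
// when a kernel was killed for running too long.
bool isWatchdogTimeout(cudaError_t err) {
    return err == cudaErrorLaunchTimeout;
}

// Hypothetical helper: print whether kernels on `device` are subject to
// a run time limit. Returns the CUDA status code.
cudaError_t reportWatchdog(int device) {
    int enabled = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &enabled, cudaDevAttrKernelExecTimeout, device);
    if (err != cudaSuccess) return err;

    printf("kernel execution time limit on device %d: %s\n",
           device, enabled ? "yes" : "no");
    return cudaSuccess;
}
```

Call `reportWatchdog(0)` once at startup, and pass the result of `cudaDeviceSynchronize()` after your kernel to `isWatchdogTimeout` to tell a timeout apart from other failures.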