Problem with a big matrix

I wrote a small program with a kernel that adds one to each element of a matrix stored in global memory.

I then copy the matrix from global memory back to a host matrix and print the values.
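A minimal sketch of the program described above, assuming a square `n x n` matrix of `float` stored as one flat array. The names `addOne`, `matrixBytes` and `CUDA_CHECK` are illustrative, not from the original post. Every CUDA call is checked. Without those checks, a failed `cudaMalloc` or launch produces no error message, and the copied-back matrix simply keeps its initial values, which matches the symptom described.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line and the runtime's message if a CUDA call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Adds 1 to every element of a flat array. The grid-stride loop lets a
// fixed-size grid cover any element count.
__global__ void addOne(float *m, size_t count)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < count; i += stride)
        m[i] += 1.0f;
}

// Bytes needed for a rows x cols matrix. size_t matters here:
// 30000 x 30000 x 4 bytes is larger than a 32-bit int can hold.
static size_t matrixBytes(size_t rows, size_t cols, size_t elemSize)
{
    return rows * cols * elemSize;
}

int main(int argc, char **argv)
{
    size_t n = (argc > 1) ? strtoull(argv[1], NULL, 10) : 1024;  // matrix is n x n
    if (n == 0) { fprintf(stderr, "size must be positive\n"); return EXIT_FAILURE; }
    size_t count = n * n;
    size_t bytes = matrixBytes(n, n, sizeof(float));

    float *h = (float *)malloc(bytes);
    if (h == NULL) { fprintf(stderr, "host malloc failed\n"); return EXIT_FAILURE; }
    for (size_t i = 0; i < count; ++i) h[i] = 0.0f;  // initial value

    float *d = NULL;
    CUDA_CHECK(cudaMalloc(&d, bytes));  // fails if the matrix does not fit on the card
    CUDA_CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));

    addOne<<<1024, 256>>>(d, count);
    CUDA_CHECK(cudaGetLastError());       // launch errors, e.g. invalid configuration
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));
    printf("h[0] = %.1f, h[last] = %.1f\n", h[0], h[count - 1]);

    CUDA_CHECK(cudaFree(d));
    free(h);
    return 0;
}
```

With a size that does not fit on the card, this version should stop at the `cudaMalloc` line with an out-of-memory message instead of printing unchanged values.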

It works fine for small matrices, but for big ones (30000x30000) it doesn’t: the elements keep the initial value they were given.

After some testing I found that the largest size that works is 11056x11056. If I make the matrix even one row and column bigger, e.g. 11057x11057, the kernel no longer adds to the elements.
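For reference, assuming 4-byte elements (`float` or `int`; the post does not say which), the device memory needed for one copy of the matrix at the sizes mentioned is:

```latex
\begin{aligned}
11056^2 \times 4\,\text{B} &= 122{,}235{,}136 \times 4\,\text{B} = 488{,}940{,}544\,\text{B} \approx 466.3\,\text{MiB} \\
11057^2 \times 4\,\text{B} &= 122{,}257{,}249 \times 4\,\text{B} = 489{,}028{,}996\,\text{B} \approx 466.4\,\text{MiB} \\
30000^2 \times 4\,\text{B} &= 900{,}000{,}000 \times 4\,\text{B} = 3{,}600{,}000{,}000\,\text{B} \approx 3.35\,\text{GiB}
\end{aligned}
```

The step from 11056 to 11057 adds only 88,452 B, while 30000x30000 needs about 7.4 times as much as the largest size that works.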

Any ideas why?

Any pointers on working with big matrices and the problems they can cause?

(P.S. Yes, I am launching the kernel with enough threads.)

Does it launch? Not all grid/block sizes are legal. What is the launch config?
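One way to answer that in code is to read the device limits with `cudaGetDeviceProperties` and compare them against the planned grid and block before launching. This is a sketch. `launchFits` is a hypothetical helper, and the example configuration (16 x 16 blocks, one thread per element of an 11057 x 11057 matrix) is an assumption, since the original launch configuration was never posted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// True if the grid and block respect the per-block thread limit and the
// per-dimension block and grid limits reported for the device.
static bool launchFits(dim3 grid, dim3 block, const cudaDeviceProp &p)
{
    unsigned long long threads = (unsigned long long)block.x * block.y * block.z;
    return threads <= (unsigned long long)p.maxThreadsPerBlock &&
           block.x <= (unsigned)p.maxThreadsDim[0] &&
           block.y <= (unsigned)p.maxThreadsDim[1] &&
           block.z <= (unsigned)p.maxThreadsDim[2] &&
           grid.x <= (unsigned)p.maxGridSize[0] &&
           grid.y <= (unsigned)p.maxGridSize[1] &&
           grid.z <= (unsigned)p.maxGridSize[2];
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned n = 11057;                        // matrix is n x n (assumed)
    dim3 block(16, 16);                        // 256 threads per block
    dim3 grid((n + block.x - 1) / block.x,     // round up so every element
              (n + block.y - 1) / block.y);    // is covered by some thread

    printf("max threads/block %d, max grid %d x %d x %d\n",
           prop.maxThreadsPerBlock, prop.maxGridSize[0],
           prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("grid %u x %u, block %u x %u: %s\n", grid.x, grid.y, block.x, block.y,
           launchFits(grid, block, prop) ? "within limits" : "exceeds a device limit");
    return 0;
}
```

This only checks dimension limits, not register or shared-memory usage, which can also make a launch fail. Either way, calling `cudaGetLastError()` right after the launch is what actually reports an illegal configuration, because the kernel then never runs.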

Never mind, thanks for the help, I figured it out.

I had hit the memory limit of the card.

I forgot how much room these matrices take up in memory.
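A sketch of catching this up front instead of discovering it from unchanged output: `cudaMemGetInfo` reports free and total device memory, so the matrix size can be compared against it before calling `cudaMalloc`. `fitsOnDevice` and the 64 MiB headroom are illustrative assumptions. Other allocations and the runtime itself also use device memory, so the full free amount may not be available as one block, and the `cudaMalloc` result still needs checking.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// True if `bytes` fits in `freeBytes` while leaving `headroom` bytes spare.
static bool fitsOnDevice(size_t bytes, size_t freeBytes, size_t headroom)
{
    return freeBytes > headroom && bytes <= freeBytes - headroom;
}

int main(int argc, char **argv)
{
    size_t n = (argc > 1) ? strtoull(argv[1], NULL, 10) : 30000;  // n x n floats
    size_t bytes = n * n * sizeof(float);

    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("need %zu MiB, device has %zu MiB free of %zu MiB\n",
           bytes >> 20, freeBytes >> 20, totalBytes >> 20);

    if (!fitsOnDevice(bytes, freeBytes, (size_t)64 << 20)) {
        fprintf(stderr, "matrix does not fit on the device; process it in chunks\n");
        return 1;
    }

    float *d = NULL;
    err = cudaMalloc(&d, bytes);  // can still fail, so check it anyway
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(d);
    return 0;
}
```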

How dense is the matrix? I’ve been working on a problem that requires matrix-vector products on really large matrices (currently aiming for 66M x 66M, with more than 1.2B non-zeros). I’ve been experimenting with streaming the matrix data to the GPU in chunks; it seems to work fine and still outperforms the CPU 20-fold. One caveat is that you have to write a scheduler so that an asynchronous memcpy (cudaMemcpyAsync) is loading the next matrix block while another block is being computed.
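The chunked approach can be sketched as follows. This is not the poster’s scheduler, which was not posted, and it uses the dense add-one kernel from the original question rather than a sparse matrix-vector product. The whole matrix stays in pinned host memory, which async copies need in order to overlap with kernels. Row blocks alternate between two streams and two device buffers, so one block’s copies can overlap another block’s kernel where the hardware allows it. `CHUNK_ROWS`, the two-stream layout and the matrix size are assumptions to tune.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Adds 1 to every element of one chunk (grid-stride loop).
__global__ void addOne(float *m, size_t count)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < count; i += stride)
        m[i] += 1.0f;
}

// Rows in chunk c when `rows` rows are split into chunks of `chunkRows`.
static size_t rowsInChunk(size_t c, size_t rows, size_t chunkRows)
{
    size_t first = c * chunkRows;
    return (first + chunkRows <= rows) ? chunkRows : rows - first;
}

int main()
{
    const size_t rows = 8192, cols = 8192;  // whole matrix lives on the host
    const size_t CHUNK_ROWS = 1024;         // rows per device buffer
    const int NSTREAMS = 2;                 // double buffering

    float *h = NULL;                        // pinned so async copies can overlap
    CUDA_CHECK(cudaMallocHost((void **)&h, rows * cols * sizeof(float)));
    for (size_t i = 0; i < rows * cols; ++i) h[i] = 0.0f;

    cudaStream_t stream[NSTREAMS];
    float *d[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s) {
        CUDA_CHECK(cudaStreamCreate(&stream[s]));
        CUDA_CHECK(cudaMalloc((void **)&d[s], CHUNK_ROWS * cols * sizeof(float)));
    }

    size_t nChunks = (rows + CHUNK_ROWS - 1) / CHUNK_ROWS;
    for (size_t c = 0; c < nChunks; ++c) {
        // Chunk c reuses buffer d[s]. Work in one stream runs in order, so the
        // previous chunk in stream s has been copied back before d[s] is overwritten.
        int s = (int)(c % NSTREAMS);
        size_t count = rowsInChunk(c, rows, CHUNK_ROWS) * cols;
        float *hChunk = h + c * CHUNK_ROWS * cols;
        CUDA_CHECK(cudaMemcpyAsync(d[s], hChunk, count * sizeof(float),
                                   cudaMemcpyHostToDevice, stream[s]));
        addOne<<<256, 256, 0, stream[s]>>>(d[s], count);
        CUDA_CHECK(cudaGetLastError());
        CUDA_CHECK(cudaMemcpyAsync(hChunk, d[s], count * sizeof(float),
                                   cudaMemcpyDeviceToHost, stream[s]));
    }
    CUDA_CHECK(cudaDeviceSynchronize());  // wait for all chunks in all streams

    printf("h[0] = %.1f, h[last] = %.1f\n", h[0], h[rows * cols - 1]);

    for (int s = 0; s < NSTREAMS; ++s) {
        CUDA_CHECK(cudaFree(d[s]));
        CUDA_CHECK(cudaStreamDestroy(stream[s]));
    }
    CUDA_CHECK(cudaFreeHost(h));
    return 0;
}
```

Device memory use is fixed at `NSTREAMS * CHUNK_ROWS * cols` elements regardless of the total row count, so the same loop would handle a 30000x30000 matrix, provided the host can pin that much memory.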