CUDA runtime error? Iterative kernel calls from the host

I am a novice CUDA programmer. I am trying to use GPU computing to simulate some large electrical networks. I am facing the following problem; if anybody has seen something similar, please help.

Problem Description:

I have to initialize a matrix of size NxN. Its elements do not change during a simulation run.
I have to use this matrix iteratively, and depending on the duration of the simulation the number of iterations can be very large. In each iteration I have to multiply this matrix by a few vectors of size N. The elements of these vectors are not fixed: they are updated in each iteration using other variables defined in the program. I am using the GPU to do this multiplication.

After the multiplication, I have to update each of the previously defined variables. The update needs the previous values of these variables (commonly called history terms in the literature) together with the newly calculated result of the matrix-vector multiplication. This produces a new set of vectors of length N, which go into the next matrix-vector multiplication, and the process repeats.
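The loop described above can be sketched on the CPU as a reference to compare GPU results against. This is only a minimal sketch, not the poster's code: the function names (matVec, updateHistory, simulate), the row-major layout, double precision, and the history rule h = alpha * h + y are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for a dense, row-major N x N matrix (CPU reference for the GPU step).
std::vector<double> matVec(const std::vector<double>& A,
                           const std::vector<double>& x) {
    const std::size_t n = x.size();
    assert(A.size() == n * n);
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            sum += A[i * n + j] * x[j];
        }
        y[i] = sum;
    }
    return y;
}

// Hypothetical history-term update: h = alpha * h + y.
// The real rule depends on the network model and is not given in the post.
void updateHistory(std::vector<double>& history,
                   const std::vector<double>& y, double alpha) {
    assert(history.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        history[i] = alpha * history[i] + y[i];
    }
}

// Runs `steps` iterations: multiply, update history, build the next input.
std::vector<double> simulate(const std::vector<double>& A,
                             std::vector<double> x,
                             std::size_t steps, double alpha) {
    std::vector<double> history(x.size(), 0.0);
    for (std::size_t step = 0; step < steps; ++step) {
        const std::vector<double> y = matVec(A, x);
        updateHistory(history, y, alpha);
        x = history;  // assumption: the next input vector is the updated history
    }
    return x;
}
```

Running the same inputs through this reference and through the GPU path, iteration by iteration, shows where the two start to disagree.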

Now the problem: with N=83, my program works fine for any number of iterations. But when I change to N=84, the program seems to work for only around 1500 iterations, after which it produces ‘nan’ instead of the expected results. Compiling and running the program does not report any error.
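Since no error is reported, one way to narrow this down is to check the result vector after every iteration and note the first iteration and index at which a non-finite value shows up. The helpers below are a minimal sketch with hypothetical names (firstNonFinite, maxAbsDiff); they assume the results have already been copied back to the host into a std::vector<double>.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first NaN or Inf entry, or -1 if every entry is finite.
std::ptrdiff_t firstNonFinite(const std::vector<double>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (!std::isfinite(v[i])) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}

// Largest absolute difference between two result vectors, e.g. GPU vs. CPU.
// NaN differences compare false and are skipped, so run firstNonFinite first.
double maxAbsDiff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double maxDiff = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = std::fabs(a[i] - b[i]);
        if (diff > maxDiff) {
            maxDiff = diff;
        }
    }
    return maxDiff;
}
```

Calling firstNonFinite on each result inside the iteration loop and stopping at the first value other than -1 gives the exact iteration where things go wrong. Knowing which index goes bad first can help tell a numerical blow-up from a memory problem.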

If anybody knows about a similar problem, please share your experience.
Thank you in advance.

Hi there. I’m also a novice, but I saw something similar when using zero-copy memory. I think I needed to call cudaThreadSynchronize() after the kernel call so that the CPU and GPU would agree on the state of the memory. Then my CPU code could adjust some of my vectors, and I’d run my kernel again.

Hope this helps.


Hi, Mox,

Thanks for your reply. I am going to try your suggestion. I hope it works.


Finally, the problem seems to be solved, and so far it has worked for N=84.

The only thing I had to do was split the whole program into separate parts. The preconditioning of the NxN matrix, which requires the CULA tools, is now done in a separate file, and its results are then used in a different file that does the iterative work. Apparently the CULA tools use a lot of memory. I tried the CULASHUTDOWN command, but it does not seem to release the memory effectively until the whole program stops.

Thanks all.