I am having problems understanding how memory behaves.
What my program does is not important, but briefly: I have written a kernel that multiplies a banded matrix, with three diagonals on each side of the main diagonal, by a vector. Essentially it is a custom sparse matrix-by-vector multiplication with seven nonzeros in each row.
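To be concrete, what the kernel computes is equivalent to the following CPU sketch (the storage layout and names here are my assumptions for illustration; my actual code may store the band differently):

```c
#include <stddef.h>

/* Reference (CPU) version of the banded matrix-vector product y = A*x,
 * where A has 3 diagonals on each side of the main diagonal, i.e. 7
 * nonzeros per row.  band[i][d+3] holds the entry of row i on diagonal
 * offset d, for d in -3..+3; entries falling outside the matrix are
 * simply skipped. */
void banded_spmv(size_t n, float band[][7], const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int d = -3; d <= 3; ++d) {
            ptrdiff_t j = (ptrdiff_t)i + d;   /* column index for this diagonal */
            if (j >= 0 && j < (ptrdiff_t)n)
                sum += band[i][d + 3] * x[j];
        }
        y[i] = sum;
    }
}
```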
After I had tested my code, and while I was taking some measurements, my program stopped running correctly. The weird thing is that it stopped working only for matrices of big sizes. By “stopped working” I mean I was getting NaN as a result. In other words, the same previously working code got “corrupted” after some runs.
The biggest matrices I want to work with have 512000 rows (and since there are 7 nonzeros in each row, there is enough memory for 512000 x 7 x sizeof(float) on a Tesla C870).
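For scale, the nonzero values alone take only about 13.7 MB, which is a small fraction of the C870's 1.5 GB of device memory, so raw capacity should not be the problem:

```c
#include <stddef.h>

/* Bytes needed for the nonzero values alone: rows x 7 nonzeros per row,
 * 4 bytes per float.  band_bytes(512000) == 14336000, i.e. about 13.7 MB. */
size_t band_bytes(size_t rows)
{
    return rows * 7 * sizeof(float);
}
```

(The input and output vectors add roughly another 4 MB on top of that.)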
While trying to see how my code behaves for smaller matrices, I realized that it works for matrices of 4096 rows, and that if I gradually increase the matrix size from run to run until reaching 512000, I can “make” my program run correctly. After reaching the 512000 size, my code works fine (at least for now :) ).
I had the same problem with other kernels, and again I found that by gradually increasing the matrix size from run to run I could “make” my code run at the desired size.
I free the memory at the end of the program, and I initialize the memory to the desired values before use.
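By “initialize the memory” I mean something like the following host-side sketch (the function name is made up; on the device side the analog would be to check the cudaError_t returned by every cudaMalloc/cudaMemcpy and by cudaGetLastError after the kernel launch, since a silently failed allocation leaves the buffer full of garbage that can later surface as NaNs):

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate n floats and set every element to 'value', checking that the
 * allocation actually succeeded instead of using the buffer blindly. */
float *alloc_and_init(size_t n, float value)
{
    float *p = malloc(n * sizeof *p);
    if (p == NULL) {
        fprintf(stderr, "allocation of %zu floats failed\n", n);
        return NULL;
    }
    for (size_t i = 0; i < n; ++i)   /* initialize every element explicitly */
        p[i] = value;
    return p;
}
```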
So, I want to ask: has anybody encountered the same problem, and can anybody explain this behavior?
P.S. I can post my code if somebody wants to look at it.