So I rewrote a grid-based physics solver to run on GPUs, but my progress has stalled on what look an awful lot like two separate CUDA bugs. I’ll explain them separately below:
Bug 1: When I run a standard test case on the original CPU version, it takes 87 timesteps to reach the finish (a new delta t is calculated every timestep from the stored physical values). Running the same code on the GPU also takes 87 timesteps, but by the end of the simulation there is roughly a 0.01% random error in the timesteps (and in the physical data). The error is nonreproducible: the GPU deviates from the (consistent) CPU result by a different amount on every run, which makes me think it’s a hardware issue (along with seeing other people on these forums report the same problem). Originally the error was larger (anywhere from 85 to 90 timesteps), but buffering all my constant and device vectors knocked it down to its current level. I’m still working on buffering my cudaMalloc3D’d vectors, but if that doesn’t fix it completely I don’t know what else to try.

The program launches about 87*3 kernels over the life of the simulation; do many CUDA programmers duck this bug entirely by never relaunching kernels? (I launch kernels inside a loop from the CPU so I can output data between steps, etc.) For reference, the loop is shaped roughly like the sketch below.
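This is a simplified sketch with placeholder kernel names, not my actual code; the real kernels obviously do more than these stand-ins:

    /* Simplified sketch of my driver loop: placeholder kernels and names,
     * not my actual code. Each timestep recomputes dt from device data,
     * then launches the solver kernels (about 3 launches per step). */
    #include <cuda_runtime.h>

    /* Trivial stand-ins for the real solver stages. */
    __global__ void computeDt(const double *field, double *dtOut, int n)
    {
        /* Real version: reduction over field; stand-in writes a constant. */
        if (blockIdx.x == 0 && threadIdx.x == 0) *dtOut = 1e-3;
    }
    __global__ void updateFluxes(double *field, double dt, int n) { /* ... */ }
    __global__ void advanceState(double *field, double dt, int n) { /* ... */ }

    void runSimulation(double *d_field, double *d_dt, int n, double tEnd)
    {
        const int threads = 256;
        const int blocks  = (n + threads - 1) / threads;
        double t = 0.0;

        while (t < tEnd) {
            double dt;
            /* New delta t every step, based on the stored physical values. */
            computeDt<<<blocks, threads>>>(d_field, d_dt, n);
            /* Blocking copy; also synchronizes with the kernel above. */
            cudaMemcpy(&dt, d_dt, sizeof(double), cudaMemcpyDeviceToHost);

            updateFluxes<<<blocks, threads>>>(d_field, dt, n);
            advanceState<<<blocks, threads>>>(d_field, dt, n);

            t += dt; /* data output etc. happens here on the host */
        }
    }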
Bug 2: The first time I run the program after compiling, I get this error when the first kernel of the run launches:
    Error: Kernel Failure! (my cudaGetLastError() output)
    CUDA ERROR: cudaMemcpy - main.c - var : 4 : unspecified launch failure
So the kernel appears to fail, yet the second time I run the program it launches fine (and every time after that). When I double the number of grid elements (doubling the number of blocks), the launch instead fails on about the first two runs, with the number of initial failures doubling as the number of blocks in the kernel call doubles. The kernel definitely ends up running, which makes me think this is another hardware problem, but it is a pain to sit through this error message twenty times just to start a run. The messages above come from my own error-checking wrappers; the pattern is roughly the sketch below.
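(CUDA_CHECK below is my own macro name, not a CUDA API, and the kernel is a placeholder.) One detail that might matter: kernel launches are asynchronous, so an “unspecified launch failure” is often reported by the next runtime call (in my case the cudaMemcpy) rather than by the launch itself.

    /* Rough sketch of the error checking behind the messages above.
     * CUDA_CHECK is my own macro, not part of the CUDA runtime. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    #define CUDA_CHECK(call)                                                  \
        do {                                                                  \
            cudaError_t err_ = (call);                                        \
            if (err_ != cudaSuccess) {                                        \
                fprintf(stderr, "CUDA ERROR: %s - %s : %d : %s\n",            \
                        #call, __FILE__, __LINE__, cudaGetErrorString(err_)); \
                exit(EXIT_FAILURE);                                           \
            }                                                                 \
        } while (0)

    __global__ void firstKernel(float *data, int n) { /* ... */ }

    void launchChecked(float *d_data, int n)
    {
        const int threads = 256;
        const int blocks  = (n + threads - 1) / threads;

        firstKernel<<<blocks, threads>>>(d_data, n);
        CUDA_CHECK(cudaGetLastError());       /* catches bad launch config */
        CUDA_CHECK(cudaDeviceSynchronize());  /* catches errors raised while
                                                 the kernel ran, before a
                                                 later cudaMemcpy inherits
                                                 them */
    }

Synchronizing right after the launch at least pins the failure on the kernel itself instead of letting a later copy report it.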
Have any of you ever encountered similar behavior, and if so, do you have a story of how you got around it?
(I’m running CUDA 4.0 on a GTX 460, compiled with -arch=sm_21. Bug 1 also appeared (with larger random error) on CUDA 3.2 and a Tesla M2070; I never tested for bug 2 on that card and no longer have access to it.)