I have been working on parallelizing an algorithm across as many GPU devices as are available. At the moment, we have 2 Tesla C1060 devices hooked up. The machine has been in the company for a year or two, mostly sitting there powered on and idle.
Just recently, I have been banging my head against the wall because I thought I was a bad programmer. I thought that maybe I was stupid for writing a program that would work sometimes, but not all the time. The program creates 2 CPU threads to manage the 2 GPU devices, and the 1st device always worked fine. I began to notice that sometimes the 2nd device wouldn't give me the correct output, or any output at all. When I added a CUDA error check after the kernel invocation, I got an error indicating the kernel invocation had failed.
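For reference, this is roughly the kind of check I have after each kernel launch — a minimal sketch, where the kernel name `myKernel` and its launch configuration are placeholders, not my actual code:

```cuda
// Minimal post-launch error-checking sketch (CUDA 3.0 runtime API).
#include <cstdio>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",          \
                    __FILE__, __LINE__, cudaGetErrorString(err)); \
        }                                                         \
    } while (0)

__global__ void myKernel(float *data) { /* placeholder kernel */ }

void launchAndCheck(float *d_data)
{
    myKernel<<<64, 256>>>(d_data);
    // Catches launch/configuration failures immediately...
    CUDA_CHECK(cudaGetLastError());
    // ...and execution failures once the kernel has actually run.
    // (cudaThreadSynchronize is the CUDA 3.0-era name; it was
    //  later renamed cudaDeviceSynchronize.)
    CUDA_CHECK(cudaThreadSynchronize());
}
```

Without the synchronize, a kernel launch returns to the host immediately, so a failing kernel can look like it "ran" in 0.000000 seconds.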
I ran the simpleMultiGPU SDK example, which I know had been working just days prior, and it hung. So, something tells me that the 2nd GPU device is the problem. Then I rebooted the machine and everything worked again! What do you think might be causing this type of malfunction? Just your garden-variety memory leaks? Any speculation?
Also, almost without fail, whenever I switch from one data set of .txt files to another (I bring in anywhere from under 1 MB to 400 MB worth of text files), the first time I run the program it invokes the kernel, but each GPU call returns in 0.000000 seconds, and the rest of the program finishes normally on the host. The output is a bunch of 0s, meaning nothing was written except the default, non-updated values.
When I run the program a second time, it works perfectly, and it continues to work perfectly from then on (until I change the data sets again). Any ideas on this?
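In case the threading setup matters: since in CUDA 3.0 a context is bound to a single host thread, each of my CPU threads calls cudaSetDevice once, before any other CUDA call on that thread. A minimal sketch of that pattern (the worker function name is a placeholder, using the Win32 thread API since this is Windows):

```cuda
// Per-thread device binding sketch (CUDA 3.0, runtime API, Win32 threads).
#include <cstdio>
#include <windows.h>
#include <cuda_runtime.h>

DWORD WINAPI gpuThread(LPVOID arg)   // placeholder worker
{
    int device = (int)(size_t)arg;
    // Must be the first CUDA call on this thread; in CUDA 3.0 the
    // context it creates stays bound to this host thread.
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d): %s\n",
                device, cudaGetErrorString(err));
        return 1;
    }
    // ... allocate, copy, launch kernels on this device ...
    return 0;
}

int main()
{
    HANDLE threads[2];
    for (int i = 0; i < 2; ++i)
        threads[i] = CreateThread(NULL, 0, gpuThread,
                                  (LPVOID)(size_t)i, 0, NULL);
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);
    return 0;
}
```

If a CUDA call ever sneaks in on a thread before its cudaSetDevice, the context lands on the default device instead, which could explain flaky behavior on the 2nd GPU.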
EDIT: I am running Windows Server 2003 SP2 on a quad-core machine with 4 GB of memory. I am using CUDA 3.0 and Visual Studio 2008, with the runtime API, not the driver API.
These are my 2 problems. Thanks for any ideas.