Short kernels cause "unspecified launch failure"

I am trying to do matrix multiplication on my NVIDIA GTX260, which is the primary display card in Windows XP. I have read that kernels lasting longer than 5 seconds are killed by the Windows watchdog timer, so I have made sure my CUBLAS calls are short (< 5ms). A single matrix multiplication with cublasSgemm seems to work fine. However, when I put the matrix multiplication in a loop, the cudaThreadSynchronize following the call to cublasSgemm returns an “unspecified launch failure” after a seemingly arbitrary number of iterations (usually somewhere around 150,000, but it varies greatly). I have read that this error is often the equivalent of a “segmentation fault”, so I created a test case based on the CUDA SDK matrix multiplication sample (non-driver version). I just put a loop around the kernel call, followed by a cudaThreadSynchronize(), and once again the cudaThreadSynchronize returns an “unspecified launch failure” after about 150,000 iterations when I increase both matrix sizes to 1024x1024.
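For anyone who wants to reproduce this, here is a minimal sketch of the kind of loop described above. It is not the original test case: scaleKernel is a hypothetical stand-in for the SDK matrixMul kernel, the check helper and the iteration count are assumptions made for illustration, and cudaDeviceSynchronize is used as the current replacement for cudaThreadSynchronize. Checking both the launch and the synchronize on every iteration at least shows which call reports the failure and on which iteration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the original test used the SDK matrixMul kernel.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Print the failing call and the loop iteration, then stop.
static void check(cudaError_t err, const char *what, long iter)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed at iteration %ld: %s\n",
                what, iter, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const int n = 1024 * 1024;          // one element per entry of a 1024x1024 matrix
    float *d_data = NULL;
    check(cudaMalloc((void **)&d_data, n * sizeof(float)), "cudaMalloc", -1);
    check(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset", -1);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const long iterations = 200000;     // assumed; the failure was reported near 150,000

    for (long iter = 0; iter < iterations; ++iter) {
        scaleKernel<<<blocks, threads>>>(d_data, n, 1.0f);
        // Catches invalid launch configurations and similar launch-time errors.
        check(cudaGetLastError(), "kernel launch", iter);
        // Catches errors raised while the kernel was executing.
        check(cudaDeviceSynchronize(), "cudaDeviceSynchronize", iter);
    }

    check(cudaFree(d_data), "cudaFree", iterations);
    printf("completed %ld iterations\n", iterations);
    return 0;
}
```

If a trivial kernel like this also fails after many iterations on the display card, that points away from an out-of-bounds access in the kernel itself.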

Does anybody know what is causing this? It seems very similar to the error that theMatrix reported; however, I get an “unspecified launch failure” rather than a “launch timeout”. Any help would be greatly appreciated.

I switched to an older graphics card for my primary display and I’m now using my GTX260 only for CUDA. The examples I mentioned now work perfectly. Apparently, CUDA is only useful when the card is not being used by Windows. This is unfortunate. I hope NVIDIA has plans to remedy this in the future. CUDA is extremely useful, and many people with CUDA-capable cards are unable to use it.

This sounds like some kind of driver bug. Many people use CUDA for kernels running less than a few seconds on their primary display adapter.

Can you give some more info? Which driver version and CUDA toolkit are you using? Perhaps the Windows experts here can figure out what’s going on.
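If it helps, a small program along these lines can report that information. This is only a sketch, assuming a toolkit recent enough to provide cudaDriverGetVersion, cudaRuntimeGetVersion and cudaDeviceGetAttribute. Note that cudaDriverGetVersion reports the CUDA version the installed driver supports, not the display driver's own release number, and that the per-device attribute shows whether the run-time limit (watchdog) applies to that card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0;
    int runtimeVersion = 0;

    if (cudaDriverGetVersion(&driverVersion) != cudaSuccess ||
        cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) {
        fprintf(stderr, "could not query CUDA versions\n");
        return 1;
    }

    // Versions are encoded as 1000 * major + 10 * minor.
    printf("CUDA version supported by driver: %d.%d\n",
           driverVersion / 1000, (driverVersion % 1000) / 10);
    printf("CUDA runtime version: %d.%d\n",
           runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) {
        fprintf(stderr, "could not query device count\n");
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        int timeoutEnabled = 0;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&timeoutEnabled, cudaDevAttrKernelExecTimeout, dev)
                != cudaSuccess) {
            fprintf(stderr, "could not query device %d\n", dev);
            continue;
        }
        // A non-zero value means kernels on this device are subject to a run-time limit.
        printf("device %d: %s, kernel run-time limit %s\n",
               dev, prop.name, timeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

A driver that reports an older supported CUDA version than the runtime is a mismatch worth fixing before looking any further.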

It was indeed a driver issue. Somehow I got the wrong drivers installed. I am now using driver version 3.0 and the version 2.3 toolkit and everything seems to be working fine. Thank you.