Simple kernel crashes Nvidia driver (CUDA bug report)

This simple little kernel crashes the Nvidia 197.45 driver. Here are my system specs:
Windows 7 64
MS Visual Studio 2005
Intel i975, 12 GB memory, GTX 295 video

I attached the .cu file that crashes the driver. I also print it in this message for convenience. Note:

  1. This kernel appears to do nothing useful because I distilled it down from a huge kernel, isolating the minimum code necessary to reproduce the problem.
  2. Trivial changes to this code let the kernel work fine. The crash scenario appears to be very sensitive.
  3. The outer loop must execute a huge number of times. It runs thousands of passes before the driver crashes.
  4. The body of the ‘if’ statement inside the loop SHOULD never execute, but my experiments indicate that it does seem to run! Is the compiler generating bad machine code?
  5. After this kernel runs for a few seconds, my screen goes black for a few seconds. It comes back with a message saying that the driver has recovered from a serious error. The program ends with a cudaError return code of 30 immediately after the kernel launch.
  6. This bug must be rare, because I have thousands of lines of complex CUDA code that run perfectly. This is the first problem I’ve run into.
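
As a side note on that return code: the number can be decoded in the program itself with cudaGetErrorString(), which is part of the CUDA runtime API (in the CUDA releases of that era, code 30 was, I believe, cudaErrorUnknown). A minimal fragment, not part of the attached file:

```cuda
// Sketch: translate the numeric cudaError_t into readable text
// instead of reporting the raw integer.
cudaError_t err = cudaGetLastError () ;
if (err != cudaSuccess)
   printf ( "Kernel error %d: %s\n", (int) err, cudaGetErrorString ( err ) ) ;
```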

#include <stdio.h>
#include <conio.h>
#include <cuda_runtime.h>

__global__ void cuda_pass ( float *rets )
{
   int ithread, icase, iop, op, opcodes[5] ;

   ithread = blockIdx.x * blockDim.x + threadIdx.x ;

   if (ithread != 0)      // This code executes for only thread 0
      return ;            // So we don't have to worry about inter-thread issues

   opcodes[0] = opcodes[1] = opcodes[2] = opcodes[3] = opcodes[4] = 0 ;

   for (icase=0 ; icase<5000000 ; icase++) {   // Loop runs thousands of times before driver crashes
      for (iop=0 ; iop<5 ; iop++) {
         op = opcodes[iop] ;
         if (op != 0) {             // The body of this if clause should never be executed
            rets[ithread] = op ;    // But it appears to be!  ithread will be zero only.
            break ;
            }
         } // For all opcodes
      } // For all cases
}

int main ( int argc, char** argv )
{
   float *CUDA_returns ;
   cudaError_t CudaError ;

   CudaError = cudaGetLastError () ;   // Reset CUDA error flag
   if (CudaError != cudaSuccess) {
      printf ( "Startup error %d", (int) CudaError ) ;
      _getch () ;
      exit ( 1 ) ;
      }

   CudaError = cudaMalloc ( (void **) &CUDA_returns , (size_t) (32 * sizeof(float)) ) ;
   if (CudaError != cudaSuccess) {
      printf ( "Malloc error %d", (int) CudaError ) ;
      _getch () ;
      exit ( 1 ) ;
      }

   cuda_pass <<< 1, 32 >>> ( CUDA_returns ) ;   // This also fails for 1 thread
   cudaThreadSynchronize () ;

   CudaError = cudaGetLastError () ;
   if (CudaError != cudaSuccess) {
      printf ( "Kernel error %d", (int) CudaError ) ;
      _getch () ;
      exit ( 1 ) ;
      }

   cudaFree ( CUDA_returns ) ;
   cudaThreadExit () ;

   printf ( "Success" ) ;
   _getch () ;
   return 0 ;
}

Err don’t you have some sort of integer overflow there? Your counter icase is an integer, so you will get rollover after icase=2^31. It certainly won’t make it to 5 000 000, and what it does when the sign bit gets incremented might be a bit unpredictable. What happens if you make icase a long instead?

Avid, I’d believe that if there were three more zeros. With the code as it is, this should be fine…

Oh dear, the perils of sleep deprivation. Apologies for adding to the noise…

Point 5 suggests that the problem is Windows resetting the display driver because the kernel keeps the GPU busy for more than a few seconds (I think the limit is around 5 seconds), during which time the GPU doesn’t respond to Windows graphics commands, so Windows concludes the GPU driver has hung and resets it. I don’t know how to solve this one, though.

Your suggestion sounds like it may play a major role here, in which case I definitely need to find a way to increase that time limit. I’ll soon have apps that run for more than 5 seconds at a shot.
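
In case it helps anyone searching later: on Vista and Windows 7 this watchdog is the WDDM Timeout Detection and Recovery (TDR) mechanism, and Microsoft documents registry values under the GraphicsDrivers key that control it. A sketch of a .reg fragment (the 10-second value is my own arbitrary choice, and changes take effect only after a reboot):

```
Windows Registry Editor Version 5.00

; TDR control values documented by Microsoft for WDDM drivers.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; Seconds the GPU may stay unresponsive before Windows resets the
; driver (the default is 2).  0x0000000a = 10 seconds.
"TdrDelay"=dword:0000000a
```

Setting TdrLevel to 0 under the same key reportedly disables detection entirely, but that seems risky on a GPU that also drives the display.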

However, the body of the IF statement should never be executed. Yet if I change the rets[ithread]=op statement to something that does not reference global memory, it runs perfectly. I disabled all optimization in case the compiler was optimizing the loop in some way. The contents of that IF block play a role, even though they should never be executed. If I could disable the 5-second time-out I could learn a lot more.


If you change the rets[ithread]=op statement to something that does not reference global memory, then the if statement and loop are probably being eliminated by the compiler as dead code - with no observable output, it is free to remove them. I’m not sure if you can disable this optimization - never tried it.

As a side note, how do you know that the code in the if statement is running?

I think that, even when the GPU never executes the contents of that if statement, the branching instructions take up so much time that running them 5 000 000 times takes too long.

I don’t know for sure that the IF statement block is executing. All I know is that the contents of this block impact whether it crashes or not.

From the helpful comments I’ve gotten here, I am coming to the conclusion that the following is happening:

Despite the fact that I have disabled all optimization, the compiler is still optimizing the loop into oblivion when I refrain from changing global memory. When I do reference global memory inside the loop, the loop itself, without execution of its IF block, runs so slowly that it causes a timeout.
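
If that is what’s happening, one common workaround (my sketch of the general pattern; cuda_pass_chunk, start, and chunk are hypothetical names, not from this thread) is to split the long loop across many short kernel launches so that no single launch exceeds the watchdog window:

```cuda
// Hypothetical host-side sketch: run the 5,000,000 cases in short slices.
// Each launch processes 'chunk' cases beginning at 'start', so no single
// launch holds the GPU long enough to trip the Windows watchdog.
const int total_cases = 5000000 ;
const int chunk = 100000 ;                // assumed slice size; tune as needed
for (int start = 0 ; start < total_cases ; start += chunk) {
   cuda_pass_chunk <<< 1, 32 >>> ( CUDA_returns, start, chunk ) ;  // hypothetical kernel
   cudaThreadSynchronize () ;             // let the display driver respond between slices
   }
```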

I do wish that somewhere in the extensive documentation for CUDA, someone had mentioned that a kernel has a limited processing time before Windows trashes it. I’ve spent about 15 hours trying to track down what I assumed was a bug in my code, only to learn that it is really an undocumented ‘feature’ of the OS and/or driver. Sigh. Anyhow, thank you all for your help. I am now going to start a new thread about timeouts.