This simple little kernel crashes the Nvidia 197.45 driver. Here are my system specs:
Windows 7 64
devdriver_3.0_winvista_win7_64_197_13_general
cudatoolkit_3.0_win_64
MS Visual Studio 2005
Intel i975 12 GB memory GTX 295 video
I attached the .cu file that crashes the driver. I also print it in this message for convenience. Note:
- This kernel appears to do nothing useful because I distilled it down from a huge kernel, isolating the minimum code necessary to reproduce the problem.
- Trivial changes to this code let the kernel work fine. The crash scenario appears to be very sensitive.
- The outer loop must execute a huge number of times. It runs thousands of passes before the driver crashes.
- The body of the ‘if’ statement inside the loop SHOULD never execute, but my experiments indicate that it does seem to run! Is the compiler generating bad machine code?
- After this kernel runs for a few seconds, my screen goes black for a few seconds. It comes back with a message saying that the driver has recovered from a serious error. The program ends with a cudaError return code of 30 immediately after the kernel launch.
- This bug must be rare, because I have thousands of lines of complex CUDA code that run perfectly. This is the first problem I’ve run into.
global void cuda_pass ( float *rets )
{
int ithread, icase, iop, op, opcodes[5] ;
ithread = blockIdx.x * blockDim.x + threadIdx.x ;
if (ithread != 0) // This code executes for only thread 0
return ; // So we don’t have to worry about inter-thread issues
opcodes[0] = opcodes[1] = opcodes[2] = opcodes[3] = opcodes[4] = 0 ;
for (icase=0 ; icase<5000000 ; icase++) { // Loop runs thousands of times before driver crashes
for (iop=0 ; iop<5 ; iop++) {
op = opcodes[iop] ;
if (op != 0) { // The body of this if clause should never be executed
rets[ithread] = op ; // But it appears to be! Ithread will be zero only.
break ;
}
} // For all opcodes
} // For all cases
}
int main ( int argc, char** argv )
{
float *CUDA_returns ;
cudaError_t CudaError ;
CudaError = cudaGetLastError () ; // Reset CUDA error flag
if (CudaError != cudaSuccess) {
printf ( “Startup error %d”, (int) CudaError ) ;
_getch () ;
exit ( 1 ) ;
}
CudaError = cudaMalloc ( (void **) &CUDA_returns , (size_t) (32 * sizeof(float))) ;
if (CudaError != cudaSuccess) {
printf ( “Malloc error %d”, (int) CudaError ) ;
_getch () ;
exit ( 1 ) ;
}
cuda_pass <<< 1, 32 >>> ( CUDA_returns ) ; // This also fails for 1 thread
cudaThreadSynchronize() ;
CudaError = cudaGetLastError () ;
if (CudaError != cudaSuccess) {
printf ( “Kernel error %d”, (int) CudaError ) ;
_getch () ;
exit ( 1 ) ;
}
cudaFree ( CUDA_returns ) ;
cudaThreadExit () ;
printf ( “Success” ) ;
_getch () ;
}
CUDABUG.cu (1.96 KB)