CUDA kernels keep on crashing

immuner · October 10, 2008, 1:17pm

Hello there,

I am dealing with some problems with CUDA 2.0 on a Quadro FX1700 (512 MB).
I am porting a number of algorithms from the CPU to CUDA. Most of it works, but I am getting several errors that I have not been able to fix (I went through the docs, but I am not yet experienced with CUDA).
The application uses a series of kernels (so far only two) to convert a large volume into an image, so the data passed to CUDA can easily exceed 100 MB and go up to 300 MB.
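As a rough sanity check on the 100 to 300 MB figure against a 512 MB card, here is a small host-side C++ sketch of the memory budget. It assumes 16-bit voxels (the mangled KERNEL1 name starts with `Ps`, a pointer to short). The volume sizes and the reserve held back for the display, driver and CUDA context are hypothetical values, not figures from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a volume of width x height x slices voxels.
// bytesPerVoxel = 2 assumes 16-bit (short) samples.
std::uint64_t volumeBytes(std::uint64_t width, std::uint64_t height,
                          std::uint64_t slices, std::uint64_t bytesPerVoxel = 2) {
    return width * height * slices * bytesPerVoxel;
}

// True if the planned device allocations fit into the card's memory after
// keeping reserveBytes back for the display, driver and CUDA context.
// The reserve is a guess you have to choose; it is not a documented figure.
bool fitsOnDevice(std::uint64_t plannedBytes, std::uint64_t deviceBytes,
                  std::uint64_t reserveBytes) {
    assert(reserveBytes <= deviceBytes);
    return plannedBytes <= deviceBytes - reserveBytes;
}
```

A 512 × 512 × 400 volume of shorts already takes 200 MiB before any output or scratch buffers are counted. In practice it is safer to compare against the free memory the driver reports (for example `cuMemGetInfo` in the driver API) than against the nominal 512 MB.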

Problem 1 (Kernel 1): This kernel receives the loaded volume as input, uses a 16x16 block and a grid of imageDims / blockSize, and contains a loop that goes through each slice of the volume performing some calculations. During execution, however, the application crashes with an out-of-memory error and a device restart (on Vista). If I change the loop so that it runs from slice 0 to numSlices * 0.5, and add a second kernel that does the same work from numSlices * 0.5 to numSlices, I get no errors. Why is that, and is there any way to avoid it?

Problem 2 (Kernel 2): If I get as far as the second kernel, which uses the same block and grid, the launch fails with a "too many resources" error.

Running nvcc with --ptxas-options=-v, I get the following output for each kernel:

ptxas info : Compiling entry function ‘__globfunc__Z18KERNEL2PsS_4int3S0_s6float3Pf’
ptxas info : Used 40 registers, 68+64 bytes smem, 52 bytes cmem[1]
ptxas info : Compiling entry function ‘__globfunc__Z18KERNEL1PsiiiiPdS0_S0_S0_S0_S0_Pis’
ptxas info : Used 23 registers, 9600+0 bytes lmem, 66+64 bytes smem, 8 bytes cmem[1]
ptxas info : Compiling entry function ‘__globfunc__Z17KERNEL1PART2PsiiiiPdS0_S0_S0_S0_S0_Pis’
ptxas info : Used 23 registers, 9600+0 bytes lmem, 66+64 bytes smem, 8 bytes cmem[1]

Kernel 1 doesn't seem heavy enough to cause such an exception.
Kernel 2, on the other hand, is heavy only in that it needs 40 registers, yet I have successfully run NVIDIA samples that use much heavier kernels and more threads.

I allocate all the data I pass in with cudaMalloc, and I think my code is heading in the right direction, but I am still inexperienced with CUDA, so I guess I am missing something. Any help would be really appreciated.

DarkAr · October 10, 2008, 3:06pm

As for the first: you probably hit the kernel execution time limit (the watchdog).
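If the watchdog is the cause, the half-and-half split from Problem 1 generalizes: keep one kernel that takes a slice range as arguments and launch it several times from the host, so that each launch finishes well inside the limit. A minimal sketch of the host-side planning step, assuming a hypothetical kernel with `sliceBegin`/`sliceEnd` parameters (not part of the original code):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Split the slice range [0, numSlices) into consecutive [begin, end) chunks of
// at most slicesPerLaunch slices. Each chunk is meant for one launch of a
// single kernel with (hypothetical) sliceBegin/sliceEnd parameters, instead of
// keeping two copies of the kernel body (KERNEL1 and KERNEL1PART2).
std::vector<std::pair<int, int>> planSliceLaunches(int numSlices, int slicesPerLaunch) {
    assert(numSlices >= 0 && slicesPerLaunch > 0);
    std::vector<std::pair<int, int>> ranges;
    for (int begin = 0; begin < numSlices; begin += slicesPerLaunch) {
        ranges.emplace_back(begin, std::min(begin + slicesPerLaunch, numSlices));
    }
    return ranges;
}
```

Each range then becomes one launch, e.g. `kernel1<<<grid, block>>>(..., r.first, r.second);` (kernel name hypothetical), with the kernel's loop rewritten as `for (int s = sliceBegin; s < sliceEnd; ++s)`. If the kernel accumulates a per-pixel result across slices, that partial result has to stay in global memory between launches.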

As for the second: check that threads_per_block * registers_per_thread < 8192, that shared memory usage is below 16 KB, and, of course, that your block and grid dimensions are within the limits.
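Applying that rule to the ptxas output above: a 16x16 block is 256 threads, and 256 × 40 = 10240 registers for KERNEL2. That is over the 8192 available on a compute capability 1.1 part such as the Quadro FX1700. KERNEL1 needs 256 × 23 = 5888 and fits. The sketch below encodes the check (written with <= rather than the strict < above) and finds the largest whole-warp block size for a given register count. It ignores the hardware's register allocation granularity, so treat it as a first approximation.

```cpp
#include <cassert>

// Compute capability 1.0/1.1 limits (G8x/G9x parts such as the Quadro FX1700).
// Register allocation granularity is ignored, so this is approximate.
constexpr int kRegistersPerBlock = 8192;
constexpr int kSharedBytesPerBlock = 16 * 1024;
constexpr int kMaxThreadsPerBlock = 512;
constexpr int kWarpSize = 32;

// True if a block of this shape should not trigger "too many resources".
bool blockFits(int threadsPerBlock, int regsPerThread, int smemBytesPerBlock) {
    assert(threadsPerBlock > 0 && regsPerThread >= 0 && smemBytesPerBlock >= 0);
    return threadsPerBlock <= kMaxThreadsPerBlock &&
           threadsPerBlock * regsPerThread <= kRegistersPerBlock &&
           smemBytesPerBlock <= kSharedBytesPerBlock;
}

// Largest block size (a whole number of warps) whose registers fit.
int maxThreadsForRegisters(int regsPerThread) {
    assert(regsPerThread > 0);
    int threads = kRegistersPerBlock / regsPerThread;  // 8192 / 40 = 204
    threads -= threads % kWarpSize;                     // round down to 192
    return threads < kMaxThreadsPerBlock ? threads : kMaxThreadsPerBlock;
}
```

For 40 registers this gives 192 threads, for example a 16x12 block, with the grid height adjusted to match.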

immuner · October 23, 2008, 10:14am

Hello,

Thank you for your answer.

OK, I played with it a bit more and tried reducing the block size. Now it doesn't crash right away, but it does after two or three frames. It seems a bit unstable, since CUDA returns an unknown error. Is there any way to trace that?

In emulation mode, I get no errors.

How can I find out the kernel execution time limit?

And how can I find out the limits on block and grid dimensions?

Linny · October 23, 2008, 11:44am

The time limit for kernels is 5 seconds when running on a graphics card that is also used for displaying a desktop.
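To size those launches, one approach is to time a single slice (for example with `cudaEventRecord` / `cudaEventElapsedTime`) and derive how many slices fit into one launch. A sketch, where the per-slice time and the safety fraction are assumptions you would measure or pick yourself. The limit is a parameter because it depends on the OS: on Vista, WDDM's timeout detection and recovery (TDR) defaults to 2 seconds.

```cpp
#include <cassert>

// How many slices one launch can process while staying under the display
// watchdog. msPerSlice is a measured or estimated time for one slice across
// the whole grid; safetyFraction keeps headroom because timings vary.
int slicesPerLaunch(double msPerSlice, double watchdogMs, double safetyFraction) {
    assert(msPerSlice > 0.0 && watchdogMs > 0.0);
    assert(safetyFraction > 0.0 && safetyFraction <= 1.0);
    const int n = static_cast<int>(watchdogMs * safetyFraction / msPerSlice);
    // Always make progress; if even one slice is too slow, the work for a
    // single slice has to be split further.
    return n > 0 ? n : 1;
}
```

With 30 ms per slice, a 2000 ms limit and half of it as the budget, this allows 33 slices per launch.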

The limits on block and grid dimensions can be found in the programming guide that comes with the

immuner · October 23, 2008, 11:54am

Thanks for that.

I am definitely not exceeding those, so I am within the limits now.
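One way to double-check that is a host-side test of the launch shape against the compute capability 1.x limits from the programming guide: 512 threads per block, block dimensions up to 512 × 512 × 64, and grid dimensions up to 65535 × 65535 × 1. The `Dim3` struct is a stand-in for CUDA's `dim3` so the sketch compiles without the toolkit, and the image size in the usage note is hypothetical.

```cpp
#include <cassert>

struct Dim3 { unsigned x, y, z; };  // stand-in for CUDA's dim3

// Round-up division: enough blocks to cover every pixel even when the image
// size is not a multiple of the block size (plain integer division would
// silently drop the remainder).
unsigned ceilDiv(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

Dim3 gridFor(unsigned width, unsigned height, Dim3 block) {
    return {ceilDiv(width, block.x), ceilDiv(height, block.y), 1};
}

// Compute capability 1.x limits: at most 512 threads per block, block dims
// up to 512 x 512 x 64, grid dims up to 65535 x 65535 x 1.
bool launchShapeValid(Dim3 grid, Dim3 block) {
    const unsigned long long threads = 1ull * block.x * block.y * block.z;
    return block.x >= 1 && block.y >= 1 && block.z >= 1 && threads <= 512 &&
           block.x <= 512 && block.y <= 512 && block.z <= 64 &&
           grid.x >= 1 && grid.y >= 1 && grid.x <= 65535 && grid.y <= 65535 &&
           grid.z == 1;
}
```

For a 1000 × 600 image with 16x16 blocks, `gridFor` gives a 63 × 38 grid, which passes; a 32x32 block fails because it is 1024 threads. Rounding the grid up means the edge blocks contain threads outside the image, so the kernel must skip them with a bounds check.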

It is quite strange, though, that I get these unknown-error crashes. I hope the next release provides more detailed error reporting.

Linny · October 23, 2008, 1:51pm

Well, it might be that your kernel is accessing memory addresses that it shouldn't; that is usually a guaranteed crash. Without seeing the code, it's hard to suggest what could be causing the crashes.
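One common way to end up with such accesses is a grid rounded up to cover the image without a matching bounds check in the kernel. The sketch below replays a 2D launch on the host, much as device emulation does, and counts the threads that land outside the image. In a kernel, these are exactly the threads an `if (x < width && y < height)` guard must skip. The image and block sizes used with it are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ReplayResult {
    std::size_t strayThreads;  // threads whose (x, y) lies outside the image
    bool eachPixelOnce;        // guarded threads cover every pixel exactly once
};

// Replays a 2D launch on the host, computing each thread's pixel the way a
// kernel would (blockIdx * blockDim + threadIdx). Without the guard, stray
// threads would write into the next row or past the end of the buffer.
ReplayResult replayLaunch(int width, int height, int blockW, int blockH) {
    assert(width > 0 && height > 0 && blockW > 0 && blockH > 0);
    const int gridW = (width + blockW - 1) / blockW;  // rounded-up grid
    const int gridH = (height + blockH - 1) / blockH;
    std::vector<int> hits(static_cast<std::size_t>(width) * height, 0);
    std::size_t stray = 0;
    for (int by = 0; by < gridH; ++by)
        for (int bx = 0; bx < gridW; ++bx)
            for (int ty = 0; ty < blockH; ++ty)
                for (int tx = 0; tx < blockW; ++tx) {
                    const int x = bx * blockW + tx;
                    const int y = by * blockH + ty;
                    if (x >= width || y >= height) {  // the bounds guard
                        ++stray;
                        continue;
                    }
                    ++hits[static_cast<std::size_t>(y) * width + x];
                }
    bool once = true;
    for (int h : hits) once = once && (h == 1);
    return {stray, once};
}
```

For a 1000 × 600 image with 16x16 blocks, 12864 of the 612864 launched threads fall outside the image, and every one of them would touch the wrong memory without the guard.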

immuner · October 27, 2008, 9:00am

Is there any way to exceed that? In kernel 1 I am performing some mathematical calculations that take a while to finish. I am sure they don't take more than 5 seconds so far, but they could easily exceed that. Can I change the kernel execution time limit?

EDIT: I get a crash in kernel 1 that seems to happen at random times, with an out-of-memory error. The data is the same every time, but about one run in three, Vista restarts the device. I have checked and lowered the block count, and register and shared memory usage are the same as posted above. I don't seem to be exceeding anything that could cause this. In my opinion, CUDA seems a bit unstable; otherwise, why would it crash at random times?

P.S. Could it be because of my OS? I am running Vista 64-bit, but I have installed the 32-bit SDK because the 64-bit one was causing problems.

I wish I could post the code, but that is impossible at the moment, since it is quite big and the main project is in Java.
