cudaError_enum Strange kernel failure

Hello,

I have a reasonably basic kernel that multiplies two complex volumes, and it generates an exception:

First-chance exception at 0x7c812a5b in mshta.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0983ea00…

First-chance exception at 0x7c812a5b in mshta.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0983ea70…

First-chance exception at 0x7c812a5b in mshta.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0983fd80…

__global__ void volume_complex_conjugate_multiply(Complex* pVolOut, const Complex* pC1, const Complex* pC2,
                                                  const int maxindex)
{
    const int index = (blockIdx.y*gridDim.x*blockDim.x) + (blockIdx.x*blockDim.x) + threadIdx.x;

    if (index >= 0 && index < maxindex && (threadIdx.x < blockDim.x || blockIdx.x < gridDim.x ||
        blockIdx.y < gridDim.y))
    {
        Complex Res;
        Res.x = (pC1[index].x*pC2[index].x) + (pC1[index].y*pC2[index].y);
        Res.y = (pC1[index].x*pC2[index].y) - (pC2[index].x*pC1[index].y);
        pVolOut[index] = Res;
    }
}

I’ve tried to track down the problem and thought I had found it a few times, but it keeps coming back. The same error appears when I deliberately make a basic “array out of bounds” error, so I added multiple if-guards to prevent that and also checked in advance that none of the three pointers is NULL. All three pointers point to different volumes and of course their memory does not overlap. I have checked the error state after the kernel failure, but it’s just a (meaningless) cudaErrorLaunchFailure. The kernel is simply launched as follows:

threads[GPUId].x = 109;
threads[GPUId].y = 1;
threads[GPUId].z = 1;
blocks[GPUId].x = 216;
blocks[GPUId].y = 116;
blocks[GPUId].z = 1;

volume_complex_conjugate_multiply<<<blocks[GPUId], threads[GPUId]>>>
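To be concrete, the error check after the launch looks roughly like this (just a sketch; d_out, d_c1, d_c2 and n stand in for my actual device pointers and element count):

    volume_complex_conjugate_multiply<<<blocks[GPUId], threads[GPUId]>>>(d_out, d_c1, d_c2, n);

    // First catch launch-configuration errors, then wait for the kernel to finish;
    // this is where the failed launch shows up as cudaErrorLaunchFailure.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaThreadSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));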

This kernel is part of a larger multi-GPU application (8800 GTX and 8500 GT). For some reason the error only occurs on the 8500 GT. (I am aware of the large difference in performance between the two GPUs, but that’s part of the experiment.)

  • The execution time of the kernel falls well below the 5-second limit (so it’s not the watchdog timer).

  • NVCC reports that the kernel uses 7 registers (7*109 < 8192) and 32+28 bytes of smem (60*109 < 16k); see the ptxas output below.
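(For reference, those numbers come from compiling with verbose ptxas output; the command and the relevant line look roughly like this, where mykernels.cu just stands for the file containing this kernel:

    nvcc --ptxas-options=-v mykernels.cu
    ptxas info    : Used 7 registers, 32+28 bytes smem
)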

I have spent literally two days trying to find the source of this error and have searched the forum for similar errors, but I have not been able to solve it. If anybody has ANY idea what could cause this, please post it.

Thanks in advance,

Kevin

Have you tried checking the errors from ALL CUDA-related calls on the host side? Have you changed the hardware requirements (like the compute capability)?

Yes, I have a macro which checks the error state around every CUDA function (sketched below). What exactly do you mean by whether I changed the hardware requirements? How could I change the compute capability? Do you mean in the program, for instance by adding statements only supported by compute capability 1.1? As far as I know I am not using anything that is 1.1-only (no atomics).

Also, the 8500 GT (on which the error occurs) has a higher compute capability (1.1) than the 8800 GTX (1.0); could that be the source of the problem? It could be, but I would find it somewhat strange, especially since the error always occurs in the same kernel…
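For what it’s worth, the macro is basically the usual pattern; a minimal sketch (the real one is equivalent, the names here are just illustrative):

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Wraps every CUDA runtime call and bails out with file/line info on failure.
    #define CUDA_CHECK(call)                                              \
        do {                                                              \
            cudaError_t e = (call);                                       \
            if (e != cudaSuccess) {                                       \
                fprintf(stderr, "CUDA error '%s' at %s:%d\n",             \
                        cudaGetErrorString(e), __FILE__, __LINE__);       \
                exit(EXIT_FAILURE);                                       \
            }                                                             \
        } while (0)

    // e.g. CUDA_CHECK(cudaMemcpy(d_c1, h_c1, nbytes, cudaMemcpyHostToDevice));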

Is there anybody who has any idea what could possibly go wrong in my code?

I’ve been getting exactly the same thing! I’ve been wrapping CUDA error checks around every call and also following every kernel call with cudaThreadSynchronize(). I haven’t tested your code yet, but when I DO get this error it SEEMS to still run fine. How about you?

Also, it only does this in debug mode (of course), and when I run without debugging, no problems.

When I get this error the thread in which it occurs exits immediately. Note, however, that the most likely cause of a cudaErrorLaunchFailure is something like indexing an array out of bounds (which I’m very sure I’m not doing). I also have another lead now (see http://forums.nvidia.com/index.php?showtopic=60956) which it could be related to as well, since I have not put cudaThreadSynchronize() after every CUDA call. I’m now experimenting with loops full of mallocs, frees and kernel launches to see what happens; a rough sketch is below. I have not tried running without debugging; I will try that soon…
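The test loop is essentially something like this (a rough sketch with a dummy allocation size; d_out, d_c1, d_c2 and n are again placeholders, and CUDA_CHECK is the macro sketched above):

    // Repeated allocate / launch / free, checking the error state every iteration.
    for (int i = 0; i < 1000; ++i)
    {
        Complex* d_tmp = 0;
        CUDA_CHECK(cudaMalloc((void**)&d_tmp, 1024 * sizeof(Complex)));

        volume_complex_conjugate_multiply<<<blocks[GPUId], threads[GPUId]>>>(d_out, d_c1, d_c2, n);
        CUDA_CHECK(cudaThreadSynchronize());

        CUDA_CHECK(cudaFree(d_tmp));
    }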

I think you can get this in Visual Studio when you have it set to report or break on C++ exceptions; the first-chance messages are shown whenever an exception is thrown, even if it is handled afterwards.
Hope that helps anyone.