Hello,
I have a fairly basic kernel that multiplies two complex volumes, and it generates an exception:
First-chance exception at 0x7c812a5b in mshta.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0983ea00…
First-chance exception at 0x7c812a5b in mshta.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0983ea70…
First-chance exception at 0x7c812a5b in mshta.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0983fd80…
__global__ void volume_complex_conjugate_multiply(Complex* pVolOut, const Complex* pC1, const Complex* pC2,
                                                  const int maxindex)
{
    const int index = (blockIdx.y*gridDim.x*blockDim.x)+(blockIdx.x*blockDim.x)+threadIdx.x;
    if (index >=0 && index < maxindex && (threadIdx.x < blockDim.x || blockIdx.x < gridDim.x ||
        blockIdx.y < gridDim.y))
    {
        Complex Res;
        Res.x = (pC1[index].x*pC2[index].x) + (pC1[index].y*pC2[index].y);
        Res.y = (pC1[index].x*pC2[index].y) - (pC2[index].x*pC1[index].y);
        pVolOut[index] = Res;
    }
}
I’ve tried to track down the problem and several times thought I had found it, but it keeps coming back. The same
error appears when I make a basic “array out of bounds” error, so I added multiple if-guards to prevent that and
also checked in advance that the three pointers are not NULL. Each of the three pointers points to a
different volume, and of course their memory does not overlap. I have checked the error code after the kernel
failure, but it’s just a (meaningless) cudaErrorLaunchFailure. The kernel is simply called as follows:
threads[GPUId].x = 109;
threads[GPUId].y = 1;
threads[GPUId].z = 1;
blocks[GPUId].x = 216;
blocks[GPUId].y = 116;
blocks[GPUId].z = 1;
volume_complex_conjugate_multiply<<<blocks[GPUId], threads[GPUId]>>>
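As a sanity check on this launch shape, the following host-side sketch computes the largest linear index the kernel's index formula can produce and the buffer size that implies. `LaunchShape`, `max_linear_index` and `required_bytes` are hypothetical names, and the 8-byte element size assumes that Complex is a pair of floats.

```cpp
#include <cstddef>

// Launch shape with the dimensions the kernel actually uses.
// Assumes every dimension is at least 1.
struct LaunchShape { unsigned gridX; unsigned gridY; unsigned blockX; };

// Largest index produced by
// (blockIdx.y*gridDim.x*blockDim.x) + (blockIdx.x*blockDim.x) + threadIdx.x.
std::size_t max_linear_index(const LaunchShape& s)
{
    return static_cast<std::size_t>(s.gridY - 1) * s.gridX * s.blockX
         + static_cast<std::size_t>(s.gridX - 1) * s.blockX
         + (s.blockX - 1);
}

// Bytes one volume must hold so that every index below maxindex is valid.
// Assumption: sizeof(Complex) is 8 (two floats).
std::size_t required_bytes(std::size_t maxindex, std::size_t elemSize = 8)
{
    return maxindex * elemSize;
}
```

With the values above (216 x 116 blocks of 109 threads) the maximum index is 2,731,103, so the grid covers 2,731,104 elements. Under the 8-byte assumption that is roughly 21.8 MB per volume, which is what each of the three buffers would need to hold on whichever GPU runs the kernel if maxindex spans the whole grid.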
This kernel is part of a larger multi-GPU application (8800 GTX and 8500 GT). For some reason the error only
occurs on the 8500 GT. (I am aware of the large difference in performance between the two GPUs, but that’s
part of the experiment.)
- Execution time of the kernel falls well below the 5 sec timer (so it’s not the watchdog timer)
- NVCC reports that the kernel uses 7 registers (7*109 < 8192) and 32+28 bytes smem (60*109 < 16k)
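The resource arithmetic above can be written out as a small host-side check. This is only a sketch: the function names are hypothetical, and the limits of 8192 registers and 16 KB of shared memory per multiprocessor are the ones assumed in this post for these cards. As far as I understand, the register count reported by NVCC is per thread while the smem figure is per block, so multiplying the 60 bytes by the thread count is a conservative overestimate.

```cpp
// Per-multiprocessor limits assumed for the 8800 GTX and 8500 GT.
const unsigned kRegistersPerSM = 8192;
const unsigned kSharedBytesPerSM = 16 * 1024;

// Registers are counted per thread, so one block needs
// regsPerThread * threadsPerBlock of them.
unsigned registers_per_block(unsigned regsPerThread, unsigned threadsPerBlock)
{
    return regsPerThread * threadsPerBlock;
}

// True if a single block fits within the assumed limits.
// smemPerBlock is taken as a per-block figure and is not scaled by thread count.
bool block_fits(unsigned regsPerThread, unsigned threadsPerBlock, unsigned smemPerBlock)
{
    return registers_per_block(regsPerThread, threadsPerBlock) <= kRegistersPerSM
        && smemPerBlock <= kSharedBytesPerSM;
}
```

For this kernel, `registers_per_block(7, 109)` is 763 and `block_fits(7, 109, 60)` is true, so under these assumptions the launch should not be failing for lack of registers or shared memory.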
I have spent literally two days trying to find the source of this error and have searched the forum for similar
errors, but I have not been able to solve it. If anybody has ANY idea what could cause this, please post it.
Thanks in advance,
Kevin