Unspecified launch failure on CUDA_SAFE_CALL(cudaThreadSynchronize())

Hello,

I’m trying to convert a very slow serial program to run on the GPU. The problem is that I get the “Unspecified launch failure” error message every time I run it. The line that the CUDA error points to is

CUDA_SAFE_CALL( cudaThreadSynchronize() );

which I have inserted directly after my call to the kernel. I am really perplexed and hoping that someone can help. I’ve compiled and run a number of other codes, so I think I’ve got my system properly configured.

The command line is (from Properties->Configuration Properties->CUDA Runtime API->Command Line):

echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_20,compute_20" -gencode=arch=compute_20,code="sm_20,compute_20" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -maxrregcount=32 --compile -o "Debug/Image_Dist.vcproj.obj" Image_Dist.vcproj
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_20,compute_20" -gencode=arch=compute_20,code="sm_20,compute_20" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -maxrregcount=32 --compile -o "Debug/Image_Dist.vcproj.obj" "c:\Tim\Image_Dist\Image_Dist.vcproj"

I am using Visual C++ 2008 Express Edition with the CUDA 3.2 SDK and a GTX 570 card.

Here’s the relevant bit of host code. pos and dist are float**, and num_zero, x, y, and len are ints.

// Allocate matrices in device memory
size_t pitch;
int* dzeros;
cudaMalloc((void**)&dzeros, num_zero * sizeof(int));
float* dpos;
cudaMallocPitch((void**)&dpos, &pitch, 10 * sizeof(float), len);
float* ddist;
cudaMallocPitch((void**)&ddist, &pitch, (len+2) * sizeof(float), num_zero);

// Copy matrices from host memory to device memory
cudaMemcpy(dzeros, thezeros, num_zero * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy2D(dpos, pitch, pos, pitch, 10 * sizeof(float), len, cudaMemcpyHostToDevice);
cudaMemcpy2D(ddist, pitch, dist, pitch, (len+2) * sizeof(float), num_zero, cudaMemcpyHostToDevice);

// Send the task out
dim3 threads(1, 1);
dim3 grid(x / threads.x, y / threads.y);
cudaError_t cudastat;

computedist<<<grid, threads>>>(dzeros, dpos, ddist, num_zero, x, len, pitch);
CUDA_SAFE_CALL( cudaThreadSynchronize() );

Here’s the device code:

__global__ void computedist(int* dzeros, float* dpos, float* ddist, int num_zero, int x, int len, size_t pitch){
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;
    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    // Block dimension
    int dx = blockDim.x;
    int dy = blockDim.y;

    int i = bx*dx + tx;
    int j = by*dy + ty;

    int index = i*x + j;
    int theind;

    for(theind = 0; theind < num_zero; theind++)
        if(dzeros[theind] == index)
            break;

    // Pointers to matrix locations
    float* location;
    float* r;
    float* c;

    location = (float*)((char*)ddist + theind*pitch);
    *location = i;
    location = (float*)((char*)ddist + theind*pitch) + 1;
    *location = j;
    for(int ind = 0; ind < len; ind++){
        location = (float*)((char*)ddist + theind*pitch) + ind + 2;
        r = (float*)((char*)dpos + ind*pitch) + 6;
        c = (float*)((char*)dpos + ind*pitch) + 5;
        *location = sqrt((i-*r)*(i-*r) + (j-*c)*(j-*c));
    }
}

In my experience, 99% of the time this means that your kernel is doing something wrong, an access violation for example. In that case the next CUDA call signals the unspecified error.
I recommend you first run it under the Nsight parallel debugger with memory checking enabled. If that doesn’t point you to the problem, start commenting out parts of your kernel until the error goes away, and keep commenting out piece by piece until you isolate the problem to a specific line.
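To make the commenting-out idea concrete: since out-of-bounds stores are the usual suspect, I’d start by disabling the writes to ddist and see whether the error disappears. Something like this (just a sketch using preprocessor guards around your existing lines):

// Sketch: bisect the kernel by disabling the global-memory writes first.
// If the "unspecified launch failure" goes away with this block disabled,
// the bad access is in the writes to ddist.
#if 0   // flip to 1 to re-enable the writes
    location = (float*)((char*)ddist + theind*pitch);
    *location = i;
    location = (float*)((char*)ddist + theind*pitch) + 1;
    *location = j;
#endif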

disclaimer: I’m a total newbie

If you look at the definition of the CUDA_SAFE_CALL macro, it actually calls cudaThreadSynchronize() itself:
http://forums.nvidia.com/index.php?showtopic=66907

So maybe it has something to do with it being called again?
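
For reference, from memory the macro has roughly this shape (just a sketch of the pattern, not the actual cutil.h source):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Rough sketch of the cutil-style pattern: check the wrapped call,
// then synchronize and check again so delayed kernel errors surface here.
#define CUDA_SAFE_CALL(call) do {                                          \
    cudaError_t err = call;                                                \
    if (err != cudaSuccess) {                                              \
        fprintf(stderr, "CUDA error in %s line %i: %s\n",                  \
                __FILE__, __LINE__, cudaGetErrorString(err));              \
        exit(EXIT_FAILURE);                                                \
    }                                                                      \
    err = cudaThreadSynchronize();                                         \
    if (err != cudaSuccess) {                                              \
        fprintf(stderr, "CUDA sync error in %s line %i: %s\n",             \
                __FILE__, __LINE__, cudaGetErrorString(err));              \
        exit(EXIT_FAILURE);                                                \
    }                                                                      \
} while (0)

So wrapping cudaThreadSynchronize() in it just means the device gets synchronized twice, which should be harmless in itself.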

Any errors reported by cudaThreadSynchronize() are delayed errors that were caused by a previous kernel launch (or any other async CUDA call).
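
One way to see that distinction in practice is to check for an immediate launch error right after the <<< >>> call and check the synchronize separately. Something like this sketch against your launch code:

// Sketch: separate immediate launch errors from delayed execution errors.
computedist<<<grid, threads>>>(dzeros, dpos, ddist, num_zero, x, len, pitch);

// Problems with the launch itself (bad grid/block configuration, etc.) show up here.
cudaError_t launchErr = cudaGetLastError();
if (launchErr != cudaSuccess)
    printf("launch error: %s\n", cudaGetErrorString(launchErr));

// Errors that happen while the kernel is running (e.g. out-of-bounds accesses)
// are only reported once the device is synchronized.
cudaError_t syncErr = cudaThreadSynchronize();
if (syncErr != cudaSuccess)
    printf("kernel execution error: %s\n", cudaGetErrorString(syncErr));

If the launch check is clean but the sync reports “unspecified launch failure”, the kernel itself crashed while running.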

I’m guessing (because I gave up trying to understand your code) that you’ve got an access violation in your kernel.

Your code is WAY too complicated for what it appears (?) to be doing.
Why all the pointer casting… float -> char -> float?
Why are you even using pointers for this?

Also, in general, it’s a lot easier to use flattened 1-D rather than 2-D.
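
For example, here’s roughly what I mean (a sketch only — it assumes ddist and dpos are plain contiguous cudaMalloc allocations of num_zero*(len+2) and len*10 floats respectively, rather than pitched):

// Sketch: same computation with flat 1-D indexing instead of pitched pointers.
__global__ void computedist_flat(const int* dzeros, const float* dpos, float* ddist,
                                 int num_zero, int x, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i * x + j;

    // Find which output row this thread owns.
    int theind = 0;
    while (theind < num_zero && dzeros[theind] != index)
        theind++;
    if (theind == num_zero) return;            // no match: write nothing (guards against running off the end)

    float* row = ddist + theind * (len + 2);   // row theind of a num_zero x (len+2) matrix
    row[0] = (float)i;
    row[1] = (float)j;
    for (int ind = 0; ind < len; ind++) {
        float r = dpos[ind * 10 + 6];          // dpos stored as len rows of 10 floats
        float c = dpos[ind * 10 + 5];
        row[ind + 2] = sqrtf((i - r) * (i - r) + (j - c) * (j - c));
    }
}

No char* casts, no pitch argument, and the host side can just use cudaMalloc and cudaMemcpy.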

I’ve kind of abandoned this problem. I was just using it to get familiar with CUDA before moving on to the real thing I’m interested in. As for what was causing the error, it seems to have been some combination of writing to device memory incorrectly and trying to pass things to the device that were too large. Looking at it now, the code is a mess; I tried to convert some MATLAB code I had, and it ended up pretty lousy. So if anyone is looking to solve a similar problem, just disregard my code.