Unspecified launch failure

Getting another weird error. Here is my code:

void GetMatches(int *pMatchList_CPU, int startIdx)
{
	int blockCnt = 512;
	int threadsPerBlock = 16;
	int totalThreads = blockCnt * threadsPerBlock;

	int *pMatchList_GPU;
	int dataSize = totalThreads * sizeof(int);

	CUDA_SAFE_CALL(cudaMalloc((void**) &pMatchList_GPU, dataSize));
	CUDA_SAFE_CALL(cudaMemset((void*) pMatchList_GPU, 0, dataSize));

	cudaGetMatches<<<blockCnt, threadsPerBlock>>>(pMatchList_GPU, startIdx);

	CUDA_SAFE_CALL(cudaMemcpy(pMatchList_CPU, pMatchList_GPU, dataSize, cudaMemcpyDeviceToHost));
}

Everything appears to work fine up until the cudaMemcpy, which gives the less than helpful error message “unspecified launch failure.”

Am I missing something obvious?

Edit: Just one additional note: pMatchList_CPU is allocated with dataSize bytes in the calling function. I tried adding a cudaMallocHost call and copying from pMatchList_GPU to that buffer first, but I get the same error.

Usually, unspecified launch failures come from kernel calls. Since you aren’t checking for errors directly after the kernel call, that is probably why the error shows up at the memcpy.

The most common cause of “unspecified launch failure” is a kernel writing past the end of an allocated array.
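As a rough illustration of guarding against that kind of out-of-bounds write, here is a minimal sketch. The body of cudaGetMatches was not posted, so cudaGetMatchesGuarded, its extra count parameter, and the placeholder work inside it are all assumptions made up for this example, not the original kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for cudaGetMatches (the real kernel is not shown in the thread).
// The element count is passed in so each thread can check its index before writing.
__global__ void cudaGetMatchesGuarded(int *pMatchList, int count, int startIdx)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads whose index falls outside the allocation do nothing.
    if (idx < count) {
        pMatchList[idx] = startIdx + idx;  // placeholder work, assumed for illustration
    }
}

int main()
{
    const int blockCnt = 512;
    const int threadsPerBlock = 16;
    const int count = blockCnt * threadsPerBlock;

    int *pMatchList_GPU = NULL;
    cudaError_t err = cudaMalloc((void**) &pMatchList_GPU, count * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaGetMatchesGuarded<<<blockCnt, threadsPerBlock>>>(pMatchList_GPU, count, 0);

    // Wait for the kernel to finish so any execution error is reported here.
    err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(pMatchList_GPU);
    return err == cudaSuccess ? 0 : 1;
}
```

The guard only protects writes indexed by the thread's own idx. If the real kernel computes other offsets into the array, those would need their own checks.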

Actually, I had tried that, but I didn’t have a cudaThreadSynchronize() call before checking for the error (and the check was missing from the code I posted), so it reported no error.

When I add a cudaThreadSynchronize() and then check, there is an error, so you are correct: it is happening in my kernel call. I’ll keep digging. Thanks.
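For anyone finding this thread later, the check described above might look roughly like the sketch below. CHECK_CUDA and dummyKernel are hypothetical names invented for this example (CUDA_SAFE_CALL in the original post comes from the SDK's cutil helpers, which are not reproduced here). It uses cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize() in later CUDA releases.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string with file and line, then exit.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel, assumed for illustration only.
__global__ void dummyKernel(int *out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = 1;
}

int main()
{
    const int blockCnt = 512;
    const int threadsPerBlock = 16;

    int *d_out = NULL;
    CHECK_CUDA(cudaMalloc((void**) &d_out, blockCnt * threadsPerBlock * sizeof(int)));

    dummyKernel<<<blockCnt, threadsPerBlock>>>(d_out);

    // Catches errors in the launch itself, such as an invalid configuration.
    CHECK_CUDA(cudaGetLastError());

    // Kernel launches are asynchronous: errors raised while the kernel runs
    // only become visible after the host waits for it to finish.
    CHECK_CUDA(cudaDeviceSynchronize());

    CHECK_CUDA(cudaFree(d_out));
    return 0;
}
```

Without the synchronize, the error check can run before the kernel has finished, so a failure inside the kernel ends up reported by the next blocking call, which is why it appeared at the cudaMemcpy.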