cudaErrorUnknown

bobtown · May 30, 2009, 9:03am

Does anyone have any advice on the most likely cause of this rather generic error (returned by cudaGetLastError())?

[codebox]unsigned int num_threads = 256;
unsigned int blocks = (len/num_threads) + 1;
//printf("block: %d\r\n",blocks);

dim3 grid(blocks, 1);
dim3 threads(num_threads, 1);
//dim3 grid(1, 1);
//dim3 threads(1, 1);

actFuncDouble<<< grid, threads >>>(f);
cutilCheckMsg("Kernel execution failed");[/codebox]

[codebox]__global__ void
actFuncDouble( double* d_data )
{
    // write data to global memory
    const unsigned int tid = blockIdx.x*blockDim.x + threadIdx.x;
    //double data = d_data[tid];

    d_data[tid] = 1/(1+exp(-d_data[tid]));
}[/codebox]

In most of my code I am calling CUBLAS functions, but I also needed to add a few basic kernels of my own (listed above). It seems simple enough. I am wondering whether a CUBLAS error is being raised that the regular driver/runtime CUDA calls don’t report, even though I synchronize after each CUBLAS call completes. Any ideas?

Cygnus_X1 · May 30, 2009, 9:25am

[*]Check whether the size of array f is a multiple of num_threads. If it is not, the last block may write past the end of the array and overwrite other memory.

[*]If len is a multiple of num_threads, blocks should be ((len-1)/num_threads) + 1. Otherwise, the last block will have nothing to do: all of its tids will point beyond the end of the array.

That’s all I could think of from this short code.
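Both suggestions can be combined into a bounds-checked version of the kernel. This is a minimal sketch, assuming the kernel is given the element count as an extra len parameter (the original signature takes only the pointer); blocksFor and launchActFuncDouble are hypothetical helper names:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Number of blocks needed to cover n elements (ceiling division).
// Equal to ((n - 1) / threadsPerBlock) + 1 for n > 0, and 0 for n == 0.
__host__ __device__ unsigned int blocksFor(unsigned int n, unsigned int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Logistic activation applied in place. The len parameter is an assumption
// added for this sketch: it lets threads in the last, partially filled block
// skip elements past the end of the array instead of writing out of bounds.
__global__ void actFuncDouble(double* d_data, unsigned int len)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < len) {
        d_data[tid] = 1.0 / (1.0 + exp(-d_data[tid]));
    }
}

// Hypothetical host-side launcher.
cudaError_t launchActFuncDouble(double* d_data, unsigned int len)
{
    const unsigned int num_threads = 256;
    const unsigned int blocks = blocksFor(len, num_threads);
    if (blocks == 0) {
        return cudaSuccess;                 // empty input: a 0-block launch would be an error
    }
    actFuncDouble<<<blocks, num_threads>>>(d_data, len);
    return cudaGetLastError();              // reports launch-configuration errors
}
```

With len = 512 this launches 2 blocks instead of 3, and with len = 513 the guard keeps the 255 surplus threads of the third block from touching memory.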

Also note that you can pass a plain int rather than a dim3 structure in the kernel launch configuration. An int is implicitly converted to a dim3 with that value in the x dimension and 1 in the remaining dimensions.
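A small host-side illustration of that conversion; gridFromInt is a hypothetical name, and the function only builds a dim3 without launching anything:

```cuda
#include <cuda_runtime.h>

// dim3's constructor takes (x, y = 1, z = 1) and is not explicit, so an
// unsigned int converts implicitly to a dim3 with only x set.
dim3 gridFromInt(unsigned int blocks)
{
    return blocks;   // implicit conversion: dim3(blocks, 1, 1)
}

// Consequently these two launches configure the same grid and block shape:
//   actFuncDouble<<< dim3(blocks, 1), dim3(256, 1) >>>(f);
//   actFuncDouble<<< blocks, 256 >>>(f);
```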

jph4599 · May 30, 2009, 2:40pm

Don’t use CUTIL at all; it was developed only for use in the SDK and is not stable enough for production code. Check out the Dr. Dobb’s article on error handling for the correct way to check errors.

It’s just an idea, but try adding a

cudaThreadSynchronize();

between the kernel launch and the error check. Since kernel launches are asynchronous, the kernel may not have finished by the time you query the error.
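A sketch of this two-step check using only CUDA runtime calls instead of CUTIL. reportCudaError, dummyKernel and launchAndCheck are hypothetical names for illustration. cudaThreadSynchronize() is the call named above; later toolkits deprecated it in favor of cudaDeviceSynchronize(), which the sketch uses:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Prints a message and returns 1 if err is not cudaSuccess; returns 0 otherwise.
int reportCudaError(cudaError_t err, const char* where)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", where, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

// Stand-in kernel for this sketch; d_out must hold at least 32 ints.
__global__ void dummyKernel(int* d_out)
{
    d_out[threadIdx.x] = (int)threadIdx.x;
}

// 1. cudaGetLastError() right after the launch catches errors in the launch
//    itself (invalid configuration, bad arguments).
// 2. Synchronizing and checking the returned status catches errors raised
//    while the kernel ran, which an immediate check can miss because the
//    launch returns before the kernel finishes.
int launchAndCheck(int* d_out)
{
    dummyKernel<<<1, 32>>>(d_out);
    if (reportCudaError(cudaGetLastError(), "launch")) return 1;
    if (reportCudaError(cudaDeviceSynchronize(), "execution")) return 1;
    return 0;
}
```

The same pattern applies after the CUBLAS calls: an error from an earlier asynchronous operation may only surface at the next synchronizing call, so checking after each synchronization narrows down which step failed.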

bobtown · May 31, 2009, 5:26am

When I call cublasAlloc, I allocate at least 512 bytes more than necessary, so the code I wrote shouldn’t run off the end, provided I understand how CUBLAS allocates memory.

[codebox]
syncCudaMain();

cublasDgemm('n', 'n', dataPnts, lyr2Cols, lyr2Rows,
            1, ((double*)devMem_tMatrix2Inputs), dataPnts,
            ((double*)devMem_tMatrix2Wghts), lyr2Rows,
            0, ((double*)devMem_tMatrix2Output), dataPnts);

syncCudaMain();

actFunMainDbl((double*)devMem_tMatrix2Output, dataPnts*lyr2Cols);

syncCudaMain();
[/codebox]

So I am assuming the output is allocated just as if I called cudaMalloc and all the elements of the matrix are stored sequentially in memory. Is this correct?
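For reference: cublasAlloc(n, elemSize, &ptr) allocates one contiguous array of n elements of elemSize bytes each, and CUBLAS stores matrices in column-major (Fortran) order. With ldc = dataPnts, element (i, j) of the dataPnts x lyr2Cols output sits at offset i + j*dataPnts. A sketch of that index arithmetic, with hypothetical helper names:

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Offset of element (row, col) in a column-major matrix with leading
// dimension ld (ld >= number of rows), as used by CUBLAS.
__host__ __device__ size_t colMajorIndex(size_t row, size_t col, size_t ld)
{
    return row + col * ld;
}

// For an m x n matrix stored with ld == m (assumes m, n >= 1), the last
// element is at (m - 1) + (n - 1) * m == m * n - 1, so the matrix fills
// exactly m * n consecutive elements with no gaps.
__host__ __device__ size_t colMajorElementCount(size_t m, size_t n)
{
    return colMajorIndex(m - 1, n - 1, m) + 1;
}
```

With m = dataPnts and n = lyr2Cols, colMajorElementCount gives dataPnts*lyr2Cols, the same length passed to actFunMainDbl. If ldc were larger than dataPnts, the matrix would contain padding between columns and the element count would no longer match the allocation size.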

Is the CUBLAS source code available? I didn’t see it in the SDK anywhere.

Sarnath · June 1, 2009, 2:00am

Maybe you are NOT using the correct driver version.

Are other CUDA apps working fine? Check driver compatibility.
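One way to check this from code is to compare the values reported by cudaDriverGetVersion() and cudaRuntimeGetVersion(), in toolkits that provide them. This is a sketch under a simplifying assumption: it applies the classic rule that the driver version must be at least the runtime version, and ignores any finer-grained compatibility schemes in newer toolkits. driverSupportsRuntime and checkDriverCompatibility are hypothetical names:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Simplified rule (assumption): the driver must support at least the CUDA
// version the runtime was built for. Versions are encoded as
// 1000 * major + 10 * minor, e.g. 2020 for CUDA 2.2.
int driverSupportsRuntime(int driverVersion, int runtimeVersion)
{
    return driverVersion >= runtimeVersion;
}

// Returns 1 if the versions look compatible, 0 otherwise.
int checkDriverCompatibility(void)
{
    int driverVersion = 0;
    int runtimeVersion = 0;
    if (cudaDriverGetVersion(&driverVersion) != cudaSuccess ||
        cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) {
        return 0;
    }
    printf("driver %d, runtime %d\n", driverVersion, runtimeVersion);
    // cudaDriverGetVersion reports 0 when no driver is installed.
    return driverVersion != 0 && driverSupportsRuntime(driverVersion, runtimeVersion);
}
```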
