cudaFree is returning an unrecognised error code

I’m using the cudaError_t value returned by cudaFree to find out what is causing device memory to fill up.

cudaMalloc returns cudaSuccess for all six arrays.

But when I deallocate the six arrays with cudaFree, I get an unrecognised error code, i.e. not cudaSuccess, cudaErrorInvalidDevicePointer or cudaErrorInitializationError.

Am I calling these correctly?

[codebox]//device variables
float2 *d_x;
cudaError_t error;

error = CUDA_SAFE_CALL( cudaMalloc((void**)&d_x, NTOTAL*sizeof(float2)) );

//Copy input data from CPU
CUDA_SAFE_CALL( cudaMemcpy(d_x, x, NTOTAL*sizeof(float2), cudaMemcpyHostToDevice) );

//call kernel

error = CUDA_SAFE_CALL( cudaFree(d_x) );

//report the value returned by cudaFree
switch (error)
{
	case cudaErrorInvalidDevicePointer: printf("\n x cudaErrorInvalidDevicePointer"); break;
	case cudaSuccess:                   printf("\n x cudaSuccess"); break;
	case cudaErrorInitializationError:  printf("\n x cudaErrorInitializationError"); break;
	default:                            printf("\n x return not recognised"); break;
}[/codebox]



For 512x512 particles there is no problem, but for 1024x1024 particles it appears that cudaFree is not deallocating fully.

In the first kernel, with peak memory usage of 11%, cudaFree deallocates all memory. But in the second kernel, with peak memory usage of 80%, the allocation through cudaMalloc is successful but cudaFree returns an unrecognised cudaError_t value.

Can you print the error code as an int? (You may have a mismatch between CUDART and your compiler.)

Use the CUDA_SAFE_CALL(cudafunctionhere) macro in debug mode. That should print a more detailed error description. Worst-case scenario it will say “unknown error”, but it may find an error you’re not testing for in the switch statement.

Or use cudaGetErrorString(cudaGetLastError()) or whatever it is. (Nobody should ever use cutil.)
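For what it's worth, here is a minimal sketch of checking the return values directly, without cutil. The helper name `checkCuda` and the array size are made up for illustration; only the runtime calls themselves (cudaMalloc, cudaFree, cudaGetErrorString) are real. It prints both the numeric code and the runtime's description, which covers the "print it as an int" suggestion as well.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA or cutil): report a failed runtime
// call with its numeric code and the runtime's own description.
static bool checkCuda(cudaError_t err, const char *what)
{
    if (err == cudaSuccess)
        return true;
    fprintf(stderr, "%s failed: code %d (%s)\n",
            what, (int)err, cudaGetErrorString(err));
    return false;
}

int main()
{
    const size_t n = 1024;   // illustrative size, not the poster's NTOTAL
    float2 *d_x = NULL;

    if (!checkCuda(cudaMalloc((void **)&d_x, n * sizeof(float2)), "cudaMalloc"))
        return EXIT_FAILURE;

    // ... copies and kernel launches would go here ...

    if (!checkCuda(cudaFree(d_x), "cudaFree"))
        return EXIT_FAILURE;

    printf("all calls returned cudaSuccess\n");
    return EXIT_SUCCESS;
}
```

Because the helper takes the cudaError_t by value, it can wrap any runtime call without a long switch statement that has to list every code by hand.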

Every call to cudaFree returns error code 4.

When I call cudaGetLastError, it reports no error.

That corresponds to cudaErrorLaunchFailure. It seems that your kernel is failing for some reason. cudaFree returns errors from previous asynchronous launches, so it may simply be reporting the error from the kernel.

Try the following code right after your kernel. That way, cudaFree() shouldn’t return any error.
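A minimal sketch of such a check, assuming a placeholder kernel (`scaleKernel`) and a made-up `report` helper, neither of which comes from the original post. It queries the error state right after the launch and again after synchronizing, so a kernel failure is attributed to the kernel rather than to a later cudaFree(). It uses cudaDeviceSynchronize(), the current name for what older toolkits call cudaThreadSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one from the thread.
__global__ void scaleKernel(float2 *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i].x *= 2.0f;
        x[i].y *= 2.0f;
    }
}

// Hypothetical helper: print a labelled status and pass it through.
static cudaError_t report(const char *stage, cudaError_t err)
{
    printf("%-12s code %d (%s)\n", stage, (int)err, cudaGetErrorString(err));
    return err;
}

// Returns 0 only if every stage reported cudaSuccess.
int runCheckedLaunch()
{
    const int n = 1 << 20;   // illustrative size
    float2 *d_x = NULL;
    if (report("cudaMalloc", cudaMalloc((void **)&d_x, n * sizeof(float2))) != cudaSuccess)
        return 1;

    int failures = 0;
    failures += report("cudaMemset", cudaMemset(d_x, 0, n * sizeof(float2))) != cudaSuccess;

    scaleKernel<<<(n + 255) / 256, 256>>>(d_x, n);

    // Bad launch configurations are reported straight away.
    failures += report("launch", cudaGetLastError()) != cudaSuccess;
    // Faults during execution are reported once the kernel has finished.
    failures += report("synchronize", cudaDeviceSynchronize()) != cudaSuccess;

    failures += report("cudaFree", cudaFree(d_x)) != cudaSuccess;
    return failures;
}

int main()
{
    return runCheckedLaunch() == 0 ? 0 : 1;
}
```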



Why do you think we shouldn’t use cutil?

Where is cudaThreadSynchronize() and how should I link to it?

I noticed that with the 1024x1024 particle problem, the first kernel whose memory is not deallocated shows memory use of 80% after all the cudaMallocs have been made. With a 1024x512 particle problem, which I have just run, the memory use at the same stage was just 15%, and for the 512x512 particle problem it is just 11%.

Why is there such a jump for the 1024x1024 problem?

And does this large memory use for that problem leave enough memory for the switching between blocks?

OK, I put cudaThreadExit() and cudaThreadSynchronize() after the kernel. Now cudaGetLastError reports unspecified launch failure, and the cudaError_t value returned by all six calls to cudaFree() is cudaErrorInvalidDevicePointer, even though the six calls to cudaMalloc that allocated the memory on each GPU in the first place all returned cudaSuccess.

Is the amount of data being passed simply too big for the GPU, with the compiler not picking it up? As stated earlier, it seems odd that after six calls to cudaMalloc I get 80% memory use. This does seem a lot. Is there a limit to the amount of memory one can allocate on the GPU?
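One way to look into this is to ask the runtime how much device memory is free before and after each allocation. The sketch below assumes one float2 per particle, as with d_x in the first post; the layout of the other five arrays is not shown in the thread, and the helper names are made up. Under that assumption a single 1024x1024 array needs 8 MiB.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bytes needed for one array holding a float2 per particle (assumed layout).
static size_t bytesForParticles(size_t nx, size_t ny)
{
    return nx * ny * sizeof(float2);
}

// Hypothetical helper: print free and total device memory.
static cudaError_t printDeviceMemory(const char *label)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err == cudaSuccess)
        printf("%s: %.1f MiB free of %.1f MiB\n", label,
               freeBytes / 1048576.0, totalBytes / 1048576.0);
    else
        printf("%s: cudaMemGetInfo failed (%s)\n", label, cudaGetErrorString(err));
    return err;
}

int main()
{
    const size_t bytes = bytesForParticles(1024, 1024);
    printf("one 1024x1024 float2 array: %.1f MiB\n", bytes / 1048576.0);

    printDeviceMemory("before cudaMalloc");
    float2 *d_x = NULL;
    if (cudaMalloc((void **)&d_x, bytes) == cudaSuccess) {
        printDeviceMemory("after cudaMalloc");
        cudaFree(d_x);
        printDeviceMemory("after cudaFree");
    }
    return 0;
}
```

If the drop in free memory is much larger than the sizes requested, the extra use is coming from somewhere other than these arrays.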

… Problem solved …


Hi everyone…

I’ve got the same problem.

I’m calling cudaMalloc and cudaMallocHost 7 times -> 7 times cudaSuccess.
No kernel launches before, between or after.

Then I call cudaFree -> results in the same “cudaErrorLaunchFailure”…

I checked the pointers multiple times and checked for previous cudaErrors… nothing… everything is valid, no problems.

The accumulated amount of allocated memory is 17MB,
and I’m using API version 2.1, a Quadro FX 4600, driver version 181.20.

This error drives me crazy…


–> Problem Solved <—

Sorry to bother you… I found the error… I was searching in the wrong places.

I called cudaMemcpy several times, and in some calls the transferred data size was larger than the array I was reading from.
The cudaFree throwing the error directly followed one of these cudaMemcpy calls.
BUT: cudaGetLastError didn’t return anything, and the cudaMemcpy itself returned cudaSuccess as well… so I ignored that function…

After correcting this mistake, I didn’t get any more “cudaErrorLaunchFailure” errors on any cudaFree call.
So this mistake in cudaMemcpy influenced cudaFree…
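A rough sketch of how this kind of mistake could be caught up front, assuming you track the size of both buffers yourself. `copyToDeviceChecked` is a made-up wrapper, not a CUDA function; it refuses any copy whose byte count exceeds either the source or the destination buffer before cudaMemcpy is ever called.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical wrapper: reject a host-to-device copy that would read past
// the source buffer or write past the destination buffer.
static cudaError_t copyToDeviceChecked(void *dst, size_t dstBytes,
                                       const void *src, size_t srcBytes,
                                       size_t copyBytes)
{
    if (copyBytes > dstBytes || copyBytes > srcBytes)
        return cudaErrorInvalidValue;
    return cudaMemcpy(dst, src, copyBytes, cudaMemcpyHostToDevice);
}

int main()
{
    std::vector<float> host(256, 1.0f);                  // small host array
    const size_t hostBytes = host.size() * sizeof(float);
    const size_t devBytes  = 512 * sizeof(float);        // larger device array

    float *d_buf = NULL;
    if (cudaMalloc((void **)&d_buf, devBytes) != cudaSuccess)
        return 1;

    // Copying devBytes would read past the end of the host array: rejected.
    cudaError_t tooBig = copyToDeviceChecked(d_buf, devBytes, &host[0], hostBytes, devBytes);
    // Copying only what the host array holds goes through to cudaMemcpy.
    cudaError_t fits = copyToDeviceChecked(d_buf, devBytes, &host[0], hostBytes, hostBytes);

    printf("oversized copy: %s\n", cudaGetErrorString(tooBig));
    printf("matching copy:  %s\n", cudaGetErrorString(fits));

    cudaFree(d_buf);
    return 0;
}
```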

@chrismc: have you checked your memcpy calls? Maybe you’ve got the same bug as I had…

You are welcome to discuss what might have happened backstage here ;)