Invalid argument error from cudaDeviceSetLimit when a CUDA kernel uses malloc

I have a kernel that allocates an array with malloc:

__global__ static void CalcSTLDistance_Kernel(Integer ComputeParticleNumber)
{
	const Integer ID = CudaGetTargetID();
	CDistance NearestDistance;
	Integer NearestID = -1;
	NearestDistance.Magnitude = 1e8;
	NearestDistance.Direction.x = 0;
	NearestDistance.Direction.y = 0;
	NearestDistance.Direction.z = 0; // make_Scalar3(0,0,0);

	Integer TriangleID;
	Integer CIDX, CIDY, CIDZ;
	Integer CID = GetCellID(&CONSTANT_BOUNDINGBOX, &c_daParticlePosition[ID], CIDX, CIDY, CIDZ);
	int len = 0;
	// Device-side malloc: takes 100 bytes (room for 25 ints) from the device heap.
	int* td = (int*)malloc(100);
	// ... (rest of the kernel not shown)
}


Before launching this kernel, I set the heap size with

cudaDeviceSetLimit(cudaLimitMallocHeapSize, 10*1024*1024);

At runtime this call returns an invalid argument error. I want to set the heap size explicitly, which is why I used cudaDeviceSetLimit, but it fails with that error.

Are you calling cudaDeviceSetLimit in a loop, i.e. multiple times in your application?

Yes, it is called inside a loop.

You should read the documentation, which says:

“The device memory heap has a fixed size that must be specified before any program using malloc() or free() is loaded into the context.”

“Heap size cannot be changed once a module load has occurred”

So the first call to cudaDeviceSetLimit, made before any kernel launch, will succeed. Once you have launched a kernel that calls malloc, the heap size is fixed, and any later call to cudaDeviceSetLimit for cudaLimitMallocHeapSize returns an error. Try a simple test case and you will see that the description in the documentation is accurate.
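
A minimal sketch of such a test case, in case it helps. The kernel name mallocKernel and the helper report are made up for illustration and are not from the original code. It sets the heap size, runs a kernel that calls malloc, then tries to set the heap size again and prints the status of each step. Per the documentation quoted above, the second call is expected to fail, but the exact error string may depend on the CUDA version, so none is shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates from the device heap and frees it.
__global__ void mallocKernel()
{
    int* p = static_cast<int*>(malloc(100 * sizeof(int)));
    if (p != nullptr) {
        p[0] = threadIdx.x;
        free(p);
    }
}

// Hypothetical helper: print the status of one runtime API call.
static void report(const char* label, cudaError_t err)
{
    printf("%s: %s\n", label, cudaGetErrorString(err));
}

int main()
{
    const size_t heapBytes = 10 * 1024 * 1024;

    // First call, before any kernel has run.
    report("first cudaDeviceSetLimit",
           cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes));

    // Run a kernel that uses device-side malloc.
    mallocKernel<<<1, 32>>>();
    report("kernel launch", cudaGetLastError());
    report("cudaDeviceSynchronize", cudaDeviceSynchronize());

    // Second call, after the malloc kernel has run.
    report("second cudaDeviceSetLimit",
           cudaDeviceSetLimit(cudaLimitMallocHeapSize, 2 * heapBytes));

    return 0;
}
```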

So decide what size you want the heap to be, taking into account the needs of all the kernels in your program. Then set it once, at the beginning of the program, before any kernel launches, and do not call it again.
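
The set-once pattern could look roughly like the sketch below. Everything in it is illustrative: scratchKernel, heapBytesFor, the CUDA_CHECK macro, and the launch dimensions are assumptions, not part of the original program. The factor of two for headroom is also just a guess to cover allocator overhead, not a documented rule. The point is only that cudaDeviceSetLimit sits outside the loop and the kernel launch stays inside it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error-check macro: print the failing call and exit main.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel: each thread takes a scratch buffer from the device heap.
// Device malloc returns NULL when the heap is exhausted, so count failures.
__global__ void scratchKernel(int itemsPerThread, int* failures)
{
    int* td = static_cast<int*>(malloc(itemsPerThread * sizeof(int)));
    if (td == nullptr) {
        atomicAdd(failures, 1);
        return;
    }
    td[0] = threadIdx.x;
    free(td);
}

// Worst case: every thread of one launch holds its buffer at the same time.
// The factor 2 is assumed headroom for allocator overhead.
static size_t heapBytesFor(int blocks, int threads, int itemsPerThread)
{
    return 2 * static_cast<size_t>(blocks) * threads * itemsPerThread * sizeof(int);
}

int main()
{
    const int blocks = 64, threads = 128, itemsPerThread = 100, steps = 10;

    // Set the heap size once, before the first kernel launch.
    CUDA_CHECK(cudaDeviceSetLimit(cudaLimitMallocHeapSize,
                                  heapBytesFor(blocks, threads, itemsPerThread)));

    int* dFailures = nullptr;
    CUDA_CHECK(cudaMalloc(&dFailures, sizeof(int)));
    CUDA_CHECK(cudaMemset(dFailures, 0, sizeof(int)));

    // Only the launch is repeated; the limit is not touched again.
    for (int step = 0; step < steps; ++step) {
        scratchKernel<<<blocks, threads>>>(itemsPerThread, dFailures);
        CUDA_CHECK(cudaGetLastError());
    }
    CUDA_CHECK(cudaDeviceSynchronize());

    int hFailures = 0;
    CUDA_CHECK(cudaMemcpy(&hFailures, dFailures, sizeof(int), cudaMemcpyDeviceToHost));
    printf("failed device allocations: %d\n", hFailures);

    CUDA_CHECK(cudaFree(dFailures));
    return 0;
}
```

Checking the pointer returned by malloc inside the kernel is worth keeping in real code too, since an undersized heap shows up there as a NULL return rather than as a host-side error.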