cudaMalloc process KILLED on TX2, and cudaFree does not really release the memory

Hi everyone!

I ran into an issue while running a small test of “cudaMalloc” and “cudaFree” on a TX2.

Each time I allocate memory on the GPU, “cudaMalloc” returns “success”. When I then call “cudaFree”, it also returns “success”. But when I use “tegrastats” to watch the GPU usage, the usage does not go down. To be concrete: suppose I allocate 1GB with “cudaMalloc”, so the RAM usage is 1GB; after “cudaFree” succeeds, the RAM usage is still 1GB, which seems wrong to me.

Even worse, if I repeat these two steps (cudaMalloc and then cudaFree of 1GB) 7 times, the program is KILLED.

By the way, when I allocate and release memory on the CPU with “malloc” and “free”, everything works correctly.

So my question is: why doesn’t “cudaFree” really release the memory on the GPU?

Thanks!!!

Please post your code so we can assess if your use of cudaMalloc and cudaFree is correct.

Here is my code. It is just the simplest possible test.

    #define DATASIZE 1048576
    #define threads 1024   // assumed: the post defined "blocks" twice; one was presumably "threads"
    #define blocks 8

    int main()
    {
        cudaError_t err;
        err = cudaMalloc((void**) &gpudata, sizeof(float) * DATASIZE);
        err = cudaFree(gpudata);
        // Datanumber<<<blocks, threads>>>(gpudata);
    }

    __global__ void Datanumber(int *num)
    {
        int tid = threadIdx.x;
        int bid = blockIdx.x;
        // Grid-stride loop: each thread starts at its global index and steps by the total thread count.
        for (int i = bid * blockDim.x + tid; i < DATASIZE; i += blocks * threads)
        {
            num[i] = i;
        }
    }

I am sure that cudaFree succeeds, because if I launch a kernel after it, “gpudata” can no longer be used. But the GPU usage doesn’t go down, and the freed region is still unavailable. After repeating this several times, it runs out of memory.
(The platform I use is Nsight Eclipse 9.2, and all data was obtained in its debug mode. My GPU is an NVIDIA Jetson TX2.)
Thank you!

I assume you declared gpudata as int* gpudata; ? Otherwise the kernel call would complain about incompatible pointer types in the arguments.

Just a minor nitpick: you need sizeof(int) * DATASIZE for the allocation.

There could be platforms where sizeof(int) is 8 bytes, but for CUDA I am pretty sure sizeof(int) is 4, same as sizeof(float).

So the code should not cause undefined behavior when invoked repeatedly.

Do you get the process killed when repeatedly starting the binary from a console, or do you have to put a for() loop around the cudaMalloc/cudaFree() part to make it crash?

Christian
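To make the for() loop variant concrete, here is a minimal sketch that wraps cudaMalloc/cudaFree in a loop and prints what the CUDA runtime itself reports as free memory via cudaMemGetInfo, so the numbers can be compared with tegrastats. The helper names check and reportFree, the 1 GiB size, and the iteration count of 8 are my own assumptions for illustration, not taken from the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): abort with a message on any CUDA error.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Print free/total device memory as seen by the CUDA runtime.
static void reportFree(const char *label)
{
    size_t freeBytes = 0, totalBytes = 0;
    check(cudaMemGetInfo(&freeBytes, &totalBytes), "cudaMemGetInfo");
    printf("%-14s free %zu MiB of %zu MiB\n", label, freeBytes >> 20, totalBytes >> 20);
}

int main()
{
    const size_t bytes = (size_t)1 << 30;  // 1 GiB per allocation (assumed, matching the 1GB in the question)
    const int iterations = 8;              // assumed count; the question reports the kill around the 7th repeat

    for (int i = 0; i < iterations; ++i) {
        float *gpudata = NULL;
        printf("iteration %d\n", i);
        reportFree("before malloc:");
        check(cudaMalloc((void **)&gpudata, bytes), "cudaMalloc");
        reportFree("after malloc:");
        check(cudaFree(gpudata), "cudaFree");
        reportFree("after free:");
    }
    return 0;
}
```

If the "after free" figure returns to the "before malloc" figure on every iteration when run from a console, but not when stepping in the debugger, that would point at the debug session rather than at the allocation code.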

Thank you for your answer! Sorry for the trouble: the code above is not my actual source code. Please use the following code instead.
I must say, when I simply RUN the following code, there is NO error. The problem shows up only in DEBUG mode, when I use STEP OVER.

    #include <cuda_runtime.h>
    #include <helper_cuda.h>  // checkCudaErrors, from the CUDA samples

    #define DATASIZE (16384*8192)  // 2^27 floats = 512 MiB, so each allocation below is 1 GiB

    int main()
    {
    float *gpudata1, *gpudata2, *gpudata3, *gpudata4, *gpudata5, *gpudata6, *gpudata7, *gpudata8;
    checkCudaErrors(cudaMalloc((void**) &gpudata1, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata1));
    checkCudaErrors(cudaMalloc((void**) &gpudata2, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata2));
    checkCudaErrors(cudaMalloc((void**) &gpudata3, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata3));
    checkCudaErrors(cudaMalloc((void**) &gpudata4, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata4));
    checkCudaErrors(cudaMalloc((void**) &gpudata5, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata5));
    checkCudaErrors(cudaMalloc((void**) &gpudata6, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata6));
    checkCudaErrors(cudaMalloc((void**) &gpudata7, sizeof(float) * DATASIZE * 2)); //SIGKILL
    checkCudaErrors(cudaFree(gpudata7));
    checkCudaErrors(cudaMalloc((void**) &gpudata8, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata8));
    return 0;
    }

When I step to the line “cudaMalloc(gpudata7)”, my GPU usage has already reached 7197MB, and I can’t step over it, because the display turns off due to insufficient memory.

This is bizarre. You could file a bug at https://developer.nvidia.com/ to have NVIDIA look into it.

They will probably need to know the exact driver and CUDA toolkit versions you are using, plus the repro code above and a description of exactly how to reproduce it (i.e. which compiler arguments you used).
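One way to collect the version details for the report is to ask the runtime directly. This is only a sketch of what could be attached, assuming device 0 is the TX2's integrated GPU; it uses cudaDriverGetVersion, cudaRuntimeGetVersion and cudaGetDeviceProperties, and the printed layout is my own choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0, runtimeVersion = 0;

    // Both calls encode the version as 1000 * major + 10 * minor (e.g. 9020 for 9.2).
    if (cudaDriverGetVersion(&driverVersion) != cudaSuccess ||
        cudaRuntimeGetVersion(&runtimeVersion) != cudaSuccess) {
        fprintf(stderr, "could not query CUDA versions\n");
        return 1;
    }
    printf("CUDA driver version:  %d.%d\n", driverVersion / 1000, (driverVersion % 1000) / 10);
    printf("CUDA runtime version: %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    // Device 0 is assumed to be the GPU under test.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device: %s (compute capability %d.%d), %zu MiB global memory\n",
           prop.name, prop.major, prop.minor, prop.totalGlobalMem >> 20);
    return 0;
}
```

The compiler arguments still have to be copied by hand from the build log, since the runtime does not record them.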

Thanks! I will ask them for some advice.

The exact bug report URL is this: https://developer.nvidia.com/nvidia_bug/add

“File attachments are currently not supported on this form - please send as attachments via email to NVSDKIssues@nvidia.com referencing the Bug ID listed on the My Bugs section of My Account.”

What strikes me as odd is this statement:

  • For Jetson Platform issues post your question on the NVIDIA Developer Forums.

Does this mean Jetson support issues are generally not handled on developer.nvidia.com?

The FAQ for Jetson states this:

How can I get support for my Jetson Developer Kit or module?

See this link for available support: https://developer.nvidia.com/embedded/support

Christian

OK, I’ll try. Thanks!