cudaMalloc gets KILLED on TX2, and cudaFree does not really release the memory

Hi everyone!

I have an issue when I run a small CUDA test of “cudaMalloc” and “cudaFree” on the TX2.

Each time I allocate memory on the GPU, the call returns “success”. Then I “cudaFree” it, and that also returns “success”. But when I use “tegrastats” to check the memory usage, I find that it has not gone down, which suggests the region I just freed is not available again. To be more concrete: suppose I allocate 1GB with “cudaMalloc” and the RAM usage is 1GB; after “cudaFree” succeeds, the RAM usage is still 1GB, which seems unreasonable to me.

Even worse, if I repeat these two steps (cudaMalloc, then cudaFree, 1GB each time) 7 times, the program is KILLED.

By the way, when I allocate memory on the CPU with “malloc” and release it with “free”, everything works correctly.

So my question is: why does cudaFree not really release the memory on the GPU?

My code follows. Thanks!

#include <cuda_runtime.h>

#define DATASIZE 1048576
#define threads 1024
#define blocks 8

// Each thread writes its own index into the elements it visits.
__global__ void Datanumber(float *num)
{
    int tid = threadIdx.x;
    int bid = blockIdx.x;
    for (int i = bid * blockDim.x + tid; i < DATASIZE; i += blocks * threads)
    {
        num[i] = i;
    }
}

int main()
{
    float *gpudata = NULL;
    cudaError_t err;
    err = cudaMalloc((void**) &gpudata, sizeof(float) * DATASIZE);
    err = cudaFree(gpudata);
    // Datanumber<<<blocks, threads>>>(gpudata);
}

I am sure that cudaFree succeeded, because if I launch a kernel after it, “gpudata” can no longer be used. But the memory usage does not go down, and the freed region is still unavailable.
(I am using Nsight Eclipse 9.2, and all data was obtained in its debug mode. My GPU is an NVIDIA Jetson TX2.)

Hi,

Thanks for your report.
We will try to reproduce this issue internally and share an update later.

By the way, how do you repeat this experiment?
Do you repeat the function calls within the same application, or do you run the app several times?

Thanks.

I repeat it simply by writing “cudaMalloc()” and “cudaFree()” several times in one program, like this:

cudaMalloc(&x, size);
cudaFree(x);
cudaMalloc(&y, size);
cudaFree(y);

By the way, every time I restart the TX2, the memory usage is about 1GB. However, after I run a program to completion, the RAM usage is about 1.9GB, even though the program has exited. It never drops back to 1GB unless I restart the TX2 again.

Hi,

We tried your sample and did not hit the KILLED issue.
We suspect the confusion comes from our memory management mechanism.

When CUDA memory is freed, it is not returned to the system immediately.
Instead, it is held in order to speed up the next GPU allocation.

The KILLED issue you mentioned, however, is unexpected.
Could you share detailed steps to reproduce it?

Thanks.

Thank you for your answer. Here are my steps.

#include <cuda_runtime.h>
#include <helper_cuda.h>  // checkCudaErrors, from the CUDA samples

#define DATASIZE (16384*8192)

int main()
{
    // Each allocation is sizeof(float) * DATASIZE * 2 bytes.
    float *gpudata1, *gpudata2, *gpudata3, *gpudata4;
    float *gpudata5, *gpudata6, *gpudata7, *gpudata8;

    checkCudaErrors(cudaMalloc((void**) &gpudata1, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata1));
    checkCudaErrors(cudaMalloc((void**) &gpudata2, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata2));
    checkCudaErrors(cudaMalloc((void**) &gpudata3, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata3));
    checkCudaErrors(cudaMalloc((void**) &gpudata4, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata4));
    checkCudaErrors(cudaMalloc((void**) &gpudata5, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata5));
    checkCudaErrors(cudaMalloc((void**) &gpudata6, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata6));
    checkCudaErrors(cudaMalloc((void**) &gpudata7, sizeof(float) * DATASIZE * 2)); //SIGKILL
    checkCudaErrors(cudaFree(gpudata7));
    checkCudaErrors(cudaMalloc((void**) &gpudata8, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata8));
    return 0;
}

Try this program. I DEBUG it step by step in remote Nsight Eclipse and watch the RAM usage with “sudo ~/tegrastats”. After every “cudaFree” step, the RAM usage does not go down.
When I RUN it, there is no error message.
This is driving me crazy.

Thanks a lot !!!

I am sorry that I can’t attach a screenshot here.
The problem is that when I step to the line “cudaMalloc(gpudata7)”, my GPU memory usage has already reached 7197MB, and I can’t step over it, because the display turns off due to insufficient memory.

Hi,

We ran a similar experiment, and the allocations ran correctly on our side.

Here is our sample:

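#include <stdio.h>
#include <stdlib.h>   // exit
#include <iostream>
#include <cuda_runtime_api.h>
#include <cuda.h>

// Parenthesized so that x/ONE_MBYTE divides by the whole product;
// without the parentheses it expands to x/1024*1024 and prints raw bytes.
#define ONE_MBYTE (1024*1024)
#define N 8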
void printMemInfo()
{
    size_t free_byte ;
    size_t total_byte ;
    cudaError_t cuda_status = cudaMemGetInfo( &free_byte, &total_byte ) ;

    if ( cudaSuccess != cuda_status ){
        printf("Error: cudaMemGetInfo fails, %s\n", cudaGetErrorString(cuda_status));
        exit(1);
    }

    double free_db = (double)free_byte ;
    double total_db = (double)total_byte ;
    double used_db = total_db - free_db ;

    printf("GPU memory usage: used = %.2f MB, free = %.2f MB, total = %.2f MB\n", used_db/ONE_MBYTE, free_db/ONE_MBYTE, total_db/ONE_MBYTE);
}

int main()
{
    float* gpudata[N];

    cudaError_t err;
    for(int i=0; i<N; i++)
    {

        err = cudaMalloc((void**) &gpudata[i],1024*ONE_MBYTE);
        std::cout << "allocation = " << (err==cudaSuccess?"success":"failed") << std::endl;
        printMemInfo();

        err = cudaFree(gpudata[i]);
        std::cout << "release    = " << (err==cudaSuccess?"success":"failed") << std::endl;
        printMemInfo();

    }
    return 0;
}

nvcc topic_1037598.cpp -o test && ./test

Could you also run this experiment in your environment?
Thanks.

Thank you for replying.
I tested your code and it runs fine. But I am afraid I did not explain my problem clearly before.
With your code, when I RUN it, I get the following output, with no error at all.

  • /bin/sh -c "cd \"/home/nvidia/LHX/test01/Debug\";export LD_LIBRARY_PATH=\"/usr/local/cuda-9.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/home/nvidia/LHX/test01/Debug/test01\"";exit
    nvidia@tegra-ubuntu:~$ echo $PWD'>'
    /home/nvidia>
    nvidia@tegra-ubuntu:~$ /bin/sh -c "cd \"/home/nvidia/LHX/test01/Debug\";export LD_LIBRARY_PATH=\"/usr/local/cuda-9.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/home/nvidia/LHX/test01/Debug/test01\"";exit
    allocation = success
    GPU memory usage: used = 1682276352.00 MB, free = 6553526272.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1637117952.00 MB, free = 6598684672.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 2695680000.00 MB, free = 5540122624.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1639247872.00 MB, free = 6596554752.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 1684316160.00 MB, free = 6551486464.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1639276544.00 MB, free = 6596526080.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 2695680000.00 MB, free = 5540122624.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1639227392.00 MB, free = 6596575232.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 1684443136.00 MB, free = 6551359488.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1639276544.00 MB, free = 6596526080.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 2695688192.00 MB, free = 5540114432.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1639157760.00 MB, free = 6596644864.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 1684197376.00 MB, free = 6551605248.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1639059456.00 MB, free = 6596743168.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 2695561216.00 MB, free = 5540241408.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 1639157760.00 MB, free = 6596644864.00 MB, total = 8235802624.00 MB
    logout
  • But when I use the DEBUG button (the small beetle icon) to step through this program, the following happens.

  • Warning: Adjusting return value of linux_common_core_of_thread (pid=4339, tid=4356). core = 4 >= num_cores = 4!
    allocation = success
    GPU memory usage: used = 3162685440.00 MB, free = 5073117184.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 3162685440.00 MB, free = 5073117184.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 4238012416.00 MB, free = 3997790208.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 4238045184.00 MB, free = 3997757440.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 5314039808.00 MB, free = 2921762816.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 5314039808.00 MB, free = 2921762816.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 6389841920.00 MB, free = 1845960704.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 6389841920.00 MB, free = 1845960704.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 7465455616.00 MB, free = 770347008.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 7465582592.00 MB, free = 770220032.00 MB, total = 8235802624.00 MB
    allocation = success
    GPU memory usage: used = 8115265536.00 MB, free = 120537088.00 MB, total = 8235802624.00 MB
    release = success
    GPU memory usage: used = 8115392512.00 MB, free = 120410112.00 MB, total = 8235802624.00 MB
    logout
  • In DEBUG mode, my TX2 shuts down.

    I am not sure whether I have stated my problem clearly this time; I hope you can understand. To repeat, I use Nsight Eclipse to debug my program remotely.

    Hi,

    Sorry for the misunderstanding.

    Originally, we thought this might be an issue in our CUDA driver,
    so we used a sample to make sure we were on the same page.

    We tried this sample with cuda-gdb (Nsight) today and were able to reproduce the issue.
    The problem has already been reported to our internal core team.

    Sorry to keep you waiting. We will post further information once we get feedback.
    Thanks.

    Is there any further update to solve this problem?

    Hi,

    This issue is fixed in CUDA 10.1.
    Please watch our announcements for the upcoming release.

    Thanks.