cudaMalloc KILLED on tx2, and the memory can not be cudaFree real

337229260 · July 18, 2018, 1:43am

Hi everyone!

I ran into a problem with a small CUDA test of “cudaMalloc” and “cudaFree” on a TX2.

Each time I allocate memory on the GPU, the call returns “success”. Then I “cudaFree” it, and that also returns “success”. But when I use “tegrastats” to check the usage, I find that it does not go down, which means the area just freed is not available. To be more concrete: suppose I allocate 1GB with “cudaMalloc” and the RAM usage is 1GB; after “cudaFree” succeeds, the RAM usage is still 1GB, which seems unreasonable to me.

Even worse, if I repeat these two steps (cudaMalloc and then cudaFree of 1GB) 7 times, the program is KILLED.

By the way, when I allocate memory on the CPU with “malloc” and “free”, everything behaves correctly.

So my question is: why doesn’t cudaFree really release the memory on the GPU?

My code follows. Thanks!!!

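#include <cuda_runtime.h>

#define DATASIZE 1048576
#define threads 1024   // assumption: the post defined "blocks" twice; one was presumably "threads"
#define blocks 8

// Writes num[i] = i, striding over the array by the total thread count.
__global__ void Datanumber(int *num)
{
    int tid = threadIdx.x;
    int bid = blockIdx.x;
    for (int i = bid * blockDim.x + tid; i < DATASIZE; i += blocks * threads)
    {
        num[i] = i;
    }
}

int main()
{
    int *gpudata = 0;   // declaration was missing from the post
    cudaError_t err;
    err = cudaMalloc((void**) &gpudata, sizeof(int) * DATASIZE);
    err = cudaFree(gpudata);
    // Datanumber<<<blocks, threads>>>(gpudata);
    return err == cudaSuccess ? 0 : 1;
}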

I am sure that cudaFree succeeds, because if I launch a kernel after it, “gpudata” can no longer be used. But the memory usage doesn’t go down, and the freed area is still unavailable.
(I am using Nsight Eclipse 9.2, and all data was obtained in its debug mode. My GPU is an NVIDIA Jetson TX2.)

AastaLLL · July 18, 2018, 5:55am

Hi,

Thanks for your report.
We will try to reproduce this issue internally and update you later.

By the way, how do you repeat this experiment?
Do you repeat the function calls several times within the same application, or run the app several times?

Thanks.

337229260 · July 18, 2018, 6:23am

I repeat it simply by calling “cudaMalloc()” and “cudaFree()” several times in one program, like this:

cudaMalloc(x,size);
cudaFree(x);
cudaMalloc(y,size);
cudaFree(y);
…

By the way, every time I restart the TX2, the GPU usage is about 1GB. However, after I run a program to completion, the RAM usage is about 1.9GB, even though the program has exited. It never drops back to 1GB unless I restart the TX2 again.

AastaLLL · July 18, 2018, 6:57am

Hi,

We tried your sample and did not hit the KILLED issue.
I guess the confusion comes from our memory management mechanism.

When CUDA memory is freed, it is not returned to the system immediately.
Instead, it is held to speed up the next GPU allocation.

But the KILLED issue you mentioned is unexpected.
Could you share the steps to reproduce it in detail?

Thanks.

337229260 · July 18, 2018, 7:27am

Thank you for your answer. Here are my steps.

#include <cuda_runtime.h>
#include <helper_cuda.h>   // checkCudaErrors; assumed to come from the CUDA samples

#define DATASIZE (16384*8192)

int main()
{
    // Declarations were missing from the post.
    float *gpudata1, *gpudata2, *gpudata3, *gpudata4;
    float *gpudata5, *gpudata6, *gpudata7, *gpudata8;

    checkCudaErrors(cudaMalloc((void**) &gpudata1, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata1));
    checkCudaErrors(cudaMalloc((void**) &gpudata2, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata2));
    checkCudaErrors(cudaMalloc((void**) &gpudata3, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata3));
    checkCudaErrors(cudaMalloc((void**) &gpudata4, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata4));
    checkCudaErrors(cudaMalloc((void**) &gpudata5, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata5));
    checkCudaErrors(cudaMalloc((void**) &gpudata6, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata6));
    checkCudaErrors(cudaMalloc((void**) &gpudata7, sizeof(float) * DATASIZE * 2)); //SIGKILL
    checkCudaErrors(cudaFree(gpudata7));
    checkCudaErrors(cudaMalloc((void**) &gpudata8, sizeof(float) * DATASIZE * 2));
    checkCudaErrors(cudaFree(gpudata8));
    return 0;
}

Try this program. I DEBUG it step by step in remote Nsight Eclipse and watch the RAM usage with “sudo ~/tegrastats”. I found that after every “cudaFree” step, the RAM usage did not go down.
When I RUN it, there is NO error message.
This is driving me crazy.

Thanks a lot !!!

337229260 · July 18, 2018, 7:45am

I am sorry that I can’t attach a screenshot here.
The problem is that when I step to the line “cudaMalloc(gpudata7)”, my GPU usage has already reached 7197MB, and I can’t step over it, because the display turns off due to insufficient memory.

AastaLLL · July 23, 2018, 9:13am

Hi,

We ran a similar experiment and the allocations work correctly on our side.

Here is our sample:

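#include <stdio.h>
#include <stdlib.h>   // exit
#include <iostream>
#include <cuda_runtime_api.h>
#include <cuda.h>

// Parenthesized so that x/ONE_MBYTE divides by the whole product. The sample as
// first posted used 1024*1024 without parentheses, so x/ONE_MBYTE evaluated as
// (x/1024)*1024 and the "MB" figures in the outputs quoted later in this thread
// are really byte counts.
#define ONE_MBYTE (1024*1024)
#define N 8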
void printMemInfo()
{
    size_t free_byte ;
    size_t total_byte ;
    cudaError_t cuda_status = cudaMemGetInfo( &free_byte, &total_byte ) ;

    if ( cudaSuccess != cuda_status ){
        printf("Error: cudaMemGetInfo fails, %s\n", cudaGetErrorString(cuda_status));
        exit(1);
    }

    double free_db = (double)free_byte ;
    double total_db = (double)total_byte ;
    double used_db = total_db - free_db ;

    printf("GPU memory usage: used = %.2f MB, free = %.2f MB, total = %.2f MB\n", used_db/ONE_MBYTE, free_db/ONE_MBYTE, total_db/ONE_MBYTE);
}

int main()
{
    float* gpudata[N];

    cudaError_t err;
    for(int i=0; i<N; i++)
    {

        err = cudaMalloc((void**) &gpudata[i],1024*ONE_MBYTE);
        std::cout << "allocation = " << (err==cudaSuccess?"success":"failed") << std::endl;
        printMemInfo();

        err = cudaFree(gpudata[i]);
        std::cout << "release    = " << (err==cudaSuccess?"success":"failed") << std::endl;
        printMemInfo();

    }
    return 0;
}

nvcc topic_1037598.cpp -o test && ./test

Could you also run this experiment in your environment?
Thanks.

337229260 · July 23, 2018, 10:17am

I’m happy that you can reply to me.
I tested your code and it runs very well. But I am afraid I didn’t express my problem clearly before.
With your code, when I RUN it, I get the following output, with NO ERROR at all.

/bin/sh -c "cd \"/home/nvidia/LHX/test01/Debug\";export LD_LIBRARY_PATH=\"/usr/local/cuda-9.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/home/nvidia/LHX/test01/Debug/test01\"";exit
nvidia@tegra-ubuntu:~$ echo $PWD'>'
/home/nvidia>
nvidia@tegra-ubuntu:~$ /bin/sh -c "cd \"/home/nvidia/LHX/test01/Debug\";export LD_LIBRARY_PATH=\"/usr/local/cuda-9.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/home/nvidia/LHX/test01/Debug/test01\"";exit
allocation = success
GPU memory usage: used = 1682276352.00 MB, free = 6553526272.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1637117952.00 MB, free = 6598684672.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 2695680000.00 MB, free = 5540122624.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1639247872.00 MB, free = 6596554752.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 1684316160.00 MB, free = 6551486464.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1639276544.00 MB, free = 6596526080.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 2695680000.00 MB, free = 5540122624.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1639227392.00 MB, free = 6596575232.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 1684443136.00 MB, free = 6551359488.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1639276544.00 MB, free = 6596526080.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 2695688192.00 MB, free = 5540114432.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1639157760.00 MB, free = 6596644864.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 1684197376.00 MB, free = 6551605248.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1639059456.00 MB, free = 6596743168.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 2695561216.00 MB, free = 5540241408.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 1639157760.00 MB, free = 6596644864.00 MB, total = 8235802624.00 MB
logout

But, when I use the DEBUG button (a small beetle icon) to step through this program, the following situation appears.

Warning: Adjusting return value of linux_common_core_of_thread (pid=4339, tid=4356). core = 4 >= num_cores = 4!
allocation = success
GPU memory usage: used = 3162685440.00 MB, free = 5073117184.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 3162685440.00 MB, free = 5073117184.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 4238012416.00 MB, free = 3997790208.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 4238045184.00 MB, free = 3997757440.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 5314039808.00 MB, free = 2921762816.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 5314039808.00 MB, free = 2921762816.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 6389841920.00 MB, free = 1845960704.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 6389841920.00 MB, free = 1845960704.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 7465455616.00 MB, free = 770347008.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 7465582592.00 MB, free = 770220032.00 MB, total = 8235802624.00 MB
allocation = success
GPU memory usage: used = 8115265536.00 MB, free = 120537088.00 MB, total = 8235802624.00 MB
release = success
GPU memory usage: used = 8115392512.00 MB, free = 120410112.00 MB, total = 8235802624.00 MB
logout

In DEBUG mode, my TX2 shuts down.

I hope I have stated my problem clearly this time. To repeat: I use Nsight Eclipse to debug my program remotely.

AastaLLL · July 26, 2018, 8:21am

Hi,

Sorry for missing your point.

Originally, we thought it might be an issue in our CUDA driver,
and we used a sample to make sure we were on the same page.

We tried this sample with cuda-gdb (Nsight) today and were able to reproduce the issue.
The problem has already been reported to our internal core team.

Sorry to keep you waiting. We will share further information once we get feedback.
Thanks.

phone_25852282 · November 15, 2019, 1:46am

Is there any further update to solve this problem?

AastaLLL · December 2, 2019, 8:43am

Hi,

This issue is fixed in CUDA 10.1.
Please watch for our announcement of the upcoming release.

Thanks.
