Memory leak running CUDA C program

I have discovered a memory leak while running a CUDA C program, and I can reproduce the problem with a very simple test case.
Because the program will continuously receive different data, I want the CUDA process to keep running indefinitely.
In each loop iteration, I copy the data to the GPU and then run several kernels on it.

I compile the code using the following command:
nvcc -o test2 test.cu

While the program runs, the memory used by the process keeps increasing. Why does this happen?

The version of CUDA:
ba25b5034b9519a41a1f1b7713ce8cd

Hi,

We will give it a try and provide more info to you later.
Thanks.

Hi,

Could you attach the test.cu source so we can give it a try?
Thanks.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

__global__ void Multip(float *a, float *b, float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    c[idx] = a[idx] * b[idx];
}

__global__ void Log(float *c) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    c[idx] = log10f(c[idx] * 5);
}

int main() {
    int n = 512000;
    int i = 0;
    float *a, *b;
    float *d_a, *d_b, *d_c;

    cudaMalloc((void **)&d_a, n * sizeof(float));
    cudaMalloc((void **)&d_b, n * sizeof(float));
    cudaMalloc((void **)&d_c, n * sizeof(float));

    a = (float *)malloc(n * sizeof(float));
    b = (float *)malloc(n * sizeof(float));

    for (i = 0; i < n; i++) {
        a[i] = i;
        b[i] = i + 1;
    }

    while (1) {
        cudaMemcpy(d_a, a, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, b, n * sizeof(float), cudaMemcpyHostToDevice);
        Multip<<<1000, 512>>>(d_a, d_b, d_c);
        Log<<<1000, 512>>>(d_c);
        usleep(200000);
    }

    /* Never reached while the loop above runs forever. */
    free(a);
    free(b);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    return 0;
}
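One thing worth noting while debugging a suspected leak: the loop above never checks any CUDA return codes, so a failed copy or kernel launch would go unnoticed. Below is a sketch of the same loop body with error checking and a cudaDeviceSynchronize() added; the runtime API calls (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are standard CUDA, but the CHECK macro is our own illustrative helper, not part of any library.

```
// CHECK is an illustrative helper macro, not part of the CUDA runtime.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err = (call);                                    \
        if (err != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",             \
                    __FILE__, __LINE__, cudaGetErrorString(err));    \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Inside the while (1) loop:
    CHECK(cudaMemcpy(d_a, a, n * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, b, n * sizeof(float), cudaMemcpyHostToDevice));
    Multip<<<1000, 512>>>(d_a, d_b, d_c);
    CHECK(cudaGetLastError());          // catches launch-configuration failures
    Log<<<1000, 512>>>(d_c);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());     // surfaces asynchronous kernel errors
    usleep(200000);
```

Synchronizing once per iteration also keeps the host from queueing launches faster than the GPU consumes them, which can otherwise look like slowly growing host memory.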

Hi,

We are not able to reproduce this issue in our environment.
Could you double-check on your side as well?

Thanks.

Hi,

We tested this again and found that memory usage does increase a little at the beginning.
After that, it remains stable for hours, so this appears to be expected behavior rather than a bug.
Could you confirm whether you see similar behavior on your side?

Start:

$ ps aux | grep test2
nvidia      6233  0.8  0.3 8037028 26876 pts/0   Sl+  02:53   0:00 ./test2
nvidia      6276  0.0  0.0   9016  1840 pts/1    S+   02:53   0:00 grep --color=auto test2
$ sudo tegrastats 
12-10-2024 02:54:07 RAM 1510/7620MB (lfb 4x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@1510,0%@1510,0%@1510,0%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@43.5C soc2@42.843C soc0@40.562C gpu@43.5C tj@43.5C soc1@42.062C VDD_IN 4654mW/4654mW VDD_CPU_GPU_CV 963mW/963mW VDD_SOC 1484mW/1484mW
12-10-2024 02:54:08 RAM 1510/7620MB (lfb 4x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@1510,0%@1510,0%@1510,0%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@43.343C soc2@42.843C soc0@40.656C gpu@43.625C tj@43.625C soc1@42.25C VDD_IN 4654mW/4654mW VDD_CPU_GPU_CV 963mW/963mW VDD_SOC 1484mW/1484mW

After 1 hour:

$ ps aux | grep test2
nvidia      6233  0.3  0.4 8037028 31232 pts/0   Sl+  02:53   0:13 ./test2
nvidia      6404  0.0  0.0   9016  1852 pts/1    S+   03:55   0:00 grep --color=auto test2
$ sudo tegrastats 
12-10-2024 03:55:16 RAM 1520/7620MB (lfb 4x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@1510,0%@1510,0%@1510,0%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@44.843C soc2@44.312C soc0@42.156C gpu@44.906C tj@44.906C soc1@43.687C VDD_IN 4654mW/4654mW VDD_CPU_GPU_CV 963mW/963mW VDD_SOC 1484mW/1484mW
12-10-2024 03:55:17 RAM 1520/7620MB (lfb 4x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@1510,0%@1510,0%@1510,0%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@45C soc2@44.375C soc0@42.218C gpu@45.031C tj@45C soc1@43.468C VDD_IN 4654mW/4654mW VDD_CPU_GPU_CV 963mW/963mW VDD_SOC 1484mW/1484mW

After 2 hours:

$ ps aux | grep test2
nvidia      6233  0.3  0.4 8037028 31232 pts/0   Sl+  02:53   0:25 ./test2
nvidia      6477  0.0  0.0   9016  1852 pts/1    S+   04:56   0:00 grep --color=auto test2
$ sudo tegrastats  
12-10-2024 04:56:24 RAM 1523/7620MB (lfb 5x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@1510,0%@1510,0%@1510,0%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@45.062C soc2@44.281C soc0@41.906C gpu@44.906C tj@45.062C soc1@43.468C VDD_IN 4654mW/4654mW VDD_CPU_GPU_CV 963mW/963mW VDD_SOC 1484mW/1484mW
12-10-2024 04:56:25 RAM 1523/7620MB (lfb 5x4MB) SWAP 0/3810MB (cached 0MB) CPU [3%@1510,0%@1510,0%@1510,0%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@44.812C soc2@44.187C soc0@42.062C gpu@44.812C tj@44.812C soc1@43.718C VDD_IN 4694mW/4674mW VDD_CPU_GPU_CV 963mW/963mW VDD_SOC 1484mW/1484mW

After 3 hours

$ ps aux | grep test2
nvidia      6233  0.3  0.4 8037028 31232 pts/0   Sl+  02:53   0:39 ./test2
nvidia      6567  0.0  0.0   9016  1844 pts/1    S+   06:09   0:00 grep --color=auto test2
$ sudo tegrastats 
12-10-2024 06:09:59 RAM 1521/7620MB (lfb 4x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@1510,0%@1510,0%@1510,1%@1510,0%@1510,0%@1510] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[624] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@44.968C soc2@44.312C soc0@42.156C gpu@45.281C tj@45.281C soc1@43.406C VDD_IN 4654mW/4654mW VDD_CPU_GPU_CV 963mW/963mW VDD_SOC 1484mW/1484mW
12-10-2024 06:10:00 RAM 1521/7620MB (lfb 4x4MB) SWAP 0/3810MB (cached 0MB) CPU [0%@1510,0%@1510,0%@1510,0%@1510,0%@1510,0%@1510]

Thanks.

Yes, it’s the same here. The RSS of the test program increased after startup.
After about half an hour, the RSS stopped increasing and remained unchanged.
I guess there is a maximum value, and that maximum grows as the program becomes more complex.
I’m wondering if there’s another reliable way to monitor memory changes, so I can check whether my original program has other memory problems.

Hi,

We also monitored system memory with tegrastats and observed no leak.
The initial increase can be attributed to internally allocated data structures, such as staging buffers.

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.