Huge memory leak

Hi,

I have a strange memory leak on Linux (4.4.0-31-generic) with driver 352.93 and a Tesla K20m.
A hello-world-style program allocates about 70 MB of memory in the OS and doesn't free it after the program exits.

Here is the code:
#include <cuda_runtime.h>

int main(int argc, char* argv[])
{
    //cudaSetDevice(0);
    int *XiXj_d;

    // Allocate and immediately free a single int on the device.
    cudaMalloc(&XiXj_d, 1 * sizeof(int));
    cudaFree(XiXj_d);
}

Run:
user@cuda2:~/bug$ free -m
             total       used       free     shared    buffers     cached
Mem:         15924        380      15544          1         29        118
-/+ buffers/cache:         232      15692
Swap:            0          0          0

user@cuda2:~/bug$ ./bug

user@cuda2:~/bug$ free -m
             total       used       free     shared    buffers     cached
Mem:         15924        459      15464          1         29        125
-/+ buffers/cache:         305      15619
Swap:            0          0          0

It seems that the driver or the Linux kernel doesn't free the memory. Any idea what is going on here?

Based on your driver version (352.93), I imagine you are using CUDA 7.5.

CUDA 7.5 is not compatible with kernel 4.4.

The official support matrix for CUDA 7.5 is listed here:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

I would recommend that you switch to an officially supported setup.

CUDA 8 RC1 advertises support for kernel version 4.4 on Ubuntu 16.04.

Hi,
I am experiencing the same problem when I use Caffe.
I have a Tesla K80, Ubuntu 14.04.

txbob,

Unfortunately switching to the official kernel (3.13.0-92-generic #139-Ubuntu) for Ubuntu 14.04 doesn’t help. The problem still exists.

Does the memory decrease by 70MB each time you run the program? Or does this only happen once?

I wasn't able to observe it with CUDA 7.5 on Ubuntu 14.04:

$ cat t1.cu
//#include <cuda.h>

int main(int argc, char* argv[])
{
//cudaSetDevice(0);
int *XiXj_d;

cudaMalloc(&XiXj_d, 1 * sizeof(int));
cudaFree(XiXj_d);
}
$ nvcc t1.cu -o t1
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
$ free -m
             total       used       free     shared    buffers     cached
Mem:         24105       2156      21949          6         95       1567
-/+ buffers/cache:        493      23612
Swap:        24571          0      24571
bob@03c212a19ace:~/misc$ ./t1
bob@03c212a19ace:~/misc$ free -m
             total       used       free     shared    buffers     cached
Mem:         24105       2154      21951          6         95       1567
-/+ buffers/cache:        491      23614
Swap:        24571          0      24571
$ uname -a
Linux 03c212a19ace 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$

I just compiled and ran the test program, and I'm seeing the same thing with CUDA 6.5, NVIDIA driver 346.35, and Ubuntu 14.04 (3.13.0-92-generic). My system seems to lose ~20 MB every time I run the program:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
$ nvcc t1.cu -o t1
$ free -m
             total       used       free     shared    buffers     cached
Mem:         15039       9796       5243          0        183        169
-/+ buffers/cache:       9442       5597
Swap:            0          0          0
$ ./t1
$ free -m
             total       used       free     shared    buffers     cached
Mem:         15039       9815       5224          0        183        169
-/+ buffers/cache:       9461       5578
Swap:            0          0          0

I wonder if this is the cause of the memory leak I’ve been dealing with:

http://serverfault.com/questions/791838/baffling-memory-leak-what-is-using-10gb-of-memory-on-this-system?

We have other (older) machines that seem to be running fine, but maybe something changed recently on Ubuntu?

I'm not a CUDA developer, but FWIW, I just added "cudaDeviceReset();" to the end of the test program and recompiled so I could test with cuda-memcheck, and that seems to have made the leak go away:

$ cuda-memcheck --tool memcheck --leak-check full ./t1
========= CUDA-MEMCHECK
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 0 errors
$ free -m
             total       used       free     shared    buffers     cached
Mem:         15039       9933       5106          0        184        169
-/+ buffers/cache:       9579       5460
Swap:            0          0          0
$ ./t1
$ free -m
             total       used       free     shared    buffers     cached
Mem:         15039       9933       5106          0        184        169
-/+ buffers/cache:       9579       5460
Swap:            0          0          0
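
In case it's useful, here's roughly what my modified test program looks like (a minimal sketch; the only change from t1.cu above is the cudaDeviceReset() call at the end):

#include <cuda_runtime.h>

int main(int argc, char* argv[])
{
    int *XiXj_d;

    cudaMalloc(&XiXj_d, 1 * sizeof(int));
    cudaFree(XiXj_d);

    // Tear down the CUDA context before the process exits; this is
    // what appears to prevent the host memory from being lost.
    cudaDeviceReset();
}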

If you kill the process without letting it finish, the error will still be there, right?

I just came across this: http://askubuntu.com/questions/731677/out-of-memory-issue

After upgrading to version 367.35 of the NVIDIA driver, I can’t reproduce the problem with the test program anymore. Now I just have to wait and see if this fixes the memory leak on my production servers…

Hi @mconigliaro, what GPU do you have?

This is a g2.2xlarge instance on EC2.

# nvidia-smi -q

==============NVSMI LOG==============

Timestamp                           : Tue Jul 26 20:21:38 2016
Driver Version                      : 367.35

Attached GPUs                       : 1
GPU 0000:00:03.0
    Product Name                    : GRID K520
    Product Brand                   : Grid
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0321314043755
    GPU UUID                        : GPU-4f723e2d-a35f-51cc-bda4-5c1192b8c968
    Minor Number                    : 0
    VBIOS Version                   : 80.04.D4.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x3
    GPU Part Number                 : 900-12055-0020-000
    Inforom Version
        Image Version               : 2055.0052.00.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : Pass-Through
    PCI
        Bus                         : 0x00
        Device                      : 0x03
        Domain                      : 0x0000
        Device Id                   : 0x118A10DE
        Bus Id                      : 0000:00:03.0
        Sub System Id               : 0x101410DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Sync Boost                  : Not Active
        Unknown                     : Not Active
    FB Memory Usage
        Total                       : 4036 MiB
        Used                        : 0 MiB
        Free                        : 4036 MiB
    BAR1 Memory Usage
        Total                       : 128 MiB
        Used                        : 2 MiB
        Free                        : 126 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 52 C
        GPU Shutdown Temp           : 97 C
        GPU Slowdown Temp           : 92 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 39.44 W
        Power Limit                 : 125.00 W
        Default Power Limit         : 125.00 W
        Enforced Power Limit        : 125.00 W
        Min Power Limit             : 85.00 W
        Max Power Limit             : 130.00 W
    Clocks
        Graphics                    : 797 MHz
        SM                          : 797 MHz
        Memory                      : 2500 MHz
        Video                       : 810 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 797 MHz
        SM                          : 797 MHz
        Memory                      : 2500 MHz
        Video                       : 810 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

Thanks. Is it the same as on your servers?

Yeah, I was testing on an instance that looks exactly like my production servers.

Yes, each time. This is a serious issue because it is very easy to hang a server from an unprivileged user account. The Linux OOM killer doesn't help, and the server has to be rebooted.

I wasn't able to reproduce it with CUDA 7.5, Ubuntu 14.04, and driver 352.93, which seems to match your setup, so there must be some other missing piece to the puzzle. Anyway, others here have reported different behavior with different drivers, so you might try some newer drivers besides 352.93.

Beyond that, you can always file a bug at developer.nvidia.com.

It looks like the same situation. I also checked slabtop, but found no answer there. I believe the bug is somewhere in the Linux kernel or the NVIDIA driver.

Surprisingly, cudaDeviceReset() resolves this issue, but it seems to be a workaround. Thank you, mconigliaro, for the solution!

I have the problem on a server without root access. I usually run a process with nohup, but sometimes I just need to kill the process.
If I make a call to cudaDeviceReset() (in a new process) after killing the original process, will the lost memory be recovered?
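
To make the question concrete, the new process I have in mind would be something like this (just a sketch; whether it actually reclaims the memory lost by a killed process is exactly what I'm asking):

#include <cuda_runtime.h>

int main()
{
    // Establish a context on device 0, then reset it. The open question
    // is whether this also releases the resources left behind by the
    // process that was killed earlier.
    cudaSetDevice(0);
    cudaDeviceReset();
    return 0;
}

Thanks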