I have a GeForce RTX 2070 Super on a Linux Certified laptop running
CentOS Linux release 8.2.2004 (Core)
Kernel 4.18.0-193.14.2.el8_2.x86_64
NVIDIA Driver Version: 450.57
The kernel leaks memory, maybe 1 GB a day. After a few days,
I have to reboot. I'm suspicious the nvidia device driver
may be involved although I am not doing any heavy graphics.
As far as I can tell, my graphics/display is working fine.
I've been looking for release notes or bug lists that might
discuss this, but haven't found anything. Maybe Google is
letting me down. Could this be a known problem, with a
fix in the works? Or maybe I've unintentionally ended up
test-piloting a new hardware/software combination.
Memory reported as 'used' by top keeps ratcheting upwards.
Ditto the 'used' column reported by free.
Likewise, MemAvailable reported by /proc/meminfo keeps ratcheting downwards.
Stopping all applications, or logging out, of course reduces 'used' and
increases MemAvailable, but never quite back to where it was the day before.
I've been wondering about this for a couple of months and think I've
eliminated tmpfs, slab memory, shared memory, and the memory reported for
individual user processes. They all look reasonable.
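In case it helps, the checks I used looked roughly like this (a sketch; all
standard tools on CentOS 8):

```shell
# Slab and shared memory, straight from the kernel (values in kB):
grep -E '^(Slab|SReclaimable|SUnreclaim|Shmem):' /proc/meminfo

# tmpfs usage (covers /dev/shm, /run, and friends):
df -h -t tmpfs

# Total resident memory across all processes (kB), to compare against "used".
# Note RSS double-counts shared pages, so this overestimates somewhat.
ps -eo rss= | awk '{sum += $1} END {print "total RSS:", sum, "kB"}'
```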
The "Mem:" line from free (values in MiB), reported every hour, even at night
when the laptop is basically idle, looks like

        total   used  free  shared  buff/cache  available
Mem:    31828  22778   619     338        8431       8252
Mem:    31828  22947   443     338        8437       8082
Mem:    31828  23049   343     338        8435       7981
Mem:    31828  23000   408     338        8420       8030
Mem:    31828  23103   377     338        8348       7927
Mem:    31828  23300   308     338        8220       7730
Mem:    31828  23147   500     338        8180       7883
Mem:    31828  23285   400     338        8143       7745
Mem:    31828  23489   275     338        8063       7541
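For anyone who wants to script the hourly sampling, roughly the same figures
can be read straight out of /proc/meminfo:

```shell
# One sample: total / free / available in MiB, computed from /proc/meminfo.
# Run hourly from cron (e.g. "0 * * * *") to reproduce the log above.
printf '%s  ' "$(date '+%F %T')"
awk '/^MemTotal:/     {t = $2}
     /^MemFree:/      {f = $2}
     /^MemAvailable:/ {a = $2}
     END {printf "total %d  free %d  available %d  (MiB)\n",
          t/1024, f/1024, a/1024}' /proc/meminfo
```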
Do you know of anybody who might have tried CentOS 8, kernel
4.18.0-193.14.2.el8_2.x86_64, and nvidia driver 450.57? Would you
expect a combination like this to work? I'm worried I've accidentally
gotten out on the bleeding edge. Is anybody else seeing similar
behavior?
If there are specific statistics that might shed light I can try to
collect them.
RHEL and clones like CentOS/Alibaba Linux/Scientific Linux plus nvidia is a very common setup in science and on compute clouds, so there should be no problems to expect from that combination.
Please first check whether a kernel update is available by running a system update (dnf update on CentOS 8).
To start on kernel memory analysis, look into turning on kmem tracing and using the kmemleak facility. Those should give you a hint about where to look.
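For reference, kmemleak is only available when the kernel was built with
CONFIG_DEBUG_KMEMLEAK=y; if it is, the usage is roughly this (a sketch; needs
root and debugfs mounted):

```shell
# Check whether the running kernel has kmemleak compiled in:
grep CONFIG_DEBUG_KMEMLEAK "/boot/config-$(uname -r)"

# If it reports =y, trigger a scan and read back any suspected leaks:
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
```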
Follow-up for anybody who finds this thread: I was unsuccessful in turning
on kmemleak (I think it has to be compiled into the kernel), and I ran into
problems updating the kernel as well.
With the input that I had a standard configuration and there were no
known large nvidia memory leaks, we looked further afield. We
installed acpid.x86_64 to clean up an nvidia warning, noticed a
hyperactive kworker/kacpid process, and started looking for an acpi
memory-leak angle. That led us to try

echo "disable" > /sys/firmware/acpi/interrupts/gpe6F

as a workaround for a possible acpi memory leak, and lo and behold,
the system has continued to run fine and the leak seems to have stopped.
Likely no nvidia angle to this at all! Will continue to monitor.
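For anyone hunting the same thing on their own hardware: the runaway GPE shows
up as a rapidly climbing counter under /sys/firmware/acpi/interrupts, and the
echo above does not survive a reboot. A sketch of both (gpe6F is specific to
this laptop; yours will likely differ):

```shell
# List the GPE counters sorted by count; run this twice a few seconds apart
# and the one climbing by thousands stands out.
grep . /sys/firmware/acpi/interrupts/gpe[0-9A-F]* 2>/dev/null \
    | sort -t: -k2 -rn | head

# One way to persist the disable across reboots is an @reboot cron entry
# (crontab -e as root); a systemd oneshot unit would work equally well:
#   @reboot echo disable > /sys/firmware/acpi/interrupts/gpe6F
```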