Running our CUDA application gradually fills memory with unreclaimable
kmalloc-32 slabs (SUnreclaim).
The value reported by
grep SUnreclaim /proc/meminfo
increased from 243696 kB to 1668540 kB within 24 hours, which is roughly 60 MB/h. The same software ran without problems for years, and the exact same binaries for months.
One node now holds 90 GB in these kmalloc-32 slabs (uptime: 64 days).
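
For anyone who wants to check the numbers, here is a minimal sketch of the rate calculation. The function names are made up for illustration, and it assumes the "kB" in /proc/meminfo means KiB, that "90 GB" means GiB, and that growth on that node started at boot.

```python
import re


def read_meminfo_kb(text, field):
    """Return one field (in kB) from /proc/meminfo-style text."""
    # Anchor on the full field name so "SUnreclaim" never matches "SReclaimable".
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


def sample_sunreclaim_kb(path="/proc/meminfo"):
    """Read the current SUnreclaim value (in kB) from a live Linux system."""
    with open(path) as f:
        return read_meminfo_kb(f.read(), "SUnreclaim")


def growth_mib_per_hour(start_kb, end_kb, hours):
    """Average growth between two samples, in MiB per hour."""
    return (end_kb - start_kb) / 1024 / hours


# The two 24-hour samples quoted above:
rate_24h = growth_mib_per_hour(243696, 1668540, 24)

# 90 GiB accumulated over 64 days of uptime (assumption: leak began at boot):
rate_64d = growth_mib_per_hour(0, 90 * 1024 * 1024, 64 * 24)
```

Both figures come out close to each other, which suggests the leak rate has been fairly steady. Calling sample_sunreclaim_kb() from a cron job or a loop would log the trend on a live node.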
We have 9 heterogeneous nodes; 4 of them show this problem.
All of the affected systems run NVIDIA driver 450.66.
All systems install their drivers from RPM Fusion.
3 affected systems run CentOS 7 x86_64 on dual-socket Intel CPUs.
1 affected system runs CentOS 8 x86_64 on an AMD Ryzen Threadripper 3960X.
Any ideas?