Possible memory leak in the 510.60 Linux driver

Hello,

Some time ago (a couple of weeks maybe?) I noticed that all my programs using graphics libraries (Vulkan, SFML, Allegro) suddenly started leaking memory on exit (as reported by ASan). I was hoping the problem would go away after a system upgrade, but so far it hasn't.

Apparently, creating and immediately destroying a Vulkan instance is enough to reproduce the leak. Below is a minimal example in C (compiled with clang vkleak.c -o vkleak -Wall -lvulkan -fsanitize=address):

#include <vulkan/vulkan.h>
#include <stdio.h>

int main(void)
{
	VkInstance inst;
	VkApplicationInfo app_info = {
		.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
		.pNext = NULL,
		.pApplicationName = "leak_test",
		.applicationVersion = VK_MAKE_VERSION(0, 0, 1),
		.pEngineName = "leak_test",
		.engineVersion = VK_MAKE_VERSION(0, 0, 1),
		.apiVersion = VK_API_VERSION_1_0,
	};
	
	VkInstanceCreateInfo create_info = {0};
	create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
	create_info.pNext = NULL;
	create_info.pApplicationInfo = &app_info;
	create_info.enabledExtensionCount = 0;
	create_info.ppEnabledExtensionNames = NULL;
	create_info.enabledLayerCount = 0;
	create_info.ppEnabledLayerNames = NULL;

	printf("creating Vulkan instance...\n");
	if (vkCreateInstance(&create_info, NULL, &inst) != VK_SUCCESS) {
		printf("failed to create Vulkan instance\n");
		return 1; /* inst was never created, so don't destroy it */
	}

	vkDestroyInstance(inst, NULL);
	printf("destroyed...\n");
	
	return 0;
}

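The sanitizer reports 262524 bytes leaked in 1188 allocations from <unknown module>, most likely because the libraries that made those allocations have already been unloaded by the time the leak check runs at exit. Preloading the following code as a shared library turns dlclose() into a no-op, which keeps every library mapped and lets the leaks be traced further (clang dlclose_hack.c -o libdlclose_hack.so -Wall -shared -g):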

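/* Replacement for dlclose(): report success but never unload anything,
 * so every dlopen()ed library stays mapped until the process exits. */
int dlclose(void *ptr)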
{
	return 0;
}
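A slightly more talkative version of the stub can show which libraries it is keeping mapped. This is a hypothetical variant (the file name dlclose_log.c and the log format are made up for illustration): it uses the glibc extension dlinfo() with RTLD_DI_LINKMAP to look up the library name behind each handle, and, like the original stub, skips the actual unload.

```c
/* dlclose_log.c: hypothetical variant of dlclose_hack.c that logs
 * which library each dlclose() call would have unloaded. */
#define _GNU_SOURCE /* for dlinfo() and RTLD_DI_LINKMAP (glibc extensions) */
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

int dlclose(void *handle)
{
	struct link_map *map = NULL;

	/* Ask the dynamic linker which loaded object this handle refers to. */
	if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != NULL)
		fprintf(stderr, "[dlclose_log] not unloading %s\n",
		        map->l_name[0] ? map->l_name : "(main program)");
	else
		fprintf(stderr, "[dlclose_log] not unloading handle %p\n", handle);

	/* Report success without unloading, so the library stays mapped and
	 * the leak checker can still attribute its allocations at exit. */
	return 0;
}
```

It would be built and preloaded the same way as the original stub, e.g. clang dlclose_log.c -o libdlclose_log.so -Wall -shared -fPIC -g followed by LD_PRELOAD="./libdlclose_log.so" ./vkleak (on glibc older than 2.34, dlinfo() lives in libdl, so -ldl would also be needed).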

Leak report with libdlclose_hack.so preloaded:

$ LD_PRELOAD="./libdlclose_hack.so" ./vkleak
creating Vulkan instance...
destroyed...

=================================================================
==14459==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 6304 byte(s) in 4 object(s) allocated from:
    #0 0x5581abd6cfa9 in __interceptor_calloc (/home/j/Desktop/vkleak/vkleak+0xcafa9)
    #1 0x7f195d66626f  (/usr/lib/libnvidia-glcore.so.510.60.02+0xe6626f)

Direct leak of 1024 byte(s) in 1 object(s) allocated from:
    #0 0x5581abd6d199 in __interceptor_realloc (/home/j/Desktop/vkleak/vkleak+0xcb199)
    #1 0x7f195d6654aa  (/usr/lib/libnvidia-glcore.so.510.60.02+0xe654aa)

Indirect leak of 143712 byte(s) in 449 object(s) allocated from:
    #0 0x5581abd6cfa9 in __interceptor_calloc (/home/j/Desktop/vkleak/vkleak+0xcafa9)
    #1 0x7f195d66626f  (/usr/lib/libnvidia-glcore.so.510.60.02+0xe6626f)

Indirect leak of 4921 byte(s) in 372 object(s) allocated from:
    #0 0x5581abd6cde9 in __interceptor_malloc (/home/j/Desktop/vkleak/vkleak+0xcade9)
    #1 0x7f195d665ddc  (/usr/lib/libnvidia-glcore.so.510.60.02+0xe65ddc)

Indirect leak of 752 byte(s) in 6 object(s) allocated from:
    #0 0x5581abd6d199 in __interceptor_realloc (/home/j/Desktop/vkleak/vkleak+0xcb199)
    #1 0x7f195d6654aa  (/usr/lib/libnvidia-glcore.so.510.60.02+0xe654aa)

SUMMARY: AddressSanitizer: 156713 byte(s) leaked in 832 allocation(s).

The leaks now trace back to libnvidia-glcore.so. The total has changed, though (156713 bytes in 832 allocations instead of 262524 bytes in 1188), and I'm not quite sure what to make of that. Valgrind reports leaks as well:

$ LD_PRELOAD="./libdlclose_hack.so" valgrind --leak-check=full ./vkleak 
==15160== Memcheck, a memory error detector
==15160== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==15160== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==15160== Command: ./vkleak
==15160== 
creating Vulkan instance...
destroyed...
==15160== 
==15160== HEAP SUMMARY:
==15160==     in use at exit: 556,868 bytes in 3,217 blocks
==15160==   total heap usage: 14,712 allocs, 11,495 frees, 710,655,137 bytes allocated
==15160== 
==15160== 0 bytes in 4 blocks are definitely lost in loss record 1 of 2,618
==15160==    at 0x4845899: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15160==    by 0x4005492: _dl_find_object_update (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x400D8F7: dl_open_worker_begin (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4A82E17: _dl_catch_exception (in /usr/lib/libc.so.6)
==15160==    by 0x400CD7A: dl_open_worker (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4A82E17: _dl_catch_exception (in /usr/lib/libc.so.6)
==15160==    by 0x400D15C: _dl_open (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x49B174B: dlopen_doit (in /usr/lib/libc.so.6)
==15160==    by 0x4A82E17: _dl_catch_exception (in /usr/lib/libc.so.6)
==15160==    by 0x4A82EE2: _dl_catch_error (in /usr/lib/libc.so.6)
==15160==    by 0x49B124D: _dlerror_run (in /usr/lib/libc.so.6)
==15160==    by 0x49B17D7: dlopen@@GLIBC_2.34 (in /usr/lib/libc.so.6)
==15160== 
==15160== 48 (24 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 2,036 of 2,618
==15160==    at 0x484AA83: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15160==    by 0x10C6626F: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C5A030: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C6B9E8: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0xF24BD68: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF2B20A5: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF24B2E2: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0x4E9768F: ???
==15160==    by 0x4005E98: call_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4005FCB: _dl_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4A82E74: _dl_catch_exception (in /usr/lib/libc.so.6)
==15160==    by 0x400CDDE: dl_open_worker (in /usr/lib/ld-linux-x86-64.so.2)
==15160== 
==15160== 128 bytes in 1 blocks are definitely lost in loss record 2,485 of 2,618
==15160==    at 0x484AA83: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15160==    by 0x10C6626F: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C5B769: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C5A1EE: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C6A5A8: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0xF24BD68: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF2B20A5: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF24B2E2: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0x4E9768F: ???
==15160==    by 0x4005E98: call_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4005FCB: _dl_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4A82E74: _dl_catch_exception (in /usr/lib/libc.so.6)
==15160== 
==15160== 571 (128 direct, 443 indirect) bytes in 1 blocks are definitely lost in loss record 2,535 of 2,618
==15160==    at 0x484AA83: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15160==    by 0x10C6626F: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C5B769: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C58718: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C59D9C: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C6A572: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0xF24BD68: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF2B20A5: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF24B2E2: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0x4E9768F: ???
==15160==    by 0x4005E98: call_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4005FCB: _dl_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160== 
==15160== 28,133 (6,024 direct, 22,109 indirect) bytes in 1 blocks are definitely lost in loss record 2,612 of 2,618
==15160==    at 0x484AA83: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15160==    by 0x10C6626F: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C5D1A5: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C58700: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C59D9C: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C6A572: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0xF24BD68: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF2B20A5: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF24B2E2: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0x4E9768F: ???
==15160==    by 0x4005E98: call_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4005FCB: _dl_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160== 
==15160== 127,842 (1,024 direct, 126,818 indirect) bytes in 1 blocks are definitely lost in loss record 2,618 of 2,618
==15160==    at 0x484ACD3: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15160==    by 0x10C654AA: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C5B800: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C58E6E: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0x10C6B80F: ??? (in /usr/lib/libnvidia-glcore.so.510.60.02)
==15160==    by 0xF24BD68: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF2B20A5: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0xF24B2E2: ??? (in /usr/lib/libGLX_nvidia.so.510.60.02)
==15160==    by 0x4E9768F: ???
==15160==    by 0x4005E98: call_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4005FCB: _dl_init (in /usr/lib/ld-linux-x86-64.so.2)
==15160==    by 0x4A82E74: _dl_catch_exception (in /usr/lib/libc.so.6)
==15160== 
==15160== LEAK SUMMARY:
==15160==    definitely lost: 7,328 bytes in 9 blocks
==15160==    indirectly lost: 149,394 bytes in 827 blocks
==15160==      possibly lost: 0 bytes in 0 blocks
==15160==    still reachable: 400,114 bytes in 2,380 blocks
==15160==         suppressed: 32 bytes in 1 blocks
==15160== Reachable blocks (those to which a pointer was found) are not shown.
==15160== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15160== 
==15160== For lists of detected and suppressed errors, rerun with: -s
==15160== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)

As you can see, in both reports the leaks trace back to Nvidia libraries (libnvidia-glcore.so, and in Valgrind's stack traces also libGLX_nvidia.so). A friend of mine can reproduce the leaks on his computer with a GTX 970 and the same driver version. The issue does not occur on my laptop with an AMD GPU or on another PC with an Intel iGPU.
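Until a fixed driver is available, one way to keep these driver-side reports from hiding leaks in one's own code is a LeakSanitizer suppression. A minimal sketch, assuming the clang or GCC AddressSanitizer runtime: the program defines the optional hook __lsan_default_suppressions(), which LeakSanitizer queries for extra suppressions (the same leak:<pattern> lines could instead go in a file passed via LSAN_OPTIONS=suppressions=<file>). The file name lsan_nvidia.c is hypothetical. This only hides the reports, and a pattern naming a library can likely only match while that library is still mapped, i.e. not when the frames show up as <unknown module>.

```c
/* lsan_nvidia.c: hypothetical helper linked into the test program, e.g.
 *   clang vkleak.c lsan_nvidia.c -o vkleak -Wall -lvulkan -fsanitize=address
 * LeakSanitizer queries this optional hook, if the program defines it,
 * for additional suppressions. Without -fsanitize it is simply unused. */

const char *__lsan_default_suppressions(void);

const char *__lsan_default_suppressions(void)
{
	/* One "leak:<pattern>" per line. A leak report is suppressed when the
	 * pattern matches a function, source file or module name in its
	 * allocation stack; both patterns name Nvidia driver libraries. */
	return "leak:libnvidia-glcore.so\n"
	       "leak:libGLX_nvidia.so\n";
}
```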

I am attaching the source code to reproduce the leaks with Vulkan and SFML, as well as the output of nvidia-bug-report.sh. Could you please look into this?

Thank you in advance!

nvleaks.tar (10 KB)
nvidia-bug-report.log.gz (322.0 KB)

Basic system info:

$ inxi -SGC
System:
  Host: jasus Kernel: 5.15.32-1-MANJARO arch: x86_64 bits: 64
    Desktop: KDE Plasma v: 5.24.4 Distro: Manjaro Linux
CPU:
  Info: 6-core model: Intel Core i7-8700K bits: 64 type: MT MCP cache:
    L2: 1.5 MiB
  Speed (MHz): avg: 2974 min/max: 800/4700 cores: 1: 800 2: 4364 3: 4400
    4: 3603 5: 3501 6: 3485 7: 2988 8: 1877 9: 1022 10: 800 11: 4479 12: 4369
Graphics:
  Device-1: NVIDIA TU104 [GeForce RTX 2070 SUPER] driver: nvidia v: 510.60.02
  Display: x11 server: X.Org v: 1.21.1.3 with: Xwayland v: 22.1.1 driver:
    X: loaded: nvidia gpu: nvidia,nvidia-nvswitch resolution:
    1: 2560x1440~144Hz 2: 2560x1440~144Hz
  OpenGL: renderer: NVIDIA GeForce RTX 2070 SUPER/PCIe/SSE2
    v: 4.6.0 NVIDIA 510.60.02

Thanks for posting this issue; we appreciate you providing the test code.
We have an internal tracking bug (3260444) for the memory leak, and will keep you updated on progress.
