CUPTI profiling API memory leak

When building samples in the directory /usr/local/cuda/extras/CUPTI/samples/autorange_profiling, I included -Xcompiler -fsanitize=address in NVCCFLAGS. This was done so that I could detect any memory leaks during profiling.

However, after building, I encountered an error when attempting to execute auto_range_profiling.

==3113510==You are trying to dlopen a libnvperf_host.so shared library with RTLD_DEEPBIND flag which is incompatibe with sanitizer runtime (see https://github.com/google/sanitizers/issues/611 for details). If you want to run libnvperf_host.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags.

To work around this, I wrote a dlopen hook that clears the RTLD_DEEPBIND flag before the call reaches the real dlopen.

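#define _GNU_SOURCE /* needed for RTLD_NEXT and RTLD_DEEPBIND on glibc */
#include <dlfcn.h>
#include <stdio.h>

typedef void* (*orig_dlopen_func_type)(const char*, int);

/* Interposed via LD_PRELOAD: strips RTLD_DEEPBIND, then forwards the call. */
void* dlopen(const char* filename, int flags) {
  /* Look up the next dlopen in the symbol search order. */
  orig_dlopen_func_type original_dlopen =
      (orig_dlopen_func_type)dlsym(RTLD_NEXT, "dlopen");
  if (original_dlopen == NULL) {
    fprintf(stderr, "hook_dlopen: could not resolve the next dlopen\n");
    return NULL;
  }
  if (flags & RTLD_DEEPBIND) {
    /* filename may be NULL (handle to the main program), so guard the %s. */
    printf("Intercepted dlopen(%s, %d)\n", filename ? filename : "(null)", flags);
    flags &= ~RTLD_DEEPBIND;
    printf("Adjusted flags to %d\n", flags);
  }
  return original_dlopen(filename, flags);
}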

To build the file, use the following command: gcc -shared -fPIC -o dlopen_intercept.so hook_dlopen.c -ldl

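Next, run the program with the command sudo ASAN_OPTIONS=verify_asan_link_order=0 LD_PRELOAD=/media/nvme/xuzhi/cts/dlopen_intercept.so:/lib/aarch64-linux-gnu/libasan.so.5 ./auto_range_profiling. The hook library is listed before libasan so that it sees the dlopen call first, and verify_asan_link_order=0 disables ASan's check that its runtime comes first in the library list. With this in place, the binary runs to completion.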
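The hook prints the dlopen flags as plain decimal integers, which are hard to read at a glance. The following is a minimal sketch for decoding them, assuming glibc's <dlfcn.h>; the helper names strip_deepbind and describe_dlopen_flags are hypothetical and not part of CUPTI or the hook above. With glibc's usual values on aarch64 and x86-64, 266 decodes to RTLD_NOW | RTLD_GLOBAL | RTLD_DEEPBIND, and clearing RTLD_DEEPBIND leaves 258.

```c
#define _GNU_SOURCE /* exposes RTLD_DEEPBIND in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns flags with RTLD_DEEPBIND cleared,
 * the same bit operation the hook applies. */
int strip_deepbind(int flags) {
  return flags & ~RTLD_DEEPBIND;
}

/* Hypothetical helper: prints which common dlopen mode bits are set. */
void describe_dlopen_flags(int flags) {
  printf("%d =%s%s%s%s\n", flags,
         (flags & RTLD_LAZY) ? " RTLD_LAZY" : "",
         (flags & RTLD_NOW) ? " RTLD_NOW" : "",
         (flags & RTLD_GLOBAL) ? " RTLD_GLOBAL" : "",
         (flags & RTLD_DEEPBIND) ? " RTLD_DEEPBIND" : "");
}
```

Calling describe_dlopen_flags on a value before and after strip_deepbind shows that only the RTLD_DEEPBIND bit changes, so the binding mode and symbol visibility requested by the caller are preserved.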

According to the ASan report, there appear to be memory leaks, as follows.

Usage: ./auto_range_profiling [device_num] [metric_names comma separated]
CUDA Device Number: 0
Compute Capability of Device: 8.7
Intercepted dlopen(libnvperf_host.so, 266)
Adjusted flags to 258
Launching kernel: blocks 64, thread/block 256

Range Name                              Metric Name                                                                                         Metric Value
----------------------------------------------------------------------------------------------------------------------------------------------------------------
0                                       smsp__warps_launched.avg                                                                            8
1                                       smsp__warps_launched.avg                                                                            8

=================================================================
==3115379==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 685058 byte(s) in 610 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b7d8  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1417d8)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 42608 byte(s) in 61 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b034  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x141034)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 2752 byte(s) in 6 object(s) allocated from:
    #0 0xffff8448293c in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:163
    #1 0xffff8181b860  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x141860)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 392 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5eaa0  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8aa0)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 100 byte(s) in 7 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5eb4c  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8b4c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 96 byte(s) in 3 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b6ec  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1416ec)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 28 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f204  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9204)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 20 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f244  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9244)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 20 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f264  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9264)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5ef78  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8f78)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 12 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f174  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9174)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 12 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f224  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9224)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f194  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9194)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 6 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b8dc  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1418dc)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 1 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5ebdc  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8bdc)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 120631 byte(s) in 128 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b034  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x141034)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 9280 byte(s) in 59 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b7d8  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1417d8)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 1448 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82818948  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0xd5948)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 1216 byte(s) in 2 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181ac94  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x140c94)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 136 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8284ce24  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x109e24)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482724 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0xffff82a9d7b0  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a7b0)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 64 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8284c028  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x109028)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 52 byte(s) in 6 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5eb4c  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8b4c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d74c  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a74c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d0f4  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a0f4)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d88c  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a88c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d134  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a134)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

SUMMARY: AddressSanitizer: 864188 byte(s) leaked in 900 allocation(s).

I have traced the leaks to the call to NVPW_InitializeHost in the code below.

  NVPW_InitializeHost_Params initializeHostParams = {
      NVPW_InitializeHost_Params_STRUCT_SIZE};
  NVPW_API_CALL(NVPW_InitializeHost(&initializeHostParams));

Could you provide some insight into the root cause of these leaks and any potential fixes? Thank you very much.


Thanks for reaching out. The engineering team is currently investigating this. I will let you know what we find out.

We are trying to create a reproducer internally. Could you please provide the machine architecture, CUDA driver version, chip name, and CTK version so that we can reproduce the issue?

Sure!
The machine I tested on is Jetson Orin NX and the architecture is Ampere.
CUDA Driver Version = 11.4
Chip name = GA10B
CTK version = 11.4

Thanks for reporting the issue. We can reproduce it and have logged a bug to track it. The leaks come from the Python bindings.

Hi,

May I ask whether this bug has been resolved?

Hi, @zhi_xz
I’m sorry, the bug hasn’t been fixed yet.
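While the bug is open, one possible way to keep these reports from drowning out leaks in your own code is a LeakSanitizer suppression list. The sketch below assumes the sanitizer runtime in use honors the __lsan_default_suppressions hook and that matching on the module names from the report above is acceptable. It only hides the reports; it does not free the memory or fix the underlying leak.

```c
/* LeakSanitizer calls this hook, when the instrumented program defines it,
 * to obtain default suppression rules. Each "leak:<pattern>" line suppresses
 * leaks whose stack trace contains a matching function, source file, or
 * module name. The module names below are taken from the ASan report. */
const char* __lsan_default_suppressions(void) {
  return "leak:libnvperf_host.so\n"
         "leak:libnvcucompat.so\n"
         "leak:libcupti.so\n";
}
```

If the function is added to a .cu or C++ source file, it would need to be declared extern "C" so that the runtime can find it by its unmangled name. The same rules can instead be placed in a text file and passed at run time with LSAN_OPTIONS=suppressions=<path to file>.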