CUPTI profiling API memory leak

When building samples in the directory /usr/local/cuda/extras/CUPTI/samples/autorange_profiling, I included -Xcompiler -fsanitize=address in NVCCFLAGS. This was done so that I could detect any memory leaks during profiling.

However, after building, I encountered an error when attempting to execute auto_range_profiling.

==3113510==You are trying to dlopen a libnvperf_host.so shared library with RTLD_DEEPBIND flag which is incompatibe with sanitizer runtime (see https://github.com/google/sanitizers/issues/611 for details). If you want to run libnvperf_host.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags.

To work around this, I wrote a dlopen hook that clears the RTLD_DEEPBIND flag before forwarding the call to the real dlopen.

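#define _GNU_SOURCE /* needed for RTLD_DEEPBIND and RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stdio.h>

typedef void* (*orig_dlopen_func_type)(const char*, int);

/* Interposed dlopen: clears RTLD_DEEPBIND, then forwards to the next dlopen. */
void* dlopen(const char* filename, int flags) {
  orig_dlopen_func_type original_dlopen =
      (orig_dlopen_func_type)dlsym(RTLD_NEXT, "dlopen");
  if (original_dlopen == NULL) {
    fprintf(stderr, "dlsym(RTLD_NEXT, \"dlopen\") failed\n");
    return NULL;
  }
  if (flags & RTLD_DEEPBIND) {
    /* filename may be NULL when the caller asks for the main program handle */
    printf("Intercepted dlopen(%s, %d)\n", filename ? filename : "(null)", flags);
    flags &= ~RTLD_DEEPBIND;
    printf("Adjusted flags to %d\n", flags);
  }
  return original_dlopen(filename, flags);
}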

To build the file, use the following command:

gcc -shared -fPIC -o dlopen_intercept.so hook_dlopen.c -ldl

Next, run the program with the following command:

sudo ASAN_OPTIONS=verify_asan_link_order=0 LD_PRELOAD=/media/nvme/xuzhi/cts/dlopen_intercept.so:/lib/aarch64-linux-gnu/libasan.so.5 ./auto_range_profiling

With the hook preloaded ahead of libasan, the binary runs to completion.
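To make the hook's log lines in the run output easier to read (flags 266 adjusted to 258), the flag arithmetic can be checked in isolation. This is only a sketch: the helper names strip_deepbind and print_dlopen_flags are made up for illustration, and the fallback value 0x8 for RTLD_DEEPBIND assumes glibc. On glibc, 266 is 0x100 + 0x8 + 0x2, i.e. RTLD_GLOBAL | RTLD_DEEPBIND | RTLD_NOW, and clearing the 0x8 bit gives 258.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes RTLD_DEEPBIND in <dlfcn.h> on glibc */
#endif
#include <dlfcn.h>
#include <stdio.h>

#ifndef RTLD_DEEPBIND
#define RTLD_DEEPBIND 0x00008 /* assumption: glibc's value, used only if the header hides it */
#endif

/* Hypothetical helper: return the flags with RTLD_DEEPBIND cleared, as the hook does. */
int strip_deepbind(int flags) {
  return flags & ~RTLD_DEEPBIND;
}

/* Hypothetical helper: print which of the common dlopen flags are set. */
void print_dlopen_flags(int flags) {
  printf("%d (0x%x):%s%s%s%s\n", flags, (unsigned)flags,
         (flags & RTLD_LAZY) ? " RTLD_LAZY" : "",
         (flags & RTLD_NOW) ? " RTLD_NOW" : "",
         (flags & RTLD_GLOBAL) ? " RTLD_GLOBAL" : "",
         (flags & RTLD_DEEPBIND) ? " RTLD_DEEPBIND" : "");
}
```

Calling print_dlopen_flags(266) and then print_dlopen_flags(strip_deepbind(266)) from a small main shows which bit the hook removed; every other flag is passed through unchanged.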

The ASAN report then shows memory leaks, as follows.

Usage: ./auto_range_profiling [device_num] [metric_names comma separated]
CUDA Device Number: 0
Compute Capability of Device: 8.7
Intercepted dlopen(libnvperf_host.so, 266)
Adjusted flags to 258
Launching kernel: blocks 64, thread/block 256

Range Name                              Metric Name                                                                                         Metric Value
----------------------------------------------------------------------------------------------------------------------------------------------------------------
0                                       smsp__warps_launched.avg                                                                            8
1                                       smsp__warps_launched.avg                                                                            8

=================================================================
==3115379==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 685058 byte(s) in 610 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b7d8  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1417d8)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 42608 byte(s) in 61 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b034  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x141034)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 2752 byte(s) in 6 object(s) allocated from:
    #0 0xffff8448293c in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:163
    #1 0xffff8181b860  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x141860)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 392 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5eaa0  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8aa0)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 100 byte(s) in 7 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5eb4c  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8b4c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 96 byte(s) in 3 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b6ec  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1416ec)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 28 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f204  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9204)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 20 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f244  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9244)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 20 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f264  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9264)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5ef78  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8f78)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 12 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f174  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9174)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 12 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f224  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9224)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5f194  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x9194)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 6 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b8dc  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1418dc)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Direct leak of 1 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5ebdc  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8bdc)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 120631 byte(s) in 128 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b034  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x141034)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 9280 byte(s) in 59 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181b7d8  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x1417d8)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 1448 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82818948  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0xd5948)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 1216 byte(s) in 2 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8181ac94  (/usr/local/cuda/targets/aarch64-linux/lib/libnvperf_host.so+0x140c94)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 136 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8284ce24  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x109e24)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482724 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0xffff82a9d7b0  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a7b0)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 64 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff8284c028  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x109028)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 52 byte(s) in 6 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff7df5eb4c  (/lib/aarch64-linux-gnu/libnvcucompat.so+0x8b4c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d74c  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a74c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d0f4  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a0f4)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d88c  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a88c)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

Indirect leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0xffff84482540 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0xffff82a9d134  (/usr/local/cuda/targets/aarch64-linux/lib/libcupti.so.11.4+0x35a134)
    #2 0xffff80db5e0c in __libc_start_main ../csu/libc-start.c:308
    #3 0xaaaac240da5c  (/media/nvme/xuzhi/workspace/tune_space/cupti_samples/autorange_profiling/auto_range_profiling+0xda5c)

SUMMARY: AddressSanitizer: 864188 byte(s) leaked in 900 allocation(s).

I have identified that the leak occurs in the call to NVPW_InitializeHost, shown below.

  NVPW_InitializeHost_Params initializeHostParams = {
      NVPW_InitializeHost_Params_STRUCT_SIZE};
  NVPW_API_CALL(NVPW_InitializeHost(&initializeHostParams));
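Until the root cause is known, one possible way to keep these library-side reports from burying leaks in the application's own code is LeakSanitizer's suppression hook. The sketch below assumes the function is compiled into the ASan-instrumented executable (in a .cu or C++ file it would need extern "C"), and the module names are simply the ones that appear in the report above. It does not fix the leak; it only filters matching stacks out of the report.

```c
/* LeakSanitizer looks for this function by name and reads suppression
   rules from the string it returns, one "leak:<pattern>" per line.
   A pattern is matched against function, source file, and module names. */
const char *__lsan_default_suppressions(void) {
  /* Assumption: suppress everything allocated inside these NVIDIA
     libraries, taken from the module paths in the leak report. */
  return "leak:libnvperf_host.so\n"
         "leak:libnvcucompat.so\n"
         "leak:libcupti.so\n";
}
```

The same rules can instead be kept in a text file and passed at run time with LSAN_OPTIONS=suppressions=<file>, which avoids rebuilding the sample.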

Could you provide some insight into the root cause of this memory leak and any potential fixes? Thank you very much.

Thanks for reaching out. The engineering team is currently investigating this. I will let you know what we find out.

We are trying to create a reproducer internally. Could you please provide the machine architecture, CUDA driver version, chip name, and CTK version so that we can reproduce the issue?

Sure!
The machine I tested on is Jetson Orin NX and the architecture is Ampere.
CUDA Driver Version = 11.4
Chip name = GA10B
CTK version = 11.4

Thanks for reporting the issue. We can reproduce it and have logged a bug to track it. The leaks are from the Python bindings.

Hi,

Excuse me, may I inquire if this bug has already been resolved?