Segmentation fault when using CUPTI (profiling injection) with NVIDIA Triton and TensorRT

I’m trying to use the ‘profiling injection’ sample in CUPTI to profile the NVIDIA Triton server, or trtexec from the TensorRT project, with the following commands:
sudo env LD_PRELOAD=/usr/local/cuda/extras/CUPTI/samples/profiling_injection/ ./build/tritonserver/build/server/build/tritonserver/install/bin/tritonserver (run from the Triton server directory, version 21.07)
sudo env LD_PRELOAD=/usr/local/cuda/extras/CUPTI/samples/profiling_injection/ ./trtexec (run from the TensorRT- directory)
(I have added the corresponding directories to LD_LIBRARY_PATH.)

Both commands fail with a segmentation fault:
7978 segmentation fault sudo env
7847 segmentation fault sudo env ./trtexec

Using gdb, I get the following call stack:
(gdb) bt
#0 0x00007fc145f30e1a in ?? () from /usr/local/cuda/extras/CUPTI/lib64/
#1 0x00007fc145c0ee85 in ?? () from /usr/local/cuda/extras/CUPTI/lib64/
#2 0x00007fc145c0d516 in ?? () from /usr/local/cuda/extras/CUPTI/lib64/
#3 0x00007fc145c0daa5 in ?? () from /usr/local/cuda/extras/CUPTI/lib64/
#4 0x00007fc145bbbfc4 in cuptiEnableCallback () from /usr/local/cuda/extras/CUPTI/lib64/
#5 0x00007fc148d1fde0 in register_callbacks() () from /usr/local/cuda/extras/CUPTI/samples/profiling_injection/
#6 0x00007fc148d20093 in InitializeInjection () from /usr/local/cuda/extras/CUPTI/samples/profiling_injection/
#7 0x00007fc148d20152 in dlsym () from /usr/local/cuda/extras/CUPTI/samples/profiling_injection/
#8 0x00007fc0fcae9086 in ?? () from /usr/local/cuda/targets/x86_64-linux/lib/
#9 0x00007fc148fda8d3 in call_init (env=0x7ffc183fa258, argv=0x7ffc183fa248, argc=1, l=) at dl-init.c:72
#10 _dl_init (main_map=0x7fc1491f5170, argc=1, argv=0x7ffc183fa248, env=0x7ffc183fa258) at dl-init.c:119
#11 0x00007fc148fcb0ca in _dl_start_user () from /lib64/
#12 0x0000000000000001 in ?? ()
#13 0x00007ffc183fb657 in ?? ()
#14 0x0000000000000000 in ?? ()

When I comment out CUPTI_API_CALL(cuptiEnableCallback(1, subscriber, CUPTI_CB_DOMAIN_DRIVER_API, CUPTI_DRIVER_TRACE_CBID_cuLaunchKernel));
on lines 446 and 448 of injection_2.cpp, no segmentation fault occurs.

Does this mean that the CUPTI callback functions cannot be used to profile the NVIDIA Triton server or a TensorRT engine? I would also like to know how to use CUPTI to profile the NVIDIA Triton server.

The GPU is a V100, the driver version is 470.42.01, and the CUDA version is 11.4.

Thank you.

Hi John Zhang,

Thanks for reporting this issue. CUPTI callback APIs can be used in the profiling injection, and there are no known limitations for this use case. We are able to reproduce the issue on our end and need to investigate it.

Hi Mjain,

Thanks for your efforts. Is there any solution for this issue?

Hi John Zhang,

We are looking into a potential timing issue with our suggestion to use LD_PRELOAD. In the meantime, you can try setting the environment variable CUDA_INJECTION64_PATH to the path of your tool’s shared library. When this variable is set, CUDA attempts to load that shared object during its initialization and runs the exported function named ‘InitializeInjection’. The sample injection code already exports this symbol, so it should work as-is.
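For reference, here is a minimal sketch of what such a shared library needs to export. It is not the CUPTI sample itself: the file name, the helper setup_tool(), the counters, and the return value of 1 are assumptions made for illustration, and the real sample does its CUPTI subscription and callback registration where the placeholder is. The only detail taken from the description above is the exported entry point name, InitializeInjection.

```cpp
// Hypothetical file: my_injection.cpp (illustrative sketch, not the CUPTI sample).
// Possible build line: g++ -shared -fPIC my_injection.cpp -o libmy_injection.so
#include <atomic>
#include <cstdio>

// Set to 1 once setup has run, so repeated calls do not repeat the work.
static std::atomic<int> g_initialized{0};
// Counts how often the setup body actually executed (for illustration only).
static std::atomic<int> g_setup_runs{0};

// Placeholder for the tool's real setup. In a CUPTI-based tool this is where
// the subscriber and callbacks would be registered (omitted here).
static void setup_tool() {
    g_setup_runs.fetch_add(1);
    std::fprintf(stderr, "injection: tool initialized\n");
}

// C linkage keeps the symbol name unmangled so it can be looked up by the
// plain name "InitializeInjection". The return value of 1 is an assumption.
extern "C" __attribute__((visibility("default"))) int InitializeInjection(void) {
    if (g_initialized.exchange(1) == 0) {
        setup_tool();  // runs only on the first call
    }
    return 1;
}
```

With a library built along these lines, the launch would look something like env CUDA_INJECTION64_PATH=/path/to/libmy_injection.so followed by the application command, where the library path is a placeholder for wherever the shared object actually lives.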

We will update our documentation to reflect this recommendation.

PS - Running trtexec without arguments doesn’t appear to run any CUDA code. Because the injection library is loaded as CUDA initializes, you’ll want to test with an invocation that actually executes CUDA; otherwise the injection tool won’t be loaded.