Cupti deadlock at cuptiPCSamplingStop

flyingr · April 22, 2022, 6:37am

I tried to do remote GPU profiling using CUPTI and gRPC. The code looks like this:

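#include <unistd.h>          // sleep
#include <cupti.h>           // cuptiPCSamplingStart / cuptiPCSamplingStop
#include <grpcpp/grpcpp.h>   // grpc::ServerContext, grpc::Status
// plus the header generated from the service's .proto (not shown in the post)

using grpc::ServerContext;
using grpc::Status;

class GPUProfilingServerImpl final : public GPUProfilingServer::Service {
    Status DoProfiling(ServerContext* context, const GPUProfilingRequest* request,
                       GPUProfilingResponse* reply) override {
        CUpti_PCSamplingStartParams startParams = {};
        // initialize startParams
        cuptiPCSamplingStart(&startParams);
        sleep(request->duration());  // sample for the requested number of seconds
        CUpti_PCSamplingStopParams stopParams = {};
        // initialize stopParams
        cuptiPCSamplingStop(&stopParams);
        return Status::OK;
    }
};

extern "C" int InitializeInjection(void) {
    // enable CUPTI callbacks
    // start gRPC server
    return 1;
}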

I compiled the code into a shared library and set CUDA_INJECTION64_PATH to its path. Then I ran a CUDA program and issued a request with the gRPC client, and the deadlock occasionally happened. The gdb backtrace was as follows:

#0  __lll_lock_wait (futex=futex@entry=0x55718480bdc8, private=0) at lowlevellock.c:52
#1  0x00007f16bdbcd131 in __GI___pthread_mutex_lock (mutex=0x55718480bdc8) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007f16ba15b292 in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#3  0x00007f16ba03a746 in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#4  0x00007f16ba03aa50 in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#5  0x00007f16ba03b48c in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#6  0x00007f16bc011495 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#7  0x00007f16bc21c4a0 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#8  0x00007f16bbfb528f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#9  0x00007f16bbfb799f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#10 0x00007f16bc0591c2 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#11 0x000055718388712b in __cudart803 ()
#12 0x00005571838e2006 in cudaLaunchKernel ()

I checked the owner of the mutex:

(gdb) p *mutex
$1 = {__data = {__lock = 2, __count = 1, __owner = 2937977, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\001\000\000\000y\324,\000\001\000\000\000\001", '\000' <repeats 22 times>, __align = 4294967298}
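As a cross-check, the owner TID can be decoded directly from the raw `__size` bytes in the dump. This is a minimal sketch assuming glibc's x86_64 layout of `pthread_mutex_t`, where `__lock`, `__count`, `__owner` and `__nusers` are consecutive 4-byte fields, so `__owner` starts at byte offset 8; the helper names `readLe32` and `decodeMutexOwner` are hypothetical. (In the same dump, `__kind = 1` is glibc's recursive mutex type, which is why `__count` is meaningful.)

```cpp
#include <cstdint>

// Reads a 32-bit little-endian integer starting at p.
std::uint32_t readLe32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Assumes glibc on x86_64: __lock, __count, __owner, __nusers are 4-byte
// fields at offsets 0, 4, 8 and 12 of pthread_mutex_t, so __owner (the TID
// of the thread holding the lock) starts at byte 8 of __size.
std::uint32_t decodeMutexOwner(const unsigned char* sizeBytes) {
    return readLe32(sizeBytes + 8);
}

// First 16 bytes of __size from the gdb dump:
// "\002\000\000\000\001\000\000\000y\324,\000\001\000\000\000"
const unsigned char kDumpedSize[16] = {
    2, 0, 0, 0,          // __lock  = 2
    1, 0, 0, 0,          // __count = 1
    'y', 0324, ',', 0,   // __owner, little-endian
    1, 0, 0, 0,          // __nusers = 1
};
```

`decodeMutexOwner(kDumpedSize)` yields 2937977 (0x2CD479), the same value gdb prints for `__owner`, so that is the thread whose stack explains who holds the lock.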

Then I printed the call stack of thread 2937977:

#0  0x00007f1690bd910d in ?? () from /usr/local/cuda/lib64/libnvperf_host.so
#1  0x00007f1690a104e9 in ?? () from /usr/local/cuda/lib64/libnvperf_host.so
#2  0x00007f16ba15c01d in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#3  0x00007f16ba15ff9e in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#4  0x00007f16ba16034d in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#5  0x00007f16ba15aca4 in cuptiPCSamplingStop () from /usr/local/cuda/lib64/libcupti.so.11.6
#6  0x00007f16bb1ba0cd in stopCUptiPCSamplingHandler (signum=12) at gpu_profiler.cpp:875
#7  0x00007f16bb1bcc54 in GPUProfilingServiceImpl::DoProfiling (this=0x7f16b94225b0, context=0x7f16a0010248, request=0x7f16a000f3a0, reply=0x7f16b27f9380) at gpu_profiler.cpp:915

Does anyone know why? Thanks.

FYI, I found that there were CUPTI calls in the call stack of cudaLaunchKernel, which might be the cause of the deadlock. So I tried disabling the CUPTI callbacks before calling cuptiPCSamplingStart/Stop and re-enabling them afterwards, but the situation did not change.
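The disable-then-re-enable workaround described above can be made safe against early returns and exceptions with a small RAII guard. This is a minimal sketch: the class name `CallbackPause` is hypothetical, and the two callables stand in for whatever CUPTI calls actually toggle the callbacks (for example `cuptiEnableDomain` with `enable` set to 0, then 1). It only packages the workaround the poster tried; per the post, the hang persisted with it.

```cpp
#include <functional>
#include <utility>

// Calls `disable` immediately and `enable` when the guard leaves scope,
// so the re-enable step runs on every exit path (early return, exception).
class CallbackPause {
public:
    CallbackPause(const std::function<void()>& disable,
                  std::function<void()> enable)
        : enable_(std::move(enable)) {
        disable();
    }
    ~CallbackPause() {
        if (enable_) enable_();
    }
    CallbackPause(const CallbackPause&) = delete;
    CallbackPause& operator=(const CallbackPause&) = delete;

private:
    std::function<void()> enable_;
};

// Hypothetical use around the stop call:
// {
//     CallbackPause pause([] { /* disable CUPTI callbacks */ },
//                         [] { /* re-enable CUPTI callbacks */ });
//     cuptiPCSamplingStop(&stopParams);
// }
```

If `disable` itself throws, the constructor never completes and `enable` is not called, which matches the intent: nothing was disabled, so nothing needs restoring.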

mjain · April 22, 2022, 1:11pm

Hi pkueecsly,

A similar deadlock in cuptiPCSamplingStop was fixed in the CUDA 11.6 Update 1 release (link). Could you try the CUPTI from that release?

And thanks for providing the call stacks and other relevant details.

flyingr · April 23, 2022, 2:19am

Thanks for your reply.
However, I checked the installed CUDA version, and it is already 11.6.1 (driver version 510.47.03).

mjain · April 25, 2022, 7:58am

Hi pkueecsly,

Can you please tell us the CUPTI library version? By default, the library is located in /usr/local/cuda/extras/CUPTI/lib64. Is it libcupti.so.2022.1.0 or libcupti.so.2022.1.1?

flyingr · May 3, 2022, 6:51am

Sorry for the late reply.
I checked the CUPTI library version: it was 2022.1.0. But after I updated it to libcupti.so.2022.1.1, the bug still occurred.
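Telling 2022.1.0 from 2022.1.1 comes down to reading the version suffix of the libcupti.so.* file in the CUPTI lib64 directory mentioned above. A minimal sketch of that comparison, assuming file names of the form `libcupti.so.YYYY.M.P` as quoted in this thread; `parseCuptiVersion` and `atLeast` are hypothetical helpers, and nothing here calls into CUPTI.

```cpp
#include <array>
#include <cstdio>
#include <optional>
#include <string>

using Version = std::array<int, 3>;

// Parses the "YYYY.M.P" suffix of a name such as "libcupti.so.2022.1.1".
// Returns std::nullopt for names without that shape (e.g. "libcupti.so.11.6").
std::optional<Version> parseCuptiVersion(const std::string& name) {
    const std::string prefix = "libcupti.so.";
    if (name.rfind(prefix, 0) != 0) return std::nullopt;  // must start with prefix
    Version v{};
    int consumed = 0;
    if (std::sscanf(name.c_str() + prefix.size(), "%d.%d.%d%n",
                    &v[0], &v[1], &v[2], &consumed) != 3) {
        return std::nullopt;
    }
    if (prefix.size() + consumed != name.size()) return std::nullopt;  // trailing junk
    return v;
}

// True if `name` denotes a CUPTI build at least as new as `minimum`.
bool atLeast(const std::string& name, const Version& minimum) {
    auto v = parseCuptiVersion(name);
    return v && *v >= minimum;  // std::array compares lexicographically
}
```

For example, `atLeast("libcupti.so.2022.1.0", Version{2022, 1, 1})` is false, which is the state the poster started from. Note that the backtraces load `libcupti.so.11.6` from /usr/local/cuda/lib64, a name this parser deliberately rejects; that name does not carry the build number.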

mjain · May 4, 2022, 10:35am

Hi pkueecsly,

Sorry to hear that the issue is not fixed by the CUPTI from the CUDA 11.6 Update 1 release. Would it be possible for you to provide a minimal reproducer so we can debug the issue? Also, which GPU are you using?
