I tried to do remote GPU profiling using CUPTI and gRPC. The code looks like this:
class GPUProfilingServerImpl final : public GPUProfilingServer::Service {
  Status DoProfiling(ServerContext* context, const GPUProfilingRequest* request, GPUProfilingResponse* reply) override {
    // initialize param
    cuptiPCSamplingStart(&param);
    sleep(request->duration);
    // initialize param
    cuptiPCSamplingStop(&param);
    return Status::OK;
  }
};
extern "C" int InitializeInjection(void) {
  // enable cupti callbacks
  // start grpc server
  return 1;
}
I compiled the code into a shared library and set CUDA_INJECTION64_PATH to the library path. Then I ran a CUDA program and issued a request using a gRPC client, and a deadlock occasionally happened. The gdb backtrace of the blocked thread was as follows:
#0 __lll_lock_wait (futex=futex@entry=0x55718480bdc8, private=0) at lowlevellock.c:52
#1 0x00007f16bdbcd131 in __GI___pthread_mutex_lock (mutex=0x55718480bdc8) at ../nptl/pthread_mutex_lock.c:115
#2 0x00007f16ba15b292 in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#3 0x00007f16ba03a746 in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#4 0x00007f16ba03aa50 in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#5 0x00007f16ba03b48c in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#6 0x00007f16bc011495 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007f16bc21c4a0 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#8 0x00007f16bbfb528f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#9 0x00007f16bbfb799f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#10 0x00007f16bc0591c2 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#11 0x000055718388712b in __cudart803 ()
#12 0x00005571838e2006 in cudaLaunchKernel ()
I checked the owner of the mutex:
(gdb) p *mutex
$1 = {__data = {__lock = 2, __count = 1, __owner = 2937977, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\001\000\000\000y\324,\000\001\000\000\000\001", '\000' <repeats 22 times>, __align = 4294967298}
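As a cross-check on the owner TID, the raw __size bytes printed by gdb can be decoded by hand. The sketch below assumes the usual glibc x86_64 layout (__lock at offset 0, __count at 4, __owner at 8, all little-endian 32-bit); the helper names are made up for illustration. With the bytes above, the __owner field decodes to 2937977, matching the __owner value gdb printed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a little-endian 32-bit integer from 4 raw bytes.
std::int32_t read_le32(const unsigned char* p) {
    std::uint32_t v = static_cast<std::uint32_t>(p[0])
                    | static_cast<std::uint32_t>(p[1]) << 8
                    | static_cast<std::uint32_t>(p[2]) << 16
                    | static_cast<std::uint32_t>(p[3]) << 24;
    return static_cast<std::int32_t>(v);
}

// Assumption: glibc x86_64 pthread_mutex_t layout, with __owner at byte offset 8.
// Hypothetical helper, not part of any library.
std::int32_t owner_tid_from_size_bytes(const unsigned char* bytes, std::size_t len) {
    assert(len >= 12);  // need at least __lock, __count and __owner
    return read_le32(bytes + 8);
}

// First 12 bytes of __size as printed by gdb:
// "\002\000\000\000\001\000\000\000y\324,\000"
const unsigned char kMutexPrefix[12] = {
    0x02, 0x00, 0x00, 0x00,  // __lock  = 2
    0x01, 0x00, 0x00, 0x00,  // __count = 1
    'y',  0324, ',',  0x00   // __owner (0x79, 0xd4, 0x2c, 0x00)
};
```

Calling owner_tid_from_size_bytes(kMutexPrefix, sizeof kMutexPrefix) gives the TID of the thread holding the mutex.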
And printed the call stack of the owning thread, 2937977:
#0 0x00007f1690bd910d in ?? () from /usr/local/cuda/lib64/libnvperf_host.so
#1 0x00007f1690a104e9 in ?? () from /usr/local/cuda/lib64/libnvperf_host.so
#2 0x00007f16ba15c01d in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#3 0x00007f16ba15ff9e in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#4 0x00007f16ba16034d in ?? () from /usr/local/cuda/lib64/libcupti.so.11.6
#5 0x00007f16ba15aca4 in cuptiPCSamplingStop () from /usr/local/cuda/lib64/libcupti.so.11.6
#6 0x00007f16bb1ba0cd in stopCUptiPCSamplingHandler (signum=12) at gpu_profiler.cpp:875
#7 0x00007f16bb1bcc54 in GPUProfilingServiceImpl::DoProfiling (this=0x7f16b94225b0, context=0x7f16a0010248, request=0x7f16a000f3a0, reply=0x7f16b27f9380) at gpu_profiler.cpp:915
Does anyone know why? Thanks.
FYI, I found that there were CUPTI calls in the call stack of cudaLaunchKernel, which might be the cause of the deadlock. So I tried disabling the CUPTI callbacks before calling cuptiPCSamplingStart/Stop and re-enabling them afterwards, but the situation did not change.
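To make the state shown in the two backtraces concrete, here is a minimal model of it in plain C++. It does not call CUPTI or CUDA at all: internal_lock is a hypothetical stand-in for the mutex inside libcupti, stop_thread plays the role of the gRPC handler thread that is inside cuptiPCSamplingStop and has not returned, and the calling thread plays the role of the application thread inside cudaLaunchKernel. It only illustrates the hold-and-wait pattern, not why the stop call fails to return.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Returns true if the "launch" side could not take the lock while the
// "stop" side was holding it. All names here are illustrative only.
bool launch_blocks_while_stop_holds_lock(std::chrono::milliseconds timeout) {
    std::timed_mutex internal_lock;  // stand-in for the mutex inside libcupti
    std::promise<void> lock_taken;
    std::promise<void> release;
    std::future<void> release_signal = release.get_future();

    // Models the handler thread: takes the lock and does not return
    // until it is told to.
    std::thread stop_thread([&] {
        std::lock_guard<std::timed_mutex> guard(internal_lock);
        lock_taken.set_value();
        release_signal.wait();
    });

    // Wait until the lock is definitely held by stop_thread.
    lock_taken.get_future().wait();

    // Models the application thread: in the real trace it waits forever in
    // pthread_mutex_lock; here a timeout is used so the model terminates.
    bool acquired = internal_lock.try_lock_for(timeout);
    if (acquired) {
        internal_lock.unlock();
    }

    // Let the model's "stop" side finish so the thread can be joined.
    release.set_value();
    stop_thread.join();
    return !acquired;
}
```

In this model, launch_blocks_while_stop_holds_lock(std::chrono::milliseconds(50)) reports that the launch side was blocked, which is the same picture as frame #1 of the first backtrace waiting on a mutex owned by the thread in the second one.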