Hi, forum
I’m running inference with the C++ API of TF 2.3.1. Whenever I profile with a RunMetadata message and the FULL_TRACE trace level, my program crashes with SIGSEGV.
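In the inference code I have roughly the following (output_names holds the names of the tensors to fetch):
tf::RunOptions run_options;
run_options.set_trace_level(tf::RunOptions::FULL_TRACE);
tf::RunMetadata run_metadata;
status = session_->Run(run_options, input_tensors, output_names, {}, &output_tensors, &run_metadata);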
When I run the binary under GDB, it crashes with a segmentation fault inside libcupti. Here are the log output and the backtrace:
2020-11-18 14:35:23.892738: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
2020-11-18 14:35:23.892826: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 1 GPUs
2020-11-18 14:35:23.915334: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcupti.so.10.2
[New Thread 0x7ffa647e4700 (LWP 2136)]
Thread 172 "test_program" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff9eee4700 (LWP 2075)]
0x00007ffc4e35390d in ?? () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
(gdb) bt
#0 0x00007ffc4e35390d in ?? () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
#1 0x00007ffc4e104830 in ?? () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
#2 0x00007ffc4e0fa8fb in ?? () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
#3 0x00007ffccb0c6fc3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffc4e0fa5f8 in ?? () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
#5 0x00007ffc4e0fbff9 in ?? () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
#6 0x00007ffc4e0fc4af in ?? () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
#7 0x00007ffc4e109681 in cuptiSubscribe () from /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcupti.so.10.2
I’m currently on CUDA 10.2.89, and my CUPTI lib is:
$: dpkg -l | grep cupti
ii cuda-cupti-10-2 10.2.89-1 amd64 CUDA profiling tools runtime libs.
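To rule out a mismatch between the libcupti.so.10.2 that TensorFlow’s dso_loader opens and the cuda-cupti-10-2 package listed above, a small check like the one below could report which file the dynamic loader actually resolves. This is only a sketch, assuming glibc on Linux and using the standard dlopen/dlsym/dladdr calls; the helper name ResolveLibraryPath is hypothetical, and on glibc older than 2.34 it has to be linked with -ldl.

```cpp
// Sketch (assumption: glibc on Linux; link with -ldl on glibc < 2.34).
// Reports which shared object the dynamic loader resolves for a soname,
// e.g. to compare against the libcupti path shown in a GDB backtrace.
#include <dlfcn.h>

#include <optional>
#include <string>

// Hypothetical helper: loads `soname` (nullptr = the main program and the
// libraries loaded at startup), looks up `symbol`, and returns the path of
// the shared object that defines it. Returns std::nullopt if either the
// library or the symbol cannot be found.
std::optional<std::string> ResolveLibraryPath(const char* soname,
                                              const char* symbol) {
  void* handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    return std::nullopt;  // library not found or failed to load
  }
  std::optional<std::string> path;
  if (void* sym = dlsym(handle, symbol)) {
    Dl_info info{};
    // dladdr maps an address back to the object that contains it.
    if (dladdr(sym, &info) != 0 && info.dli_fname != nullptr) {
      path = std::string(info.dli_fname);  // copy before dlclose
    }
  }
  dlclose(handle);
  return path;
}
```

For example, calling ResolveLibraryPath("libcupti.so.10.2", "cuptiSubscribe") from a small main() should give the path the loader picked, which can then be compared with the /usr/local/cuda-10.2/targets/x86_64-linux/lib location in the backtrace.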