I am developing a tool that collects CUPTI event information automatically at runtime. My library (a .so loaded via LD_PRELOAD) calls CUPTI at the beginning of the user's "main". It registers callback functions in the same way as the CUPTI sample "callback_event". The "callback_event" sample itself works well on my machine.
At runtime, the event group is initialized and the callback function is invoked successfully for each kernel. However, the returned event value is always 0.
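To make the setup easier to picture, here is a minimal sketch of how a preloaded library can run its initialization before the application's own code. It is not the attached ld_cupti.cpp: it assumes a GCC/Clang-style constructor attribute as the hook, and the names init_profiler, on_library_load and profiler_ready are hypothetical. The real CUPTI calls (for example cuptiSubscribe and cuptiEnableCallback, as used in the callback_event sample) are left as a placeholder comment so the sketch builds without the CUDA toolkit.

```cpp
// Hypothetical sketch of an LD_PRELOAD-style initialization hook.
// Assumption: a constructor attribute is used; the attached code may hook differently.
#include <cassert>
#include <cstdio>

namespace {

// Set once the hook has run; a real tool would keep its CUPTI state here.
bool g_profiler_ready = false;

// Placeholder for the CUPTI setup (e.g. cuptiSubscribe / cuptiEnableCallback).
void init_profiler() {
    assert(!g_profiler_ready && "profiler initialized twice");
    g_profiler_ready = true;
    std::fprintf(stderr, "[ld_cupti sketch] profiler initialized\n");
}

// Runs when the shared object is loaded, i.e. before the program's main().
__attribute__((constructor)) void on_library_load() {
    init_profiler();
}

}  // namespace

// Lets other code check whether the hook has already fired.
bool profiler_ready() { return g_profiler_ready; }
```

With this pattern the initialization happens at load time rather than literally inside main, so anything that depends on an existing CUDA context would still have to be deferred to the callbacks.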
The source code is attached.
You can compile it with:
nvcc -O3 -g -std=c++11 --compiler-options '-fPIC' -I/cuda/path/extras/CUPTI/include -c manager.cpp
nvcc -O3 -g -std=c++11 --compiler-options '-fPIC' -c ld_cupti.cpp
nvcc -O3 -g -std=c++11 --compiler-options '-fPIC' --shared -o libld_cupti.so manager.o ld_cupti.o -L/cuda/path/extras/CUPTI/lib64 -lcuda -lcupti -lcudart -L/cuda/path/lib64
At runtime, export LD_PRELOAD=/path/to/libld_cupti.so before starting a CUDA program.
Thanks for your help.