Hello,
I am trying to get detailed information about a custom TensorFlow op using nvprof with "--cpu-profiling" on. Unfortunately, simply running "nvprof --cpu-profiling on python script.py" does not work: it never finishes. After playing around with it a little, I think the reason is that merely importing TensorFlow in the script is already too much for the CPU profiling (I tried 'nvprof --cpu-profiling on python -c "import tensorflow"', which results in either weird output or a segfault). Running everything without CPU profiling works without any problems. I also tried using cudaProfilerStart() and cudaProfilerStop() together with the "--profile-from-start off" flag, but the result stays the same. I do need the CPU information, though, since my custom op seems to spend a lot of time on the CPU, where it shouldn't. Is there a way to get nvprof with CPU profiling working for a TensorFlow script?
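
For context, here is a minimal sketch of how the profiled range could be limited from Python with cudaProfilerStart() and cudaProfilerStop(). It is not the actual script. The names `load_cudart`, `profiled_region`, and `run_op` are hypothetical, and it assumes the CUDA runtime library can be located through ctypes. If the library is not found, the wrapper does nothing.

```python
import ctypes
import ctypes.util
from contextlib import contextmanager


def load_cudart():
    """Try to load the CUDA runtime library; return None if it is unavailable."""
    name = ctypes.util.find_library("cudart")  # assumption: libcudart is on the search path
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


@contextmanager
def profiled_region(cudart):
    """Call cudaProfilerStart/cudaProfilerStop around the enclosed block.

    If cudart is None, the block runs without any profiler calls.
    """
    if cudart is not None:
        cudart.cudaProfilerStart()
    try:
        yield
    finally:
        if cudart is not None:
            cudart.cudaProfilerStop()


def run_op():
    # Hypothetical stand-in for the code that executes the custom op.
    return sum(range(10))


if __name__ == "__main__":
    with profiled_region(load_cudart()):
        result = run_op()
    print(result)
```

Such a script would then be launched as "nvprof --profile-from-start off python script.py", so that only the wrapped region is recorded.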