Profiling a Python TensorFlow script with CPU profiling enabled


I am trying to get detailed information about a custom TensorFlow op using nvprof with "--cpu-profiling" enabled. Unfortunately, simply running "nvprof --cpu-profiling on python" does not work: it never finishes. After playing around with it a little, I think the reason is that merely importing TensorFlow in the script is already too much for the CPU profiling (I tried something like 'nvprof --cpu-profiling on python -c "import tensorflow"', which results in weird output or a segfault). Running everything without CPU profiling works without any problems.

I also tried using cudaProfilerStart() and cudaProfilerStop() together with the "--profile-from-start off" flag, but the result stays the same. I actually need the CPU information, though, since my custom op seems to spend a lot of time on the CPU, where it shouldn't. Is there a way to get nvprof with CPU profiling working for a TensorFlow script?
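For context, the cudaProfilerStart()/cudaProfilerStop() attempt I mention above looks roughly like the sketch below: with "nvprof --profile-from-start off", only the region between those two calls is recorded. The helper names and the soname list are my own made-up choices, and the context manager just no-ops when no CUDA runtime can be loaded:

```python
import ctypes
from contextlib import contextmanager

def _load_cudart():
    # Try a few common sonames for the CUDA runtime library.
    for name in ("libcudart.so", "libcudart.so.12", "libcudart.so.11.0"):
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None  # no CUDA runtime found; profiling calls become no-ops

@contextmanager
def profiler_range():
    """Wrap a region so nvprof --profile-from-start off records only it."""
    cudart = _load_cudart()
    if cudart is not None:
        cudart.cudaProfilerStart()  # CUDA runtime API (cuda_profiler_api.h)
    try:
        yield
    finally:
        if cudart is not None:
            cudart.cudaProfilerStop()
```

The idea is then to launch with "nvprof --profile-from-start off --cpu-profiling on python script.py" and wrap only the invocation of the custom op in "with profiler_range():", so the TensorFlow import itself is excluded from profiling.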

Sorry to hear about your issues with nvprof, and thank you for reporting the problem.

We have looked into the problem and are able to reproduce it at our end. We did not see a segfault, but we do see a hang in both of the cases you mentioned. Could you please send the segfault message you are seeing?

We will work on a fix for this issue.

Thanks & Regards

OK, I retried it a couple of times now and also did not get a segfault. I think what I was referring to was a fatal error of the Java Runtime Environment in nvvp when run with the same command as mentioned in my original post (nvvp python -c "import tensorflow", and then activating the CPU profiling option). I uploaded the log file of the error here:

I hope that helps a little, and sorry for the unclear description.

We have reproduced this issue and will be working to get a fix into a future release (200474495).