I am using Nsight Systems (nsys) to profile a TensorRT backend program, but there is an empty gap in the stream timeline. I am wondering why the gap is there. It seems that a high-cost TensorRT layer runs during this time.
PS:
I use the nsys CLI on the server to collect the data. The command is: nsys profile --cuda-um-gpu-page-faults=true --delay=60 --duration=300 --output=normal.nsys-rep ./program
It is a little hard to tell from a screenshot, but I think the GPU is starving because the CPU is busy doing something else and is not feeding it work. I would look at the backtrace (hover over the text) on the OS runtime library trace line of that thread. I think it is blocked waiting for some other thread.
Thank you for your reply. The backtrace of the OS runtime library seems to be waiting inside the CUDA library (the backtrace is shown below). I can't upload the report because of the upload file size limit.
Anyway, I would appreciate it if you could recheck the result.
@kazmali, you can upload the report to your personal drive and share the link here or in a DM to me. Not much I can help with based on the screenshot. It shows that the CPU thread is blocked from making progress because it is waiting on a pthread_rwlock_ lock to return. This is causing the GPU to starve because it is not given enough work to do before the CPU thread gets blocked. You will need to track down what is holding the lock up. You could use the corresponding NVTX range to zero in on the code that is trying to acquire the lock.
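To illustrate the NVTX suggestion, here is a minimal sketch of wrapping a lock acquisition in a named range so the wait shows up on the timeline next to the OS runtime trace. It is an assumption-heavy toy, not your program: it uses Python with the optional `nvtx` package (falling back to a no-op if it is not installed), a plain `threading.Lock` as a stand-in for the rwlock, and the hypothetical names `traced_range`, `timed_acquire`, and `demo`. In a C++ TensorRT application the same idea would be applied around the code that takes the lock.

```python
import threading
import time
from contextlib import contextmanager

try:
    import nvtx  # optional NVTX Python bindings; assumed installed via pip
except ImportError:
    nvtx = None  # fall back to a no-op so the sketch still runs


@contextmanager
def traced_range(name):
    """Open an NVTX range named `name` if nvtx is available, else do nothing."""
    if nvtx is None:
        yield
    else:
        with nvtx.annotate(name):
            yield


def timed_acquire(lock, name):
    """Acquire `lock` inside a named range; return the seconds spent waiting."""
    with traced_range(f"acquire:{name}"):
        start = time.perf_counter()
        lock.acquire()
        return time.perf_counter() - start


def demo(hold_seconds=0.2):
    """Simulate one thread holding the lock while the feeder thread waits."""
    lock = threading.Lock()  # stand-in for the rwlock in the real program
    holding = threading.Event()

    def holder():
        with lock:
            holding.set()             # signal that the lock is now owned
            time.sleep(hold_seconds)  # pretend to do long work under the lock

    worker = threading.Thread(target=holder)
    worker.start()
    holding.wait()                    # make sure the holder got the lock first
    waited = timed_acquire(lock, "enqueue")
    lock.release()
    worker.join()
    return waited


if __name__ == "__main__":
    print(f"feeder thread waited {demo():.3f} s for the lock")
```

If ranges like this are placed around each suspected lock site, the range that spans the empty gap on the GPU row should point to the acquisition that is stalling, and the thread holding the lock at that moment is the one to investigate.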