Nsight Systems Shows Blocked State

In my application, I launch multiple threads to grab frames from multiple cameras. Each thread is responsible for one camera, and when a frame is grabbed, an object detection model deployed on TensorRT detects the target objects in it. To speed up some of the processing, I also use OpenCV compiled with CUDA. I noticed that as the number of threads goes up, I see a large delay, whereas the delay is much lower when I launch 1 to 3 threads. I profiled my application with Nsight Systems to find the bottleneck. When I assessed the runtime of the app in Nsight Systems, I saw blocked states in multiple reports, as shown below:

Can anyone explain whether the delay I am seeing is related to these blocked states and, if so, how I can get rid of them?

I can’t really say much about what is going on specifically with your application. You know the code far better than I. I also don’t have the times on the screenshot, but based on the width of that block, yeah, I think that is probably a problem.

I don’t know what options you used to run nsys, but there is a good chance that you ran it with CPU backtraces on. If you did, you should be able to use the top-down or bottom-up stacks in the bottom pane to find out which functions and code paths are dominating the runtime. If you zoom in on the timeline, the backtraces will zoom into the same range, so you can focus on just what is going on before and after the delay that you are seeing to narrow your search.

Hope that helps.

@hwilper Thank you so much for your quick answer.
My OS is Windows 10 and the TensorRT version is 8.6.1, if that is helpful. I use the C++ standard thread library to launch the threads.
I have only recently started working with Nsight and don't know whether CPU backtraces are on or not.
I attached a wider image; if more images are needed, just let me know:

What were the options you ran the tool with?

Can you try the drop-down box on the event view and see if top-down or bottom-up are listed there? If they are, that means you had sampling on. If not, do another run with CPU sampling turned on.

@hwilper I checked the event view and there are some lines in a list, like the image below:
Does it show the root cause of the problem?

That is the correct screen and has the information I was looking for. I would probably switch over to the top-down view if it is available. Then open the bullets for the expensive operations and see what the code paths are there and how much time is being spent in the different functions.
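
If the sampled stacks are hard to read, a rough cross-check is to time the stages yourself inside each camera thread and compare the totals as the thread count grows. The sketch below is only an illustration of that idea under stated assumptions: `StageTimes`, `ScopedTimer`, `grab_frame`, `run_detection`, `camera_loop` and `run_cameras` are hypothetical names, and the two stage functions are sleep-based placeholders standing in for the real camera grab and TensorRT inference calls, which are not shown in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: accumulates wall-clock milliseconds per named stage,
// shared by all camera threads.
class StageTimes {
public:
    void add(const std::string& stage, double ms) {
        std::lock_guard<std::mutex> lock(mutex_);
        total_ms_[stage] += ms;
    }
    // Returns the accumulated time for a stage, or 0 if it was never recorded.
    double total(const std::string& stage) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = total_ms_.find(stage);
        return it == total_ms_.end() ? 0.0 : it->second;
    }
private:
    mutable std::mutex mutex_;
    std::map<std::string, double> total_ms_;
};

// RAII timer: records the elapsed time of the enclosing scope on destruction.
class ScopedTimer {
public:
    ScopedTimer(StageTimes& sink, std::string stage)
        : sink_(sink), stage_(std::move(stage)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        sink_.add(stage_,
                  std::chrono::duration<double, std::milli>(end - start_).count());
    }
private:
    StageTimes& sink_;
    std::string stage_;
    std::chrono::steady_clock::time_point start_;
};

// Placeholders (assumptions): replace with the real grab and inference calls.
void grab_frame()    { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }
void run_detection() { std::this_thread::sleep_for(std::chrono::milliseconds(5)); }

// Per-camera loop: each stage is timed separately.
void camera_loop(StageTimes& times, int frames) {
    for (int i = 0; i < frames; ++i) {
        { ScopedTimer t(times, "grab");   grab_frame(); }
        { ScopedTimer t(times, "detect"); run_detection(); }
    }
}

// Launches one std::thread per camera and waits for all of them to finish.
void run_cameras(StageTimes& times, int cameras, int frames) {
    assert(cameras > 0 && frames > 0);
    std::vector<std::thread> workers;
    for (int c = 0; c < cameras; ++c) {
        workers.emplace_back(camera_loop, std::ref(times), frames);
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Calling `run_cameras(times, n, frames)` for a few values of `n` and dividing `times.total("detect")` by `n * frames` gives an average per-frame time for that stage. If that average climbs as threads are added, the stage is a reasonable place to look in the top-down view.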