Slow video streaming while using PyTorch with CUDA

I’m using a Jetson Xavier and have created a Python multiprocess application for video analytics. It contains the following two processes (entirely separate, with no inter-process communication):

  1. A simple process that captures the video source with OpenCV and displays it with OpenCV imshow.
  2. A process that takes a constant tensor and runs it through a PyTorch NN model with CUDA in an infinite loop.
    (I started with two connected processes, one capturing the video and one processing the frames concurrently, but to debug the problem I broke the connection and ran the model on a constant tensor.)

The problem is that when only the video process is running, the video is smooth. But when the NN process runs concurrently, the video is no longer smooth, even though the video process still runs at 25 fps, as I set it with waitKey.
I suspect that the NN is using all of my GPU resources, so rendering of the displayed frame is put on hold for a few milliseconds until the GPU is free. If that is indeed the cause, can I set a GPU usage priority between the two processes? Do you have another idea for how to solve it?
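
One way to check the "25 fps but not smooth" symptom is to log the interval between consecutive frames instead of only the average rate, since an average of 25 fps can hide occasional long stalls. Below is a minimal sketch of such a check. The class name `FrameIntervalMonitor`, the `tolerance` factor of 1.5, and the injectable `clock` are illustrative assumptions, not part of the original application.

```python
import time


class FrameIntervalMonitor:
    """Tracks the time between displayed frames (hypothetical helper)."""

    def __init__(self, target_fps, tolerance=1.5, clock=time.monotonic):
        self.expected = 1.0 / target_fps   # ideal interval between frames, in seconds
        self.tolerance = tolerance         # assumption: > 1.5x the ideal interval counts as late
        self.clock = clock
        self.last = None
        self.count = 0
        self.total = 0.0
        self.worst = 0.0
        self.late = 0

    def tick(self):
        """Call once per displayed frame."""
        now = self.clock()
        if self.last is not None:
            dt = now - self.last
            self.count += 1
            self.total += dt
            self.worst = max(self.worst, dt)
            if dt > self.expected * self.tolerance:
                self.late += 1
        self.last = now

    def summary(self):
        """Return the mean fps, the longest interval, and the number of late frames."""
        mean = self.total / self.count if self.count else 0.0
        return {
            "mean_fps": 1.0 / mean if mean else 0.0,
            "worst_interval_s": self.worst,
            "late_frames": self.late,
        }
```

Calling `tick()` right after each frame is shown and printing `summary()` every few seconds, with and without the NN process running, would show whether the mean stays near 25 fps while the worst interval grows.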

This is the pseudo code (the original is offline and I can’t upload it):

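# Process 1: capture and display the video
video = cv2.VideoCapture(video_source)
while True:
	ret, frame = video.read()
	cv2.imshow("video", frame)
	cv2.waitKey(delay_ms)  # delay chosen to give 25 fps
	# check fps

# Process 2: run the NN on a constant tensor
model = ...   # load the PyTorch NN model
tensor = ...  # create a zeros PyTorch CUDA tensor
while True:
	model(tensor)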

Attached is the tegrastats output captured while the application was running.


Please flash Xavier with maxn config and try again:

To check the system load, you can use tegrastats.

I tried it and saw no difference.
Do you have any other suggestions?



Your GPU utilization is 99% (GR3D_FREQ).

When you run NN inference on the GPU, it takes almost all of the GPU resources and limits streaming performance.
Moreover, it looks like you keep pushing NN jobs (the while loop) to the GPU, which keeps the resources occupied all the time.
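
To illustrate the point about the while loop, here is a minimal sketch of an inference loop that is rate-limited so the GPU gets idle gaps between jobs. It is only an illustration under stated assumptions, not a fix suggested in this thread: the helper names `run_throttled` and `make_inference_step`, the 25 Hz rate, and the small `Conv2d` stand-in model are all hypothetical, and throttling lowers NN throughput without guaranteeing any priority for the display.

```python
import time


def run_throttled(step, max_rate_hz, iterations=None,
                  sleep=time.sleep, clock=time.monotonic):
    """Call step() at most max_rate_hz times per second.

    If iterations is None the loop runs forever; otherwise it stops
    after that many calls and returns the number of calls made.
    """
    period = 1.0 / max_rate_hz
    done = 0
    while iterations is None or done < iterations:
        start = clock()
        step()
        done += 1
        remaining = period - (clock() - start)
        if remaining > 0:
            sleep(remaining)  # GPU is idle here, so other work can be scheduled
    return done


def make_inference_step():
    """Build one inference step on a constant CUDA tensor (stand-in model)."""
    import torch  # imported lazily so the throttling helper works without PyTorch

    model = torch.nn.Conv2d(3, 8, kernel_size=3).cuda().eval()  # placeholder for the real model
    tensor = torch.zeros(1, 3, 224, 224, device="cuda")

    def step():
        with torch.no_grad():
            model(tensor)
        # CUDA calls are asynchronous; wait for completion so that the
        # sleep in run_throttled is real idle time on the GPU.
        torch.cuda.synchronize()

    return step
```

On the Jetson this could be started with `run_throttled(make_inference_step(), max_rate_hz=25)`, matching the inference rate to the video frame rate.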

Actually, it is a little tricky to handle different GPU tasks at the same time.
Here are two suggestions for you:

1. Try our DeepStream SDK.
We optimize the pipeline from capture -> inference -> display, and CUDA tasks are only triggered when needed.
It should help with the resource occupancy problem in your use case.

2. Try running your DNN on the DLA, which offloads work from the GPU and leaves it free for display or video usage.