GPU slowdown with multiple streams


I have an application that is running a classifier and detector in TensorRT on a T4. Performance is good when I run one stream. However, I can’t reproduce the same performance using multiple streams.
I’m developing using C++.

When I use 1 stream, I get the following:

GPU memory: 1335 MB
Average time taken for detection: 4.64ms
CPU Usage: 19.8%
GPU utilization is around 70% when the detector is being used but can spike to over 90% when the classifier is called.

When I use 2 streams, I get the following:

GPU memory: 2670 MB
Average time taken for detection: 38ms
CPU Usage: 54% (27% for each stream)
GPU utilization is at 100%

When I use 1 stream, the inference from TensorRT (for both the detection and classification) is fast and I can process up to 40 frames per second. However, there is a drastic slowdown when I use 2 streams.

Any ideas?


I am running into the same problem. Did you manage to solve it?

I had to change my implementation to do batch processing instead. So I run the detection in batches of 50. The GPU utilization spikes to around 40% when an inference is being made and goes back down to 0 when the inference is complete. It essentially allows the GPU core usage to be staggered among multiple streams.

Hope this helps.

I am working on a similar use case where I want to perform inference on 15 parallel streams coming from 15 different video files. My models include two different kinds of detectors and one classifier, all written in TensorFlow, and I am programming in Python. Can you help me understand the best way to handle the 15 streams in parallel and pass them as batches to the TensorFlow networks?

Did you manage to batch and perform multi-stream inference?