Why does the tensorrtx YOLOv5 C++ code show more than 10 CUDA streams?


  • command: nvprof -o tensorrtx.nvvp ./yolov5 -d yolov5s.engine …/samples
  • output:
inference time: 2ms
inference time: 2ms
==571== Generated result file: /app/tensorrtx/yolov5/build/tensorrtx.nvvp


  • expected:
As the profiler graph shows, there are more than 10 streams. Why?
In the code, a stream is created only once:
    // Create stream
    cudaStream_t stream;

In my opinion, there should be only 1 or 2 streams (the second being the default stream).


TensorRT Version: 8.2.5-1+cuda11.4
GPU Type: V100
Nvidia Driver Version: 515.65.01
CUDA Version: 11.7
CUDNN Version:
Operating System + Version: Ubuntu 20.04.4 LTS
Baremetal or Container (if container which image + tag): container nvcr.io/nvidia/tensorrt:22.05-py3

Relevant Files

tensorrtx.nvvp (692 KB)

Steps To Reproduce

git clone https://github.com/wang-xinyu/tensorrtx.git
cd tensorrtx/
mkdir my
cd my/
git clone https://github.com/ultralytics/yolov5.git
cd ..
cp yolov5/gen_wts.py my/yolov5/
cd  my/yolov5/
cp /app/yolov5/yolov5s.pt .
python gen_wts.py -w yolov5s.pt -o yolov5s.wts
cd ../..
cd yolov5/
mkdir build
cd build/
cp /app/tensorrtx/my/yolov5/yolov5s.wts .
cmake ..
make
./yolov5 -s yolov5s.wts yolov5s.engine s
./yolov5 -d yolov5s.engine ../samples
nvprof -o tensorrtx.nvvp ./yolov5 -d yolov5s.engine ../samples
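
As a rough cross-check of the stream count without opening the .nvvp file, nvprof can also print a per-kernel GPU trace that includes a Stream column. The sketch below is not part of the original steps: `count_streams` is a hypothetical helper, and it assumes the CSV trace has a quoted header row containing a `"Stream"` field, with no comma inside any field that comes before it.

```shell
#!/bin/sh
# count_streams (hypothetical helper): read an nvprof CSV GPU trace on stdin
# and print the number of distinct values in the "Stream" column.
# Assumption: the header row contains a quoted "Stream" field, and no field
# before it contains a comma, so a plain comma split is safe up to that column.
count_streams() {
    awk -F',' '
        # Skip nvprof status lines such as "==571== Profiling result:".
        /^==/ { next }
        # Locate the Stream column from the header row.
        col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "\"Stream\"") col = i
            next
        }
        # Count each non-empty stream value once (the units row has an empty value).
        $col != "" && !($col in seen) { seen[$col] = 1; n++ }
        END { print n + 0 }
    '
}

# Usage (assumes nvprof is on PATH; --log-file writes the trace to a file
# instead of stderr):
#   nvprof --csv --print-gpu-trace --log-file trace.csv ./yolov5 -d yolov5s.engine ../samples
#   count_streams < trace.csv
```

The count covers every stream that appears in the trace, including ones the application code did not create itself, so it may not match the number of `cudaStream_t` objects in the source.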


We recommend using the NVIDIA samples.

If you still face the same issue, please share a minimal repro model and script with us so we can debug it further.

Thank you.