Description
- command: nvprof -o tensorrtx.nvvp ./yolov5 -d yolov5s.engine ../samples
- output:
inference time: 2ms
inference time: 2ms
==571== Generated result file: /app/tensorrtx/yolov5/build/tensorrtx.nvvp
- expect:
as the attached graph shows, there are more than 10 streams. Why?
in the code, a stream is created only once:
// Create stream
cudaStream_t stream;
CUDA_CHECK(cudaStreamCreate(&stream));
cudaStreamCreate is called just once!
In my opinion, there should be only 1 or 2 streams (2 because of the default stream).
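The extra streams most likely come from libraries linked into TensorRT (cuDNN, cuBLAS, and TensorRT's own runtime), which create internal streams via cudaStreamCreateWithFlags / cudaStreamCreateWithPriority, on top of the one stream created in the application. One way to check is to count stream-creation calls in the CUDA API trace. A minimal sketch, using a hypothetical trace excerpt; on the real binary you would pipe `nvprof --print-api-trace ./yolov5 -d yolov5s.engine ../samples 2>&1` into the same grep:

```shell
# Hypothetical excerpt of an nvprof --print-api-trace log.
# On a real run, replace the here-doc with the actual trace output.
trace=$(cat <<'EOF'
cudaStreamCreate
cudaStreamCreateWithFlags
cudaStreamCreateWithPriority
cudaStreamCreateWithFlags
cudaLaunchKernel
EOF
)

# Count every stream-creation call (the substring matches all three
# cudaStreamCreate* variants); prints 4 for this excerpt, i.e. far
# more than the single cudaStreamCreate visible in the app code.
echo "$trace" | grep -c "cudaStreamCreate"
```

If the count is much higher than 1, the extra streams in the timeline are internal to the libraries, not created by the application.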
Environment
TensorRT Version: 8.2.5-1+cuda11.4
GPU Type: V100
Nvidia Driver Version: 515.65.01
CUDA Version: 11.7
CUDNN Version:
Operating System + Version: Ubuntu 20.04.4 LTS
Baremetal or Container (if container which image + tag): container nvcr.io/nvidia/tensorrt:22.05-py3
Relevant Files
tensorrtx.nvvp (692 KB)
Steps To Reproduce
git clone https://github.com/wang-xinyu/tensorrtx.git
cd tensorrtx/
mkdir my
cd my/
git clone https://github.com/ultralytics/yolov5.git
cd ..
cp yolov5/gen_wts.py my/yolov5/
cd my/yolov5/
cp /app/yolov5/yolov5s.pt .
python gen_wts.py -w yolov5s.pt -o yolov5s.wts
cd ../..
cd yolov5/
mkdir build
cd build/
cp /app/tensorrtx/my/yolov5/yolov5s.wts .
cmake ..
make
./yolov5 -s yolov5s.wts yolov5s.engine s
./yolov5 -d yolov5s.engine ../samples
nvprof -o tensorrtx.nvvp ./yolov5 -d yolov5s.engine ../samples