I’ve converted a YOLOv4 darknet model to ONNX using a conversion script I found online (GitHub - Tianxiaomo/pytorch-YOLOv4: PyTorch ,ONNX and TensorRT implementation of YOLOv4). I then run the generated ONNX model with TensorRT. I had no errors during the conversion, and all outputs from the TensorRT inference of YOLOv4 are as they should be, so I don’t think I’m doing anything wrong there.
However, the speed is not what I expected. I’m working under the assumption that TensorRT engines should run faster than the darknet models.
The YOLOv4 TensorRT engine seems to run slower than YOLOv4 darknet. Any reason why?
For comparison, I used a YOLOv3 ONNX model (converted using a different script). In that case the TensorRT engine does run faster than darknet. I use the same inference script as I did for YOLOv4.
Thanks for the link and for following up. I’m familiar with the idea of batching for optimization in TensorRT, and I’m using a batch of 4 images from now on. I am attaching my script here: onnxExpt.cpp (11.0 KB)
The script is a simplified version of what I have so far. I load an image “TestImage.jpg”, do some pre-processing using OpenCV functions, and copy the image over to the GPU for inference. I’ve removed the post-processing steps to keep things simpler. I measure the elapsed time from when the enqueue function is called until the stream is synchronized and the output has been copied back to the host, as in the sketch below.
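For reference, this is roughly the timing window I mean. It is only a sketch, assuming an already-built engine and execution context with pre-allocated device buffers; the names (context, bindings, deviceOutput, hostOutput, outputBytes) are placeholders, not the exact ones in my attached script:

```cpp
// Sketch of the timing measurement: enqueue -> async copy -> synchronize.
// Assumes an already-built IExecutionContext and pre-allocated buffers.
#include <chrono>
#include <cuda_runtime_api.h>
#include <NvInfer.h>

float timedInferenceMs(nvinfer1::IExecutionContext* context,
                       void** bindings, cudaStream_t stream,
                       void* hostOutput, const void* deviceOutput,
                       size_t outputBytes)
{
    const auto start = std::chrono::high_resolution_clock::now();

    // Launch asynchronous inference on the stream.
    context->enqueueV2(bindings, stream, nullptr);

    // Copy the network output back to the host on the same stream.
    cudaMemcpyAsync(hostOutput, deviceOutput, outputBytes,
                    cudaMemcpyDeviceToHost, stream);

    // Block until both the inference and the copy have finished.
    cudaStreamSynchronize(stream);

    const auto end = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<float, std::milli>(end - start).count();
}
```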
I have also attached the model (yolov4_4_3_416_416.onnx). My script converts this ONNX model to a .trt file, roughly along the lines of the sketch below.
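The conversion step in my script is essentially the usual parse-and-build flow. This is a simplified sketch assuming the TensorRT 7.x C++ API with an explicit-batch network; gLogger and the workspace size are placeholders and error handling is omitted:

```cpp
// Simplified sketch of the ONNX -> TensorRT engine build and serialization.
#include <fstream>
#include <NvInfer.h>
#include <NvOnnxParser.h>

void buildAndSaveEngine(nvinfer1::ILogger& gLogger)
{
    using namespace nvinfer1;

    auto builder = createInferBuilder(gLogger);
    const auto explicitBatch =
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = builder->createNetworkV2(explicitBatch);
    auto parser  = nvonnxparser::createParser(*network, gLogger);

    // Parse the ONNX model produced by the darknet -> ONNX conversion.
    parser->parseFromFile("yolov4_4_3_416_416.onnx",
                          static_cast<int>(ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GB workspace (placeholder)

    auto engine = builder->buildEngineWithConfig(*network, *config);

    // Serialize the engine so later runs can skip the build step.
    auto serialized = engine->serialize();
    std::ofstream out("yolov4_4_3_416_416.trt", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
}
```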
Hi,
Since the pre- and post-processing steps depend so strongly on the particular application, we usually consider only the latency and throughput of the network inference itself, excluding the pre- and post-processing overhead.
We suggest you check the profiling data to find the bottleneck, and share that data along with the test image so we can better help.
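For example, trtexec can report per-layer timings for the same ONNX model; something along these lines should work (the iteration count is just an example):

```
trtexec --onnx=yolov4_4_3_416_416.onnx --explicitBatch --dumpProfile --iterations=100
```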
Thanks!