I have a Jetson Nano with JetPack 4.3.
I followed your YOLOv3 example and installed the onnx2trt package for TensorRT 6.
I compiled everything and ran YOLO; it works at roughly 194 ms per inference (inference only, not including pre- or post-processing).
The issues are:
1. When I change the onnx_to_tensorrt.py script to build the engine with FP16, I got ~400 ms per inference.
2. When I change yolov3.cfg to an input smaller than 608, for example 416, I got ~604 ms per inference.
Is there something I am missing? Was this example optimized for a specific input size and FP32?
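For reference, shrinking the input is just an edit to the `[net]` section of yolov3.cfg; a sketch of the relevant lines for a 416 input (standard Darknet cfg syntax, other keys in the section left out):

```ini
[net]
# Input resolution; must be a multiple of 32 (the largest YOLO stride).
width=416
height=416
channels=3
```

Note that the ONNX model has to be regenerated from the modified cfg afterwards, otherwise the engine is still built for the old 608 input.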
Do you mean running the network with a batch of 5?
20 FPS sounds good, but I haven't managed to get inference below 250 ms with a 416 input and FP16.
Can you explain what I should change in your example?
So far I have:
1. Enabled FP16 when creating the engine.
2. Changed "width" and "height" in yolov3.cfg from 608 to 416.
3. Changed the output shapes to match the new input.
What else do I need to do to reach that FPS?
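On point 3: YOLOv3 has three detection heads with strides 32, 16 and 8, so each output grid is the input size divided by the stride, and each cell has 3 anchors × (5 + num_classes) channels (255 for the 80 COCO classes). A small helper to compute the shapes for a given input (the function name is mine, not from the sample):

```python
def yolov3_output_shapes(height, width, num_classes=80, batch=1):
    """Output tensor shapes (NCHW) for YOLOv3's three detection heads.

    Each head downsamples the input by its stride; channels are
    3 anchors * (1 objectness + 4 box coords + num_classes).
    """
    assert height % 32 == 0 and width % 32 == 0, "input must be a multiple of 32"
    channels = 3 * (5 + num_classes)  # 255 for COCO
    return [(batch, channels, height // s, width // s) for s in (32, 16, 8)]

print(yolov3_output_shapes(608, 608))  # grids 19, 38, 76
print(yolov3_output_shapes(416, 416))  # grids 13, 26, 52
```

So for a 416×416 input the three outputs become (1, 255, 13, 13), (1, 255, 26, 26) and (1, 255, 52, 52), and those are the values the sample's hard-coded output shapes need to match.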
Oh, that’s good. I stream frames from an Intel RealSense, and with YOLOv3 and OpenCV I reached around 340 ms. I enabled FP16, set width/height to 416, and enabled jetson_clocks. I am not sure how to do your point 3.
Right now I can think of two ways to improve my result: running inference through TensorRT instead of OpenCV's readNet, and removing the part that displays the image with the bounding boxes, since that drawing post-process costs around 20 ms.
And yes, the input to the network would be a batch of 5 images processed at once.
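To be sure where those milliseconds go, it helps to time each stage of the loop separately instead of the whole frame; a minimal sketch (the lambdas are placeholders standing in for real pre-process, inference and drawing code):

```python
import time
from collections import defaultdict

timings = defaultdict(float)  # accumulated wall-clock time per stage, in ms

def timed(name, fn, *args, **kwargs):
    """Run fn and add its wall-clock duration to timings[name]."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[name] += (time.perf_counter() - start) * 1e3
    return result

# Placeholder stages; swap in the real pipeline functions.
frame = timed("preprocess", lambda: "resized frame")
detections = timed("inference", lambda f: ["box"], frame)
timed("draw", lambda d: None, detections)

for name, ms in timings.items():
    print(f"{name}: {ms:.2f} ms")
```

Summing over a few hundred frames gives a stable per-stage breakdown, which makes it obvious whether dropping the drawing step is worth it.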
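Batching changes how latency translates into FPS: one batch takes longer than one image, but the per-image cost drops. The arithmetic is simple enough to sanity-check (the numbers below are illustrative, not measurements):

```python
def throughput_fps(batch_size, batch_latency_ms):
    """Images processed per second when one whole batch takes batch_latency_ms."""
    return batch_size * 1000.0 / batch_latency_ms

# If a batch of 5 takes 250 ms end to end, that is 20 FPS overall,
# even though each "frame" still waits up to 250 ms for its result.
print(throughput_fps(5, 250.0))  # -> 20.0
```

In other words, batching buys throughput at the price of latency, which matters if the RealSense stream needs per-frame results quickly.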
About stage 3: the sample includes output shapes that depend on the network input, and I changed those too.
By FP16 I meant 16-bit floating point, as it is called in the TensorRT SDK.
I guess your performance will improve a lot if you convert the network to TensorRT.
Are you working on a Jetson Nano?
Yes, sorry, 16-bit floating point. I don’t expect the performance to improve a lot, just to get slightly better. Actually, I am trying to do it in C++ and it’s a bit tricky.
Hopefully I will test it soon on my Jetson Nano.