The command I use for TensorRT execution after building the engine:
trtexec --iterations=100 --warmUp=2000 --batch=1 --useDLACore=0 --dumpProfile --loadEngine=alexnet_batch1_dla.engine
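For completeness, the engine itself was built along these lines; the ONNX filename here is a placeholder, and --fp16 matches the FP16 DLA results discussed below:

```shell
# Build an FP16 engine targeting DLA core 0 (alexnet.onnx is a placeholder path).
# --allowGPUFallback lets layers the DLA cannot run fall back to the GPU
# instead of failing the build.
trtexec --onnx=alexnet.onnx \
        --useDLACore=0 --fp16 --allowGPUFallback \
        --saveEngine=alexnet_batch1_dla.engine
```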
Another question: I assume we can’t install any JetPack version other than 5.0.1/5.0, so we are effectively limited to TensorRT 8.4.0. May I also ask whether this issue is reproducible in older TensorRT versions? I’m asking because a newer JetPack release may take some time, which is quite understandable.
If you haven’t already, I would recommend testing the models with INT8 precision enabled.
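A quick way to try that is to rebuild the engine with trtexec; the model path below is a placeholder, and note that without a real calibration cache trtexec assigns dummy INT8 scales, so this measures performance only, not accuracy:

```shell
# Rebuild the engine with INT8 enabled on DLA core 0.
# For meaningful accuracy you would pass --calib=<cache file> from a proper
# INT8 calibration run; without it, this is a performance-only experiment.
trtexec --onnx=alexnet.onnx \
        --useDLACore=0 --int8 --allowGPUFallback \
        --saveEngine=alexnet_batch1_dla_int8.engine
```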
On a related note, we now have the following GitHub project to help you get started with the DLA. It covers defining and profiling a model, tweaking the model for better DLA compatibility, and performing INT8 calibration. It may help introduce you to some concepts related to working with the DLA.
Please let me know if you have any questions, or feel free to open an issue with feedback on the tutorial if you do take a look.
Your work seems pretty awesome. I had been planning to create such a repo for a while since there is no detailed one. Thanks for sharing this publicly!
However, I could not observe any significant improvement on AGX Orin with your repo either. Could you please post any neural-network results on AGX Orin here, if you have any?
We have checked this issue with our internal team and this is expected.
Due to the difference in hardware specification, a relative increase in latency is expected when running FP16 conv operations on Orin DLA as compared to running on Xavier DLA.
This is totally understandable. I appreciate your help on this @AastaLLL . One more note to my initial post: The power results on Orin AGX seem to be expected compared to Xavier AGX.
May I ask for further details on the Orin DLA? Doesn’t being 5-10x slower than the GPU on Orin make the DLA unusable for practical purposes?
Additionally, the Orin DLA was introduced to us as offering a 9x speed-up. May I ask whether there is any application domain where we can use the DLA with that kind of performance? Sorry to push with frequent questions, but better performance on the DLA side would strongly motivate my research!
Orin’s DLA has more INT8 dense TOPS but fewer FP16 TOPS.
So if you run the model in INT8 mode, it’s expected to get better performance compared to Xavier’s DLA.