I use the TensorRT C++ API for inference on my Jetson Xavier, and I use deserializeCudaEngine to create the engine object from .plan files. Everything works fine except for these two problems:
1. runtime->setDLACore(1) does not use a DLA core but the GPU, whereas trtexec uses the DLA without this problem.
2. GPU utilization never goes above 50-60% (I load a lot of images from a folder and run inference with batch size 1, image by image), while trtexec with batch size 1 manages to use 100% of the GPU.
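For reference, the load path in my app is roughly the following (a simplified sketch; the plan file name, the logger, and the inference loop are placeholders, error checks omitted):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

class Logger : public nvinfer1::ILogger {
    // TensorRT 7-style signature; add noexcept on TensorRT 8+.
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING)
            std::cerr << msg << std::endl;
    }
} gLogger;

int main() {
    // Read the serialized engine (.plan) into memory.
    std::ifstream file("model.plan", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    runtime->setDLACore(1);  // this is what I expect to select DLA core 1

    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // ... loop over the images, copy each one to device memory,
    //     then run context->execute(1, bindings) per image ...
    return 0;
}
```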
1. This may be model-dependent.
Not all operations are supported by DLA.
TensorRT will run a layer on the GPU if DLA doesn't support it (when GPU fallback is enabled; see the build-config sketch after this reply).
2. May I know if you use the same model for your app and for trtexec?
If yes, it looks like there is something incorrect in your application.
You can check the trtexec sample code for more information.
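For example, a minimal sketch of how DLA plus GPU fallback is requested at build time, assuming the IBuilderConfig API of TensorRT 6 and later (the helper name and core index are just illustrative):

```cpp
#include <NvInfer.h>

// Illustrative helper: ask TensorRT to place layers on DLA where possible
// and let the remaining, unsupported layers fall back to the GPU.
void configureForDLA(nvinfer1::IBuilderConfig* config, int dlaCore) {
    config->setFlag(nvinfer1::BuilderFlag::kFP16);          // DLA runs FP16/INT8 only
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(dlaCore);
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);  // unsupported layers -> GPU
}
```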
1. Suppose not all the layers are supported by DLA.
Then there are some I/O transfers between the GPU and DLA.
Do you see any fallback log when building the engine with TensorRT? (A sketch for listing the fallback layers follows this reply.)
2. You may need to minimize the multimedia pipeline's overhead (e.g. memcpy, bandwidth, ...).
Please check if our DeepStream SDK can help: https://developer.nvidia.com/deepstream-sdk
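If it helps, a rough way to list the layers that would fall back before building the engine (this assumes the IBuilderConfig::canRunOnDLA query of recent TensorRT releases; on older releases the equivalent query lives on IBuilder):

```cpp
#include <NvInfer.h>
#include <iostream>

// Print every layer that DLA cannot take, i.e. the layers that will cause
// GPU fallback and extra GPU<->DLA transfers.
void reportDlaFallbacks(nvinfer1::INetworkDefinition* network,
                        nvinfer1::IBuilderConfig* config) {
    for (int i = 0; i < network->getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network->getLayer(i);
        if (!config->canRunOnDLA(layer))
            std::cout << "GPU fallback: " << layer->getName() << std::endl;
    }
}
```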
It looks like you are using different models in your app and in trtexec.
Even though the models achieve the same thing, they may contain different operations and lead to different TensorRT implementations.
Would you mind testing them with the same model first?
Thanks.
When I use the exact same model (uff) in my app and in trtexec, trtexec can use the DLA but my app does not.
However, if I generate a .plan file from that same uff with trtexec (using --use-dla=0) and then load the resulting .plan file, my app can use the DLA.
The only difference is that for my app's .plan I used the uff_to_plan.cpp converter, while in the second case I used trtexec to produce the .plan file.
Please note that when you create the TensorRT PLAN, the implementation (including hardware and memory layout, ...) is already decided.
So the PLAN cannot be used across systems, and the same applies to DLA: the DLA placement is fixed when the PLAN is built, not when it is deserialized.
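For example, a DLA-enabled PLAN has to be requested when the engine is built, roughly like this (a sketch only: the input/output names, dims and workspace size are placeholders, and it assumes the TensorRT 6+/7 UFF parser and IBuilderConfig API; cleanup omitted):

```cpp
#include <NvInfer.h>
#include <NvUffParser.h>
#include <fstream>
#include <iostream>

static class : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {  // add noexcept on TRT 8+
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
} gLogger;

// Minimal uff-to-plan sketch that selects DLA at build time.
bool uffToDlaPlan(const char* uffFile, const char* planFile, int dlaCore) {
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(0U);   // UFF requires implicit batch
    auto config  = builder->createBuilderConfig();
    auto parser  = nvuffparser::createUffParser();

    // Placeholder tensor names and dims - use the ones from your model.
    parser->registerInput("input", nvinfer1::Dims3(3, 224, 224),
                          nvuffparser::UffInputOrder::kNCHW);
    parser->registerOutput("output");
    if (!parser->parse(uffFile, *network, nvinfer1::DataType::kFLOAT))
        return false;

    builder->setMaxBatchSize(1);
    config->setMaxWorkspaceSize(1 << 28);
    config->setFlag(nvinfer1::BuilderFlag::kFP16);            // DLA needs FP16/INT8
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA); // place layers on DLA
    config->setDLACore(dlaCore);
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);    // unsupported layers -> GPU

    auto engine = builder->buildEngineWithConfig(*network, *config);
    if (!engine) return false;

    // Serialize the DLA-enabled engine to a .plan file.
    auto blob = engine->serialize();
    std::ofstream out(planFile, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());
    return true;
}
```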