I use the TensorRT C++ API for inference on my Jetson Xavier, and I use deserializeCudaEngine to create the engine object from .plan files. Everything works fine except for these two problems:
1. runtime->setDLACore(1) does not run the network on a DLA core; it runs on the GPU instead. However, trtexec uses the DLA without this problem.
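For context, this is roughly how I load the engine (a minimal sketch, assuming TensorRT 8; `loadEngine`, `planPath`, and the logger are placeholder names, not my exact code). As far as I understand, setDLACore has to be called on the runtime before deserializeCudaEngine, and the .plan itself must have been built for DLA (e.g. with trtexec --useDLACore=1 --allowGPUFallback), otherwise it silently runs on the GPU:

```cpp
#include <NvInfer.h>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

nvinfer1::ICudaEngine* loadEngine(const std::string& planPath) {
    // Read the serialized engine from disk.
    std::ifstream file(planPath, std::ios::binary);
    std::vector<char> engineData((std::istreambuf_iterator<char>(file)),
                                 std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    // Select DLA core 1 BEFORE deserializing; a plan that was not built
    // for DLA will still execute on the GPU.
    runtime->setDLACore(1);
    return runtime->deserializeCudaEngine(engineData.data(), engineData.size());
}
```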
2. GPU utilization never goes above 50-60% (I load many images from a folder and run inference with batch size 1, image by image). Yet when I run trtexec with batch size 1, it reaches 100% GPU utilization.
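My inference loop looks roughly like this (a sketch with placeholder names; preprocessing, buffer allocation, and the host-device copies are omitted). I synchronize on the stream after every single image:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch of the per-image loop; `context` and `bindings` are set up elsewhere.
void inferAll(nvinfer1::IExecutionContext* context, void** bindings, int numImages) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    for (int i = 0; i < numImages; ++i) {
        // ... cudaMemcpyAsync image i into the input binding on `stream` ...
        context->enqueueV2(bindings, stream, nullptr);  // asynchronous launch
        cudaStreamSynchronize(stream);  // wait for this image before the next
    }
    cudaStreamDestroy(stream);
}
```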
What could be the problem?