and I have one question: most of the models (ResNeXt-50, ResNet-34, SSD-MobileNetV1) can’t be fully optimized to run 100% on the DLAs (so GPUFallback takes care of the unsupported operations on the GPU). How, then, were the performance and power on the DLAs measured and reported in the pasted post?
Yes, I created the model’s TensorRT engine first, then used trtexec to run inference on the DLA of a Jetson AGX Orin. From the log, I can see that some layers are running on the GPU, so I guess the performance number reported by trtexec is a combination of both DLA and GPU. I am wondering how the post can separate the two clearly and claim 3-5x power efficiency (DLA vs. GPU).
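
For reference, here is a minimal sketch of that workflow, not the exact command that was run. The file names `model.onnx` and `model_dla.engine` and the helper `build_trtexec_cmd` are hypothetical placeholders, and `--fp16` is an assumption (DLA needs FP16 or INT8 precision). The helper only assembles a trtexec command line that targets a DLA core with GPU fallback and asks for a per-layer profile.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, dla_core=0):
    """Assemble a trtexec command that targets one DLA core with GPU fallback.

    The paths are placeholders; this function only builds the argument list
    and does not run trtexec itself.
    """
    return [
        "trtexec",
        f"--onnx={onnx_path}",          # input model (hypothetical file name)
        f"--saveEngine={engine_path}",  # where to write the built engine
        f"--useDLACore={dla_core}",     # run supported layers on this DLA core
        "--allowGPUFallback",           # unsupported layers fall back to the GPU
        "--fp16",                       # assumption: FP16 precision for the DLA
        "--dumpProfile",                # print per-layer timing
        "--separateProfileRun",         # keep profiling out of the timed run
    ]


cmd = build_trtexec_cmd("model.onnx", "model_dla.engine")
print(shlex.join(cmd))
```

The per-layer profile should help show how much of the latency comes from the layers that fell back to the GPU, but it does not by itself split the power draw between DLA and GPU, which is the part I am asking about.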