Hi, please refer to the MLCommons benchmark link below. For the AGX Orin row, it is mentioned that both the GPU and the two DLAs are used for TRT inference.
So, I wanted to know how the inference is run in parallel on both the GPU and the DLAs.
Also, how can both DLAs be utilized at the same time during inference?
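For context, my rough understanding so far is that one would build a separate engine per device, selecting the target DLA core in the builder config, roughly like the sketch below. This is my own guess, not code from the repo; the `resnet50.onnx` path is just a placeholder:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, dla_core=None):
    """Build a serialized TRT engine for the GPU, or for one DLA core."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        parser.parse(f.read())

    config = builder.create_builder_config()
    if dla_core is not None:
        # DLA runs FP16/INT8 only; place layers on the DLA and let
        # unsupported layers fall back to the GPU.
        config.set_flag(trt.BuilderFlag.FP16)
        config.default_device_type = trt.DeviceType.DLA
        config.DLA_core = dla_core
        config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

    return builder.build_serialized_network(network, config)

# One engine per device: GPU, DLA core 0, DLA core 1.
gpu_plan = build_engine("resnet50.onnx")
dla0_plan = build_engine("resnet50.onnx", dla_core=0)
dla1_plan = build_engine("resnet50.onnx", dla_core=1)
```

Is this roughly what the harness does under the hood, or is the device placement handled somewhere else entirely?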
MLCommons link: v3.0 Results | MLCommons
Also, in the corresponding GitHub repo, I couldn't find any lines of code that indicate the above behaviour of using both DLAs concurrently or running inference across multiple devices in parallel.
ResNet50 TRT inference code for reference: https://github.com/mlcommons/inference_results_v3.0/blob/main/closed/NVIDIA/code/resnet50/tensorrt/ResNet50.py
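To be concrete, this is the kind of concurrent dispatch I was expecting to find: one execution context and one CUDA stream per engine, so the GPU and both DLA cores have work in flight at the same time. Again this is only my own sketch (TensorRT 8.x binding-index API, assuming fixed-shape engines and the serialized plans from the builder sketch above):

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on the Orin's GPU
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(plan, dla_core=None):
    """Deserialize a plan; runtime.DLA_core picks which DLA executes it."""
    runtime = trt.Runtime(TRT_LOGGER)
    if dla_core is not None:
        runtime.DLA_core = dla_core
    return runtime.deserialize_cuda_engine(plan)

# gpu_plan / dla0_plan / dla1_plan come from the builder sketch above.
engines = [load_engine(gpu_plan),
           load_engine(dla0_plan, dla_core=0),
           load_engine(dla1_plan, dla_core=1)]

jobs = []
for engine in engines:
    context = engine.create_execution_context()
    stream = cuda.Stream()
    # One device buffer per binding (fixed shapes assumed).
    bindings = []
    for i in range(engine.num_bindings):
        nbytes = (trt.volume(engine.get_binding_shape(i))
                  * np.dtype(trt.nptype(engine.get_binding_dtype(i))).itemsize)
        bindings.append(int(cuda.mem_alloc(nbytes)))
    jobs.append((context, stream, bindings))

# execute_async_v2 returns immediately, so these three enqueues leave
# the GPU, DLA0 and DLA1 all running concurrently.
for context, stream, bindings in jobs:
    context.execute_async_v2(bindings, stream.handle)

for _, stream, _ in jobs:
    stream.synchronize()
```

If the actual harness does something different (e.g. multiple processes or a C++ dispatch layer instead of per-stream enqueues from Python), a pointer to where that happens in the repo would be much appreciated.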