Can the DLAs on the AGX Xavier run multiple models (networks) at the same time? Or are the DLAs bound to a specific network and impractical to swap in at a fast enough rate? Given there are 2 DLA cores, how will it work as the number of models in use on the Xavier increases?
It’s recommended to keep one model resident on a given DLA rather than swapping models in and out, since reloading a model introduces overhead.
Since there are two DLA cores on the Xavier, you can deploy two models, one on each core.
You can use setDLACore(…) to specify which DLA core a model runs on.
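As a rough sketch of that idea, the helper below round-robins models across the Xavier’s two DLA cores. The TensorRT calls are shown only as comments (they are assumptions about the Python API and need a Jetson with TensorRT to actually run); the core-assignment logic itself is plain Python.

```python
NUM_DLA_CORES = 2  # the AGX Xavier has two DLA cores

def pick_dla_core(model_index: int) -> int:
    """Round-robin models across the available DLA cores."""
    return model_index % NUM_DLA_CORES

# With TensorRT (hedged sketch, not verified here), the chosen core would be
# applied at engine-build time roughly like this:
#
#   import tensorrt as trt
#   config.default_device_type = trt.DeviceType.DLA
#   config.DLA_core = pick_dla_core(i)          # Python analogue of setDLACore(...)
#   config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # unsupported layers fall back to GPU

print([pick_dla_core(i) for i in range(4)])  # models 0..3 -> cores [0, 1, 0, 1]
```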
Thank you, but this does not quite answer the question. Assume I want to load 4 models and then perform inference with all 4 of them at the same time. Can the DLAs be used for all 4 models?
This depends on the use case.
There is some overhead each time a model is loaded onto a DLA, so frequently swapping models on a core may not be practical.
To give a more concrete suggestion, we would need to know the execution time of each model first.
By the way, how about the GPU?
If the GPU is not occupied, it’s recommended to use the following setting for minimal latency.
model1: DLA 0
model2: DLA 1
model3: GPU 0
model4: GPU 0
Also, you can choose which models to place on the DLAs based on their supported layer ratio — that is, the fraction of each model’s layers that can run natively on the DLA without falling back to the GPU.
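As an illustration of that placement policy, the sketch below ranks models by an assumed DLA-supported layer ratio and assigns the top two to the DLA cores, with the rest going to the GPU. The model names and ratios are made up for the example, not taken from any profiling.

```python
def place_models(models, num_dla_cores=2):
    """Assign the models with the highest DLA-supported layer ratio to the
    DLA cores; everything else goes to the GPU."""
    ranked = sorted(models, key=lambda m: m["dla_ratio"], reverse=True)
    placement = {}
    for i, m in enumerate(ranked):
        placement[m["name"]] = f"DLA {i}" if i < num_dla_cores else "GPU 0"
    return placement

# Hypothetical ratios: fraction of layers each model can run on the DLA.
models = [
    {"name": "model1", "dla_ratio": 0.95},
    {"name": "model2", "dla_ratio": 0.90},
    {"name": "model3", "dla_ratio": 0.40},
    {"name": "model4", "dla_ratio": 0.30},
]
print(place_models(models))
# -> {'model1': 'DLA 0', 'model2': 'DLA 1', 'model3': 'GPU 0', 'model4': 'GPU 0'}
```

This reproduces the DLA 0 / DLA 1 / GPU 0 / GPU 0 mapping suggested above, with the DLA-friendliest models pinned to the DLA cores.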