Multiple models on DLAs in AGX Xavier (32 TOPS)

Can the DLAs on the AGX Xavier run multiple models (networks) at the same time? Or are the DLAs bound to a specific network and impractical to swap in at a fast enough rate? Given there are 2 DLA cores, how will it work as the number of models in use on the Xavier increases?

Hi,

It’s recommended to keep a given model deployed on the same DLA core rather than swapping models in and out.

Since there are two DLA cores on Xavier, you can deploy two models, one to each DLA.
You can use setDLACore(…) to specify which DLA core to use.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/c_api/classnvinfer1_1_1_i_runtime.html#a09997260fa5ae4c16dfb79f0fe667312

Thanks

Thank you, but this does not quite answer the question. Assume I want to run inference with 4 models at the same time. Can the DLAs be used for all 4 models?
For example:
model1.setDLACore(0)
model2.setDLACore(1)
model3.setDLACore(0)
model4.setDLACore(1)
Then perform inference using all 4 models.
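The assignment above can be sketched as a simple round-robin mapping of models onto the two DLA cores. This is a hypothetical illustration of the questioner's proposal, not NVIDIA's recommendation; in TensorRT the resulting core index would be passed to setDLACore() on the runtime or builder config before deserializing or building each engine.

```python
# Sketch of the round-robin assignment proposed above: even-indexed
# models go to DLA core 0, odd-indexed models to DLA core 1.
NUM_DLA_CORES = 2  # the AGX Xavier has two DLA cores

def assign_dla_cores(model_names, num_cores=NUM_DLA_CORES):
    """Map each model name to a DLA core index in round-robin order."""
    return {name: i % num_cores for i, name in enumerate(model_names)}

assignment = assign_dla_cores(["model1", "model2", "model3", "model4"])
for name, core in assignment.items():
    print(f"{name} -> DLA core {core}")
```

This matches the four setDLACore() calls in the snippet above: models 1 and 3 share core 0, models 2 and 4 share core 1.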

Hi,

This depends on the use case: there is some overhead each time a model is swapped onto a DLA core.
To give a more specific suggestion, we would need to know the exact execution time of each model first.

By the way, what about the GPU?
If the GPU is not already occupied, the following assignment is recommended for minimal latency:

model1: DLA 0
model2: DLA 1
model3: GPU 0
model4: GPU 0

Also, you can choose which models to place on the DLA based on their ratio of DLA-supported layers: models whose layers are mostly supported by the DLA avoid costly fallback to the GPU.
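That heuristic can be sketched as follows: rank models by the fraction of their layers the DLA supports, give the top two the DLA cores, and run the rest on the GPU. The ratio values here are made-up illustration numbers, not measurements from any real network.

```python
# Hypothetical placement heuristic: models with the highest DLA-supported
# layer ratio go to the two DLA cores; everything else runs on the GPU.
def place_models(supported_ratio, num_dla_cores=2):
    """supported_ratio: {model_name: fraction of layers the DLA can run}."""
    ranked = sorted(supported_ratio, key=supported_ratio.get, reverse=True)
    placement = {}
    for i, name in enumerate(ranked):
        placement[name] = f"DLA {i}" if i < num_dla_cores else "GPU 0"
    return placement

# Illustrative ratios only:
ratios = {"model1": 0.95, "model2": 0.60, "model3": 0.90, "model4": 0.40}
print(place_models(ratios))
```

With these example numbers, model1 and model3 land on the DLA cores and model2 and model4 fall back to the GPU, the same shape as the assignment suggested above.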
Thanks.