As you know, the Jetson Xavier NX has one GPU and two DLA accelerators for ML models. I want to run one of the TLT models, such as peoplenet34, in DeepStream on this platform as efficiently as possible.
One solution is to generate three engines from the etlt file, one for the GPU and one for each DLA, and load the same model three times. This uses three times the memory.
Q1- The second solution is to load the model twice: once for the GPU and once as a shared model for the two DLAs. Is that possible? If so, how?
Q2- The third and most efficient solution would be to use one model shared by the GPU and both DLAs, so the model is loaded only once. Because the engines generated for the GPU and the DLA use different operations, I guess this approach would fail, right? I also don't know whether the second solution is possible.
You can run the DLAs and the GPU together in a single process.
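As a rough sketch of what that could look like on the configuration side, the Python snippet below generates three Gst-nvinfer config texts, one for the GPU and one for each DLA core, so that three inference instances can sit in the same pipeline. The property names (`enable-dla`, `use-dla-core`, `tlt-encoded-model`, `tlt-model-key`, `model-engine-file`, `network-mode`, `gie-unique-id`) are Gst-nvinfer settings; the file names, the model key, and the helper `nvinfer_config` are hypothetical placeholders, not values from this thread. Each target still gets its own engine file here, so this does not by itself reduce memory use.

```python
import configparser
import io

# Hypothetical placeholders: replace with your own model file and key.
ETLT_MODEL = "peoplenet_resnet34.etlt"
MODEL_KEY = "<your-model-key>"


def nvinfer_config(unique_id, dla_core=None):
    """Return Gst-nvinfer config text for the GPU (dla_core=None) or one DLA core."""
    target = "gpu" if dla_core is None else f"dla{dla_core}"
    props = {
        "gpu-id": "0",
        "tlt-encoded-model": ETLT_MODEL,
        "tlt-model-key": MODEL_KEY,
        # One engine file per target (assumed naming scheme).
        "model-engine-file": f"peoplenet_{target}.engine",
        # 2 = FP16; the DLA does not run FP32 layers.
        "network-mode": "2",
        "gie-unique-id": str(unique_id),
    }
    if dla_core is not None:
        props["enable-dla"] = "1"
        props["use-dla-core"] = str(dla_core)

    parser = configparser.ConfigParser()
    parser["property"] = props
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# One config per accelerator: GPU, DLA core 0, DLA core 1.
configs = {
    "gpu": nvinfer_config(1),
    "dla0": nvinfer_config(2, dla_core=0),
    "dla1": nvinfer_config(3, dla_core=1),
}

if __name__ == "__main__":
    for name, text in configs.items():
        print(f"# config for {name}\n{text}")
```

Each text can be written to its own file and referenced by a separate inference element in the same DeepStream application; how the streams are split across the three instances is left out of this sketch.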
Please check the following document for more information: