ResNet18 TensorRT engine: cannot instantiate more than two objects on DLA

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
other

SDK Manager Version
1.9.1.10844
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
After I converted resnet18.onnx to a TensorRT engine with DLA, I want to instantiate three objects that do something using the converted engine file, but the moment I instantiate the third object I get an error: “[trt] 1: [nvdlaUtils.cpp::deserialize :: 154] Error code 1: DLA (NvMediaDlaInit : Init failed.)”. I don’t know what’s going on; please help.

Dear @liuhaomin1,

I want to instantiate 3 objects to do something using the converted engine file

Did you use trtexec or C++ code for this experiment? Are you trying to run 3 models on the same DLA in parallel? Could you share repro steps?

  • I use C++ code for this experiment. My code is something like:
1. Model model1("test.engine"); //init model1
2. Model model2("test.engine"); //init model2
3. Model model3("test.engine"); //init model3 will get the error described before
  • The strange thing is that if I call model1.~Model() after Model model1(…), then Model model3(…) can be initialized successfully:
1. Model model1("test.engine"); //init model1
2. model1.~Model();
3. Model model2("test.engine"); //init model2
4. Model model3("test.engine"); //init model3 will be initialized successfully

Dear @liuhaomin1,
It looks like you are generating the TRT models for the same DLA. Could you share your repro C++ code? How about generating the 3rd model for a different DLA and testing? Does that work?

Does “different DLA” mean that when converting an ONNX model to a TRT engine, the arg --useDLACore={num} should be set to a different value? And how am I going to know how many DLA cores I have on the Tegra Orin device?

Sorry, I made a mistake: even if I add model1.~Model();, model3 cannot be initialized successfully.

I tried setting the arg --useDLACore=1 when converting the 3rd model, but I still get the init error.

Dear @liuhaomin1,
It has two DLAs. So you can generate two DLA models corresponding to DLA 0 and DLA 1 and check running them in parallel. Running two models in parallel on the same DLA may not be possible.
You can quickly verify inferencing with two DLAs in parallel using trtexec and let us know if you see any issues.

OK, could you please give me an example of how to run two models in parallel using trtexec? For example, if I have a model test.onnx, should the command look like this:

trtexec --onnx=test.onnx --useDLACore=0 --useDLACore=1 .......? 
  • Hi, after I converted my ONNX model to two TRT engines using --useDLACore=0/1, should I explicitly set which DLA core to run on in my inference code?
  • And when I do inference with my TRT engine (DLA core 1), I get a warning.

    Does it mean that this engine still runs on DLA core 0?

Dear @liuhaomin1,

You can quickly verify inferencing with two DLA in parallel using trtexec

For this, you can run two instances of trtexec in two different terminals. One terminal uses --useDLACore=0 and the other uses --useDLACore=1.

after I converted my onnx model to 2 trt engine using --useDLACore=0/1, should I explicitly set which dlacore to run in my inference code?

Yes.

Hi SivaRamaKrishnaNV, I tried

For this, you can run two instances of trtexec in two different terminals. One terminal uses --useDLACore=0 and the other uses --useDLACore=1.

The two engines cannot run in parallel on DLA core 0 and DLA core 1 respectively. I got the error again:

[trt] 1: [nvdlaUtils.cpp::deserialize :: 154] Error code 1: DLA (NvMediaDlaInit : Init failed.)

Does it mean that this device cannot run my models in parallel? Or is there any solution to my problem?

Dear @liuhaomin1,
Could you share model file to repro issue with trtexec?

resnet18.onnx (44.6 MB)
This is the model I use for test.


Does this error mean that there is just one DLA core?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Dear @liuhaomin1,
Does this mean that when you use --useDLACore=0 it works, and you see the issue with --useDLACore=1? I notice from TensorRT model use too much memory on DriveOrin - #6 by liuhaomin1 that you seem to have the same issue with other models as well.
Can you restart the DRIVE AGX Orin Devkit and check again?

@liuhaomin1
Also check out the DLA GitHub page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ