ResNet18 TensorRT engine: cannot instantiate more than two objects on DLA

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
other

SDK Manager Version
1.9.1.10844
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
After I converted resnet18.onnx to a TensorRT engine with DLA, I want to instantiate three objects that do something using the converted engine file, but the moment I instantiate the third object I get an error: “[trt] 1: [nvdlaUtils.cpp::deserialize :: 154] Error code 1: DLA (NvMediaDlaInit : Init failed.)”. I don’t know what’s going on; please help.

Dear @liuhaomin1,

I want to instantiate 3 objects to do something using the converted engine file

Did you use trtexec or C++ code for this experiment? Are you trying to run 3 models on the same DLA in parallel? Could you share repro steps?

  • I use C++ code for this experiment. My code is something like:
1. Model model1("test.engine"); //init model1
2. Model model2("test.engine"); //init model2
3. Model model3("test.engine"); //init model3 will get the error described before
  • The strange thing is that if I call model1.~Model() after Model model1(…), then Model model3(…) can be initialized successfully:
1. Model model1("test.engine"); //init model1
2. model1.~Model();
3. Model model2("test.engine"); //init model2
4. Model model3("test.engine"); //init model3 will be initialized successfully

Dear @liuhaomin1,
It looks like you are generating the TRT models for the same DLA. Could you share your repro C++ code? How about generating the 3rd model for a different DLA and testing? Does that work?

Does “different DLA” mean that when converting an ONNX model to a TRT engine, the arg --useDLACore={num} should be set to a different value? And how am I going to know how many DLA cores I have on the Tegra Orin device?

Sorry, I made a mistake: even if I add model1.~Model();, model3 cannot be initialized successfully.

I tried setting the arg --useDLACore=1 when converting the 3rd model, but I still get the init error.

Dear @liuhaomin1,
It has two DLAs. So you can generate two DLA models corresponding to DLA 0 and DLA 1 and check running them in parallel. Running two models in parallel on the same DLA may not be possible.
You can quickly verify inferencing with two DLAs in parallel using trtexec and let us know if you see any issues.

OK, could you please give me an example of how to run two models in parallel using trtexec? For example, if I have a model test.onnx, should the command look like this:

trtexec --onnx=test.onnx --useDLACore=0 --useDLACore=1 .......? 
  • Hi, after I converted my ONNX model to two TRT engines using --useDLACore=0/1, should I explicitly set which DLA core to run on in my inference code?
  • And when I do inference with my TRT engine (DLA core 1), I get a warning.

    Does it mean that this engine still runs on DLA core 0?

Dear @liuhaomin1,

You can quickly verify inferencing with two DLA in parallel using trtexec

For this, you can run two instances of trtexec in two different terminals. One terminal uses --useDLACore=0 and the other uses --useDLACore=1.

after I converted my onnx model to 2 trt engine using --useDLACore=0/1, should I explicitly set which dlacore to run in my inference code?

Yes.

Hi SivaRamaKrishnaNV, I tried

For this, you can run two instances of trtexec in two different terminals. One terminal uses --useDLACore=0 and the other uses --useDLACore=1.

The two engines cannot run in parallel on DLA core 0 and DLA core 1 respectively. I got the error again:

[trt] 1: [nvdlaUtils.cpp::deserialize :: 154] Error code 1: DLA (NvMediaDlaInit : Init failed.)

Does it mean that this device cannot run my models in parallel? Or is there any solution to my problem?

Dear @liuhaomin1,
Could you share model file to repro issue with trtexec?

resnet18.onnx (44.6 MB)
This is the model I use for test.


Does this error mean that there is just one DLA core?

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Dear @liuhaomin1,
Does this mean that when you use --useDLACore=0 it works, and you see the issue with --useDLACore=1? I notice from TensorRT model use too much memory on DriveOrin - #6 by liuhaomin1 that you seem to have the same issue with other models as well.
Can you restart the DRIVE AGX Orin Devkit and check again?

@liuhaomin1
Also check out the DLA GitHub page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ