Issues when using DLA with TensorRT 7.1.3 compared to TensorRT 6.0.1

Description

I have two Jetson AGX Xavier boards with different Jetpack versions.
One is Jetpack 4.3 (with TensorRT 6.0.1) and the other is Jetpack 4.4 (with TensorRT 7.1.3).

I will refer to these two devices as Device A and Device B.

Device A: Jetson AGX Xavier with Jetpack 4.3
Device B: Jetson AGX Xavier with Jetpack 4.4

I ran the same code on the DLA of both devices, but I found some differences.
I set the MAXN nvpmodel and ran jetson_clocks on both devices.

1. Inference time on Device B is longer than on Device A.

When I ran my code on Device A, it took 9.4 seconds, but on Device B it took 11.9 seconds.
I don’t know why Device B took more time.

2. Device B has a limitation on creating multiple contexts

I created multiple contexts as in the following example code.

for (int j = 0; j < NUM_CONTEXTS; j++)
{
    context[j] = engine->createExecutionContext();
    assert(context[j] != nullptr);
}

On Device A, I can create multiple execution contexts, but on Device B an error occurs once a certain number of contexts have been created.
As a result, I can create 8 contexts on Device A, but only 4 contexts on Device B.

The error on Device B looks like the following.

NvMapMemAllocInternalTagged: 1074810371 error 12
NvMapMemHandleAlloc: error 12
NVMEDIA_DLA : 1686, ERROR: runtime loadBare failed. err: 0x6.
…/rtExt/dla/native/dlaUtils.cpp (166) - DLA Error in deserialize: 7 (NvMediaDlaLoadLoadable : load loadable failed.)
FAILED_ALLOCATION: std::exception

Another difference is that a createExecutionContext() call on Device B takes longer than on Device A. On Device B it takes about a second to create each context, whereas Device A creates multiple execution contexts almost immediately.

Do you know why these issues happen on Jetpack 4.4 with TensorRT 7.1.3?

Hi @chjej202
Jetson team should be able to help you better here.
Thanks!

Hi,

Would you mind sharing your model and the source that can reproduce this issue?
Is this reproducible with trtexec?

Thanks.

Hi,

I tried with trtexec and found that this issue is reproducible.

I ran the following command:

user@nvidia:/usr/src/tensorrt/data/resnet50$ ../../bin/trtexec --avgRuns=300 --deploy=ResNet50_N2.prototxt --fp16 --batch=1 --iterations=300 --output=prob --useDLACore=0 --useSpinWait --allowGPUFallback --streams=8

By changing the --streams option, you can change the number of execution contexts that are created.

On Device A (Jetson AGX Xavier with Jetpack 4.3), no error occurs and it shows the following execution-time results.

“Average over 300 runs is 6.04887 ms (host walltime is 6.08014 ms, 99% percentile time is 6.21517).”

On Device B (Jetson AGX Xavier with Jetpack 4.4), it fails with the following errors on the screen.

NvMapMemAllocInternalTagged: 1074810371 error 12
NvMapMemHandleAlloc: error 12
NVMEDIA_DLA : 1686, ERROR: runtime loadBare failed. err: 0x6.
[08/21/2020-19:23:30] [E] [TRT] …/rtExt/dla/native/dlaUtils.cpp (166) - DLA Error in deserialize: 7 (NvMediaDlaLoadLoadable : load loadable failed.)
[08/21/2020-19:23:30] [E] [TRT] FAILED_ALLOCATION: std::exception

The above error messages are shown 4 times, which means that 4 execution contexts failed to be created while the other 4 were created properly.

Also, the execution time shown on the screen was the following, which is longer than on Device A.

Average on 300 runs - GPU latency: 7.01389 ms - Host latency: 7.03811 ms (end to end 7.04715 ms, enqueue 0.338025 ms)

Hi,

Thanks for the report.
We can reproduce this issue in our environment.
Will check this with our internal team and update more information with you later.

There is a 1 GiB memory limit for DLA intermediate tensor data.
This error may be hitting that limit, but we need to check with our internal team first.

Thanks.

Thank you. I will wait for your reply.

Hi,

Thanks for your patience.
We are still working on this issue and will keep you updated.

This limitation comes from the memory allocation strategy of DLA, which only allows 4 runtime instances.
We are working on a different approach to allocation.
Will let you know once it is ready.

Thanks.