According to the official documentation, each DLA on the Jetson AGX Orin can theoretically run 16 TensorRT contexts concurrently. In practice, however, I can only run 10 TensorRT contexts per DLA at the same time.
The error is reported on the 11th call to createExecutionContext(). Here is the corresponding output from the verbose log:
Total per-runner device persistent memory is 0
Total per-runner host persistent memory is 96
Allocated activation device memory of size 630784
1: [cudlaUtils.cpp::LoadableManager::48] Error Code 1: DLA (Failed to deserialize DLA loadable)
So I’d like to ask: what other limitations are there on the number of TensorRT contexts running concurrently on the DLA?
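For reference, here is a simplified sketch of what my program does (not the exact code; the engine file name, the DLA core index, and the loop bound of 16 are placeholders):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Minimal logger that prints warnings and errors.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Load a serialized engine that was built for DLA.
    std::ifstream file("dla_engine.plan", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    runtime->setDLACore(0); // target DLA core 0
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (!engine)
        return 1;

    // Create contexts from the same engine until creation fails.
    std::vector<nvinfer1::IExecutionContext*> contexts;
    for (int i = 0; i < 16; ++i)
    {
        nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
        if (!ctx)
        {
            std::cout << "Context creation failed at #" << (i + 1) << std::endl;
            break;
        }
        contexts.push_back(ctx);
    }
    std::cout << "Created " << contexts.size() << " contexts" << std::endl;
    // (cleanup omitted in this sketch)
    return 0;
}
```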
If these suggestions don’t help and you want to report an issue to us, please share the model, the command/steps, and the customized app (if any) with us so we can reproduce it locally.
To learn more about the TensorRT behavior, please share the conversion log generated with --verbose.
It contains the details of how TensorRT places the inference tasks.
Here is my point: my TensorRT contexts are all created from the same engine, so in theory each context should correspond to the same number of DLA loadables (I’m not sure whether this assumption is correct). Since the upper limit of DLA loadables running concurrently on each DLA is 16, if each TensorRT context corresponded to 2 or more loadables, then 10 contexts would already mean 20 or more concurrently running loadables, which exceeds the limit and should produce an error. But 10 contexts run without error, so each context must correspond to only one DLA loadable. In that case, 11 concurrent contexts should correspond to only 11 loadables, which is still below the limit of 16, yet the 11th context fails.
Attached is my verbose log. My program builds a TensorRT engine for each of the 4 ONNX models, with two of the engines running on two different DLA cores. build_plan.log (1006.8 KB)
---------- Layers Running on DLA ----------
[DlaLayer] {ForeignNode[resnetv22_stage2_batchnorm0_fwd...resnetv22_stage2__plus1]}
---------- Layers Running on GPU ----------
Would you mind also checking the RAM usage?
Could you check the overall Managed SRAM / Local DRAM / Global DRAM to see if there are still resources remaining for the 11th loadable?
You can find this info in the TensorRT log as well.
For example:
Memory consumption details:
Pool Sizes: Managed SRAM = 0.5 MiB, Local DRAM = 1024 MiB, Global DRAM = 512 MiB
Required: Managed SRAM = 0.5 MiB, Local DRAM = 4 MiB, Global DRAM = 4 MiB
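If you need to adjust these pools, they can be set on the builder config at engine build time. Below is a hedged sketch (assumes the TensorRT 8.4+ `setMemoryPoolLimit` API; the pool values simply mirror the example log above and should be tuned to your model):

```cpp
#include <NvInfer.h>

// Configure DLA placement and memory pool limits on an existing builder config.
// "config" is assumed to be a valid nvinfer1::IBuilderConfig*.
void configureDlaPools(nvinfer1::IBuilderConfig* config)
{
    using nvinfer1::MemoryPoolType;

    // Route layers to DLA core 0, allowing GPU fallback for unsupported layers.
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(0);
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);

    // Pool sizes matching the example log above (0.5 MiB / 1024 MiB / 512 MiB).
    config->setMemoryPoolLimit(MemoryPoolType::kDLA_MANAGED_SRAM, 512u << 10);
    config->setMemoryPoolLimit(MemoryPoolType::kDLA_LOCAL_DRAM, 1024ull << 20);
    config->setMemoryPoolLimit(MemoryPoolType::kDLA_GLOBAL_DRAM, 512ull << 20);
}
```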