Trying to use DLA and GPU simultaneously on Jetson NX

mgalimberti · January 31, 2023, 4:43pm

• Hardware Platform = Jetson Xavier NX
• DeepStream Version = DS-6.0.1
• JetPack Version = JP-4.2
• TensorRT Version = 8.2.1.8

• Issue Type: not clear if we can use GPU + DLA0 + DLA1 from the same process

• Requirement details

We are trying to run the standard NVIDIA PeopleNet ver. 2.3.2 on an NVIDIA Xavier NX dev board using the GPU and DLA simultaneously.

For this we are using deepstream-app with the usual deepstream_Config_file.txt + config_infer_file.txt

Note that we already use PeopleNet ver. 2.3.2 on the NX in our application, and it works well and correctly on the GPU. But when we try to activate the DLA we see strange behaviour: performance drops by at least 6 times and no video output is produced.

In detail: as shown below, we enabled DLA in the [property] group of config_infer_file (the DS documentation does not make clear where DLA activation should be declared) … and we see that a new DLA.engine is created, completely different from the GPU.engine

Question: is it correct to declare DLA in the [property] group?

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
#
enable-dla=1           <<<<<<<<<<<<<<<<<<
use-dla-core=0         <<<<<<<<<<<<<<<<<<
#
tlt-encoded-model=../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt
labelfile-path=../../models/tao_pretrained_models/peopleNet/V2.3.2/labels.txt
model-engine-file=../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt_b2_dla0_int8.engine
int8-calib-file=../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.txt
infer-dims=3;544;960
uff-input-blob-name=input_1
batch-size=2
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=3
cluster-mode=2
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

… when deepstream-app starts and no DLA.engine is present, it reads the PeopleNet model file: resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt

but while reading the tlt-encoded-model for PeopleNet the following warnings appear, … so it seems that the PeopleNet network layers are not supported by DLA.

ERROR: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.0/samples/configs/tao_pretrained_models/../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt_b2_dla0_int8.engine open error
WARNING: [TRT]: Default DLA is enabled but layer output_bbox/bias is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer conv1/kernel is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer conv1/bias is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/moving_variance is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape_1/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/batchnorm/add/y is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/gamma is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape_3/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/beta is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape_2/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/moving_mean is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer block_1a_conv_1/kernel is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer block_1a_conv_1/bias is not supported on DLA, falling back to GPU.
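
To get an overview of what TensorRT moved back to the GPU, the warnings can be summarized with a few lines of Python. This is only a sketch: it assumes the warning wording shown in the log above, and the helper names (`fallback_layers`, `group_by_prefix`) are made up for illustration.

```python
import re

# Matches the TensorRT warning text quoted in the log above.
FALLBACK_RE = re.compile(
    r"Default DLA is enabled but layer (?P<layer>\S+) is not supported on DLA"
)


def fallback_layers(log_text):
    """Return the names TensorRT reported as falling back to GPU, in log order."""
    return [m.group("layer") for m in FALLBACK_RE.finditer(log_text)]


def group_by_prefix(layers):
    """Count fallback entries per top-level name (the text before the first '/')."""
    counts = {}
    for name in layers:
        prefix = name.split("/", 1)[0]
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts


# A few lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
WARNING: [TRT]: Default DLA is enabled but layer output_bbox/bias is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer conv1/kernel is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer conv1/bias is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/gamma is not supported on DLA, falling back to GPU.
"""

if __name__ == "__main__":
    layers = fallback_layers(SAMPLE_LOG)
    print(len(layers), "entries fell back to GPU")
    print(group_by_prefix(layers))
```

Feeding the full deepstream-app log to `fallback_layers` gives a quick list to compare against the layers that do stay on the DLA.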

So question:

-) Is it correct that PeopleNet can work only on the GPU and will not work on DLA, because ResNet34 convolutional layers like bn_conv1/xxxxx are not supported on DLA?

In any case:

-) Can we run one single network model like PeopleNet ver. 2.3.2 simultaneously on GPU and DLA0 and/or DLA1?
How can we declare simultaneous use of GPU and DLA?

-) Or can we only run one network model like PeopleNet on the GPU and another, separate network model like DashCamNet on DLA?
So should we have two completely different deepstream_Config_file.txt + config_infer_file.txt pairs,
one for PeopleNet and one for DashCamNet?

One last thing: when deepstream-app runs with the DLA.engine, the total throughput is about 25 FPS, as shown below;
with the GPU.engine it is about 145 FPS.

**PERF:  4.68 (3.17)    4.68 (3.18)     4.68 (3.18)     4.68 (3.14)     4.67 (3.15)
**PERF:  4.68 (3.22)    4.68 (3.24)     4.68 (3.30)     4.68 (3.26)     4.68 (3.27)
**PERF:  4.68 (3.33)    4.68 (3.35)     4.68 (3.40)     4.68 (3.37)     4.68 (3.32)
**PERF:  4.68 (3.43)    4.68 (3.44)     4.68 (3.44)     4.68 (3.40)     4.68 (3.42)
**PERF:  4.68 (3.52)    4.68 (3.47)     4.68 (3.52)     4.68 (3.49)     4.68 (3.50)
**PERF:  4.68 (3.54)    4.68 (3.55)     4.68 (3.60)     4.68 (3.57)     4.68 (3.53)
**PERF:  4.67 (3.61)    4.67 (3.63)     4.67 (3.62)     4.67 (3.59)     4.67 (3.60)
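
For reference, the total throughput can be read off a PERF line by summing the per-stream values. A minimal Python sketch, assuming each pair in the line is "current FPS (average FPS)" for one stream, as deepstream-app prints it; the helper names are illustrative only.

```python
import re

# One "current (average)" pair per stream, e.g. "4.68 (3.17)".
PAIR_RE = re.compile(r"(\d+(?:\.\d+)?)\s*\((\d+(?:\.\d+)?)\)")


def parse_perf_line(line):
    """Return a list of (current_fps, average_fps) tuples, one per stream."""
    if "PERF" not in line:
        return []
    # Only look after the first ':' so a prefix like "PERF(0):" is not
    # mistaken for a value pair.
    _, _, rest = line.partition(":")
    return [(float(cur), float(avg)) for cur, avg in PAIR_RE.findall(rest)]


def total_fps(line):
    """Sum the current FPS of all streams in one PERF line."""
    return round(sum(cur for cur, _ in parse_perf_line(line)), 2)


# First PERF line from the DLA run above.
SAMPLE = "**PERF:  4.68 (3.17)    4.68 (3.18)     4.68 (3.18)     4.68 (3.14)     4.67 (3.15)"

if __name__ == "__main__":
    print("streams:", len(parse_perf_line(SAMPLE)))
    print("total current FPS:", total_fps(SAMPLE))
```

Summing the five streams of the DLA run gives a little under 25 FPS in total, which is the figure compared against the GPU run.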

Question:

-) Is it expected that DLA performance/FPS is much lower than GPU performance/FPS?

Thanks for support,
M.

Fiona.Chen · February 1, 2023, 5:35am

Can you refer to DeepStream 5.0?

Fiona.Chen · February 1, 2023, 5:44am

The fact that some layers are not supported by DLA does not mean the whole model cannot run on DLA. In the case you posted, some layers run on the GPU while the other layers run on DLA. Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

Fiona.Chen · February 1, 2023, 5:50am

No matter where you want to run the models, there is always a separate nvinfer configuration file per model. The deepstream-app configuration file depends on how you want to construct the DeepStream pipeline; it has nothing to do with where you want to run the models.
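
One way to keep a separate nvinfer configuration file per device is to derive the DLA variants from the GPU file. The Python sketch below is an illustration under assumptions, not an official tool: it sets `enable-dla` and `use-dla-core` in the [property] group, as in the config posted earlier in this thread, and it drops `model-engine-file` on the assumption that the engine should be rebuilt for the DLA rather than loaded from the GPU engine. The function name and the sample text are hypothetical.

```python
import configparser
import io


def dla_variant(gpu_config_text, dla_core):
    """Return nvinfer config text with DLA enabled on the given core."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key spelling exactly as written
    parser.read_string(gpu_config_text)

    prop = parser["property"]
    prop["enable-dla"] = "1"
    prop["use-dla-core"] = str(dla_core)
    # Assumption: remove the engine path so a GPU engine is not reused.
    prop.pop("model-engine-file", None)

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Shortened sample based on the config shown earlier; paths are placeholders.
SAMPLE_GPU_CONFIG = """\
[property]
gpu-id=0
batch-size=2
network-mode=1
infer-dims=3;544;960
model-engine-file=placeholder_gpu.engine
"""

if __name__ == "__main__":
    for core in (0, 1):
        print("# variant for DLA core", core)
        print(dla_variant(SAMPLE_GPU_CONFIG, core))
```

Each generated text would be saved as its own file and referenced from the matching deepstream-app configuration.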

mgalimberti · February 8, 2023, 6:09pm

Hi Fiona,
Thanks for the suggestions … I tried to follow them and made some progress …
but I have not yet reached the final goal of getting an output file.mp4 FROM DLA with detection/tracking … as I get from the GPU.

It could be that I am not interpreting the DeepStream documentation correctly (it is a little confusing) … so I have attached the files below, which you can download and try on your NX dev board.

So if you look at and run
run_peoplenet_V2_3_2__GPU__DLA.sh
you will see it contains the following very simple command
deepstream-app -c deepstream_app_source1_peoplenet_V2_3__2__gpu.txt -c deepstream_app_source1_peoplenet_V2_3__dla.txt

Running it, I can see the DLA working, as shown by the command suggested in the DS documentation …
cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status
active

and I can see “deepstream-app” generating interesting logs like the ones below

NvMMLiteBlockCreate : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

PERF(0): 27.99 (26.26) 28.02 (24.98) 28.02 (25.02) 27.99 (25.67) 27.99 (25.85)
PERF(1): 4.99 (4.67) 4.98 (5.38) 4.98 (4.80) 4.98 (4.87) 4.98 (4.95)
PERF(0): 27.39 (26.39) 27.41 (25.51) 27.39 (25.54) 27.41 (25.91) 27.38 (26.06)
PERF(1): 4.98 (4.75) 4.99 (5.29) 4.99 (4.84) 4.99 (4.90) 4.98 (4.96)

and from the 5 channels processed on the GPU I get very good output files.mp4 with correct detection/tracking, as shown in
“PeopleNet_channel_1__good.PNG”

but from the DLA pipeline … I get nothing.

This could be caused by an incomplete DLA pipeline, as shown in
pipeline_GPU.png
pipeline_DLA.png
pipeline_DLA ends at the demuxer … with no sink-to-file branches,
whereas pipeline_GPU ends at the demuxer with 5 sink-to-file branches.

What am I doing wrong?
Are the attached GPU and DLA files correct?
If not … could you change them and let me know …

thank you very much for your support
Maurizio

run_peoplenet_V2_3_2__GPU__DLA.sh (457 Bytes)
deepstream_app_source1_peoplenet_V2_3__2__gpu.txt (5.7 KB)
config_infer_primary_peoplenet_V2_3__2__gpu.txt (2.6 KB)
deepstream_app_source1_peoplenet_V2_3__dla.txt (5.1 KB)
config_infer_primary_peoplenet_V2_3_2__dla.txt (2.5 KB)

Pipeline_DLA_incolplete

Fiona.Chen · February 9, 2023, 5:03am

In the deepstream_app_source1_peoplenet_V2_3__dla.txt file you define 5 input streams, and these sources are automatically named “source 0”, “source 1”, … “source 4”, because the deepstream-app tool ignores the number in the brackets of the [sourceX] group (please read the deepstream-app source code). So you get no output with “source-id=5” in [sink5]. You should set “source-id=0” in [sink5], “source-id=1” in [sink6], …
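
A quick way to catch this kind of mismatch before running the pipeline is to check every sink's source-id against the number of [sourceX] groups. The Python sketch below is only an illustration: it assumes sources are indexed 0, 1, 2, … in the order they appear in the file, regardless of the number in the brackets, and it counts every [sourceX] group. The function name and the sample config are hypothetical.

```python
import configparser
import re


def bad_sink_source_ids(config_text):
    """Return (sink_name, source_id) pairs whose source-id matches no source."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(config_text)

    # Assumption: index = position in the file, not the number in the brackets.
    sources = [s for s in parser.sections() if re.fullmatch(r"source\d+", s)]
    valid_ids = range(len(sources))

    problems = []
    for section in parser.sections():
        if re.fullmatch(r"sink\d+", section) and "source-id" in parser[section]:
            source_id = int(parser[section]["source-id"])
            if source_id not in valid_ids:
                problems.append((section, source_id))
    return problems


# Hypothetical two-source config reproducing the mistake discussed above.
SAMPLE_CONFIG = """\
[source5]
enable=1
uri=file:///hypothetical/a.mp4

[source6]
enable=1
uri=file:///hypothetical/b.mp4

[sink5]
enable=1
source-id=5

[sink6]
enable=1
source-id=1
"""

if __name__ == "__main__":
    for sink, source_id in bad_sink_source_ids(SAMPLE_CONFIG):
        print(f"[{sink}] uses source-id={source_id}, which matches no source")
```

In the sample, [sink5] is reported because only indices 0 and 1 exist, while [sink6] passes.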

mgalimberti · February 9, 2023, 5:50pm

Wow … after declaring “source-id=0” in [sink5], “source-id=1” in [sink6], …
it works … and the outputFiles.mp4 coming from DLA were created correctly.
Very good.
Many thanks.

I have other questions, but for the moment this is enough.

Only one note: I had to declare [source5] [source6] …
because when I reused [source0] [source1] … deepstream-app stopped with an error saying that the group "[source0] was already present … "

So probably I simply have to give each [sourceX] group a different name and remember that deepstream-app always assigns the indices 0, 1, 2, 3, 4, … automatically.

thanks again,
Maurizio

system · February 23, 2023, 5:51pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

ramc · April 26, 2023, 2:54pm

@mgalimberti
Also check out the DLA GitHub page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ
