Trying to use the DLA and GPU concurrently on Jetson NX

• Hardware Platform = Jetson Xavier NX
• DeepStream Version = DS-6.0.1
• JetPack Version = JP-4.2
• TensorRT Version = 8.2.1.8

• Issue Type: unclear whether the GPU, DLA0 and DLA1 can all be used from the same process

• Requirement details

We are trying to run the standard NVIDIA PeopleNet v2.3.2 model on the Jetson Xavier NX dev board, using the GPU and the DLA concurrently.

For this we use deepstream-app with the usual deepstream_Config_file.txt + config_infer_file.txt.

Note that we already use PeopleNet v2.3.2 on the NX in our application, and it works well on the GPU. When we try to enable the DLA, however, we see strange behaviour: performance drops by at least 6 times and no video output is produced.

In detail: as shown below, we enabled the DLA in the [property] section of config_infer_file (the DS documentation does not make clear where DLA activation should be declared), and we see that a new DLA engine file is created, completely different from the GPU engine.

Question: is it correct to declare the DLA settings in the [property] section?

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
# --- DLA activation (the two lines highlighted in this question) ---
enable-dla=1
use-dla-core=0
# ---
tlt-encoded-model=../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt
labelfile-path=../../models/tao_pretrained_models/peopleNet/V2.3.2/labels.txt
model-engine-file=../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt_b2_dla0_int8.engine
int8-calib-file=../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.txt
infer-dims=3;544;960
uff-input-blob-name=input_1
batch-size=2
process-mode=1
model-color-format=0
network-mode=1
num-detected-classes=3
cluster-mode=2
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
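A separate DLA engine is expected here: the DLA engine is a different TensorRT build from the GPU engine, and the name in model-engine-file encodes batch size, device and precision. The sketch below rebuilds that name so you can check which engine a given config will look for. The pattern (<model>_b<batch>_<device>_<precision>.engine), the "gpu0" form for the GPU case and the network-mode mapping (0=FP32, 1=INT8, 2=FP16) are assumptions read off the file name above, and expected_engine_name is a hypothetical helper, not a DeepStream API.

```python
import os

# Hypothetical helper: rebuilds the engine-file name pattern seen in
# model-engine-file, i.e. <model>_b<batch>_<device>_<precision>.engine.
# Assumed network-mode mapping: 0=FP32, 1=INT8, 2=FP16.
PRECISION = {0: "fp32", 1: "int8", 2: "fp16"}


def expected_engine_name(model_path, batch_size, network_mode,
                         enable_dla, dla_core=0, gpu_id=0):
    """Return the engine file name expected for a given device selection."""
    # Assumption: DLA engines are tagged dla<core>, GPU engines gpu<id>.
    device = f"dla{dla_core}" if enable_dla else f"gpu{gpu_id}"
    model = os.path.basename(model_path)
    return f"{model}_b{batch_size}_{device}_{PRECISION[network_mode]}.engine"


if __name__ == "__main__":
    etlt = ("../../models/tao_pretrained_models/peopleNet/V2.3.2/"
            "resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt")
    # Values from the config above: batch-size=2, network-mode=1, use-dla-core=0
    print(expected_engine_name(etlt, 2, 1, enable_dla=True, dla_core=0))
    print(expected_engine_name(etlt, 2, 1, enable_dla=False))
```

On the first run the file does not exist yet, so nvinfer reports the deserialize error seen below and builds the engine from the .etlt model.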

When deepstream-app starts and no DLA engine file is present, it builds one from the PeopleNet model file resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt (the tlt-encoded-model).

While building from tlt-encoded-model, however, the following warnings appear, so it seems that the PeopleNet network layers are not supported by the DLA.

ERROR: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.0/samples/configs/tao_pretrained_models/../../models/tao_pretrained_models/peopleNet/V2.3.2/resnet34_peoplenet_pruned_int8_v2_3_2_quantized.etlt_b2_dla0_int8.engine open error
WARNING: [TRT]: Default DLA is enabled but layer output_bbox/bias is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer conv1/kernel is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer conv1/bias is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/moving_variance is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape_1/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/batchnorm/add/y is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/gamma is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape_3/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/beta is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape_2/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/moving_mean is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/Reshape/shape is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer block_1a_conv_1/kernel is not supported on DLA, falling back to GPU.
WARNING: [TRT]: Default DLA is enabled but layer block_1a_conv_1/bias is not supported on DLA, falling back to GPU.
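To see how much of the network TensorRT actually moved back to the GPU, it can help to collect these warnings from a full build log. A minimal sketch, assuming the warning text stays exactly as printed above; gpu_fallback_layers and fallback_by_prefix are hypothetical helper names. Many of the reported names (conv1/kernel, bn_conv1/gamma, …/Reshape/shape) look like weight and constant tensors rather than the convolution layers themselves, so this list alone does not show that the whole network runs on the GPU.

```python
import re
from collections import Counter
from typing import List

# Assumes the TensorRT warning wording shown in the log above.
FALLBACK_RE = re.compile(
    r"Default DLA is enabled but layer (\S+) is not supported on DLA, "
    r"falling back to GPU"
)


def gpu_fallback_layers(log_text: str) -> List[str]:
    """Return the layer names TensorRT reported as falling back to the GPU."""
    return FALLBACK_RE.findall(log_text)


def fallback_by_prefix(layers: List[str]) -> Counter:
    """Count fallback layers by their first path component, e.g. 'bn_conv1'."""
    return Counter(name.split("/")[0] for name in layers)


if __name__ == "__main__":
    sample = (
        "WARNING: [TRT]: Default DLA is enabled but layer conv1/kernel "
        "is not supported on DLA, falling back to GPU.\n"
        "WARNING: [TRT]: Default DLA is enabled but layer bn_conv1/gamma "
        "is not supported on DLA, falling back to GPU.\n"
    )
    layers = gpu_fallback_layers(sample)
    print(len(layers), "layers fell back to GPU")
    for prefix, count in fallback_by_prefix(layers).most_common():
        print(f"{prefix:30s} {count}")
```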

So, questions:

-) Is it correct that PeopleNet can only run on the GPU and not on the DLA, because ResNet34 layers such as bn_conv1/xxxxx are not supported on the DLA?

In any case:

-) Can a single network model such as PeopleNet v2.3.2 run concurrently on the GPU and on DLA0 and/or DLA1?
How can we declare concurrent use of the GPU and the DLA?

-) Or can we only run one network model, such as PeopleNet, on the GPU and a separate model, such as DashCamNet, on the DLA?
In that case, do we need two completely separate deepstream_Config_file.txt + config_infer_file.txt pairs,
one for PeopleNet and one for DashCamNet?

Last thing: when deepstream-app runs with the DLA engine, total throughput is only about 23 to 25 FPS (5 streams at roughly 4.68 FPS each, as shown below),
whereas with the GPU engine it is about 145 FPS.

**PERF:  4.68 (3.17)    4.68 (3.18)     4.68 (3.18)     4.68 (3.14)     4.67 (3.15)
**PERF:  4.68 (3.22)    4.68 (3.24)     4.68 (3.30)     4.68 (3.26)     4.68 (3.27)
**PERF:  4.68 (3.33)    4.68 (3.35)     4.68 (3.40)     4.68 (3.37)     4.68 (3.32)
**PERF:  4.68 (3.43)    4.68 (3.44)     4.68 (3.44)     4.68 (3.40)     4.68 (3.42)
**PERF:  4.68 (3.52)    4.68 (3.47)     4.68 (3.52)     4.68 (3.49)     4.68 (3.50)
**PERF:  4.68 (3.54)    4.68 (3.55)     4.68 (3.60)     4.68 (3.57)     4.68 (3.53)
**PERF:  4.67 (3.61)    4.67 (3.63)     4.67 (3.62)     4.67 (3.59)     4.67 (3.60)
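The aggregate figure is the sum of the first number of each pair on a **PERF line (the current FPS of each stream; the value in parentheses is the average since start). A small sketch of that arithmetic, assuming the "<current> (<average>)" layout shown above; stream_fps and total_fps are hypothetical names.

```python
import re
from typing import List

# One "<current> (<average>)" pair per stream on a **PERF line.
PAIR_RE = re.compile(r"([\d.]+)\s*\(([\d.]+)\)")


def stream_fps(perf_line: str) -> List[float]:
    """Current FPS of every stream on one **PERF line (averages ignored)."""
    return [float(cur) for cur, _avg in PAIR_RE.findall(perf_line)]


def total_fps(perf_line: str) -> float:
    """Aggregate throughput = sum of the per-stream current FPS values."""
    return sum(stream_fps(perf_line))


if __name__ == "__main__":
    line = ("**PERF:  4.68 (3.17)    4.68 (3.18)     4.68 (3.18)     "
            "4.68 (3.14)     4.67 (3.15)")
    print(len(stream_fps(line)), "streams,", round(total_fps(line), 2), "FPS total")
```

For the first line this gives 5 streams × ~4.68 FPS ≈ 23.4 FPS, against about 145 FPS (≈ 29 FPS per stream) on the GPU.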

Question:

-) Is it expected that DLA performance (FPS) is much lower than GPU performance?

Thanks for support,
M.

Can you refer to DeepStream 5.0?

The fact that some layers are not supported by the DLA does not mean the whole model cannot run on the DLA. In the case you posted, some layers run on the GPU while the others run on the DLA. See the Developer Guide :: NVIDIA Deep Learning TensorRT Documentation.

No matter which device you want to run the models on, there is always a separate nvinfer configuration file per model. The deepstream-app configuration file depends on how you want to construct the DeepStream pipeline; it has nothing to do with which device the models run on.
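Following that answer, one PeopleNet model can be set up twice, once per device, with two nvinfer files that differ only in the DLA keys. A minimal sketch using Python's configparser; nvinfer_config and the trimmed BASE dictionary are hypothetical, and only enable-dla / use-dla-core are taken from the config earlier in this thread.

```python
import configparser
import io
from typing import Dict, Optional

# Hypothetical, trimmed base: in practice copy every key from the working
# GPU config (model paths, blob names, infer-dims, ...).
BASE: Dict[str, str] = {
    "gpu-id": "0",
    "batch-size": "2",
    "network-mode": "1",
    "gie-unique-id": "1",
}


def nvinfer_config(base: Dict[str, str], dla_core: Optional[int] = None) -> str:
    """Render a [property] group for the GPU (dla_core=None) or one DLA core."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep keys exactly as written
    props = dict(base)
    if dla_core is None:
        props["enable-dla"] = "0"
    else:
        props["enable-dla"] = "1"
        props["use-dla-core"] = str(dla_core)
    cfg["property"] = props
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)  # key=value style
    return buf.getvalue()


if __name__ == "__main__":
    print(nvinfer_config(BASE))               # GPU variant
    print(nvinfer_config(BASE, dla_core=1))   # DLA1 variant
```

Each generated file would then be referenced from the config-file key of the [primary-gie] group in its own deepstream-app config, and each variant should get its own model-engine-file, since a DLA engine and a GPU engine are separate builds.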

Hi Fiona,
Thanks for the suggestions. I tried to follow them and made some progress,
but I have not yet reached the final goal: getting output .mp4 files from the DLA pipeline with detection/tracking, as I get from the GPU pipeline.

It could be that I am not interpreting the DeepStream documentation correctly (it is a little confusing), so I have attached the files below, which you can download and try on your NX dev kit.

If you look at and run
run_peoplenet_V2_3_2__GPU__DLA.sh
you will see it contains this very simple command:
deepstream-app -c deepstream_app_source1_peoplenet_V2_3__2__gpu.txt -c deepstream_app_source1_peoplenet_V2_3__dla.txt

Running it, I can see the DLA is active, as shown by the command suggested in the DS documentation:
cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status
active
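The same check can be scripted for both DLA cores. A sketch, assuming the runtime_status path quoted above for DLA0; the DLA1 path (158c0000.nvdla1) is an assumption about the Xavier device tree and should be verified on the board. dla_status is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict

# DLA0 path as used in this thread; the DLA1 path is an assumption for
# Xavier and should be confirmed with `ls /sys/devices/platform/host1x/`.
DLA_STATUS_PATHS: Dict[str, str] = {
    "dla0": "/sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status",
    "dla1": "/sys/devices/platform/host1x/158c0000.nvdla1/power/runtime_status",
}


def dla_status(paths: Dict[str, str] = DLA_STATUS_PATHS) -> Dict[str, str]:
    """Return each core's runtime_status ('active', 'suspended', ...) or 'missing'."""
    status = {}
    for core, path in paths.items():
        p = Path(path)
        status[core] = p.read_text().strip() if p.exists() else "missing"
    return status


if __name__ == "__main__":
    # Run while deepstream-app is active; an idle core typically reports 'suspended'.
    for core, state in dla_status().items():
        print(core, state)
```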

I can also see deepstream-app generating interesting logs like the ones below:

NvMMLiteBlockCreate : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

PERF(0): 27.99 (26.26) 28.02 (24.98) 28.02 (25.02) 27.99 (25.67) 27.99 (25.85)
PERF(1): 4.99 (4.67) 4.98 (5.38) 4.98 (4.80) 4.98 (4.87) 4.98 (4.95)
PERF(0): 27.39 (26.39) 27.41 (25.51) 27.39 (25.54) 27.41 (25.91) 27.38 (26.06)
PERF(1): 4.98 (4.75) 4.99 (5.29) 4.99 (4.84) 4.99 (4.90) 4.98 (4.96)
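With two -c config files, deepstream-app prints one PERF(n) line per pipeline. Assuming PERF(0) belongs to the first config (GPU) and PERF(1) to the second (DLA), summing the current values per pipeline gives the total throughput of each device while both run. A sketch; latest_totals is a hypothetical helper.

```python
import re
from typing import Dict, List

PERF_RE = re.compile(r"PERF\((\d+)\):(.*)")      # "PERF(<pipeline>): ..."
PAIR_RE = re.compile(r"([\d.]+)\s*\(([\d.]+)\)")  # "<current> (<average>)"


def latest_totals(log_lines: List[str]) -> Dict[int, float]:
    """Map pipeline index -> summed current FPS from its most recent PERF line."""
    totals: Dict[int, float] = {}
    for line in log_lines:
        m = PERF_RE.search(line)
        if m:
            pairs = PAIR_RE.findall(m.group(2))
            totals[int(m.group(1))] = sum(float(cur) for cur, _avg in pairs)
    return totals


if __name__ == "__main__":
    log = [
        "PERF(0): 27.39 (26.39) 27.41 (25.51) 27.39 (25.54) 27.41 (25.91) 27.38 (26.06)",
        "PERF(1): 4.98 (4.75) 4.99 (5.29) 4.99 (4.84) 4.99 (4.90) 4.98 (4.96)",
    ]
    for idx, fps in sorted(latest_totals(log).items()):
        print(f"pipeline {idx}: {fps:.1f} FPS total")
```

On the last two PERF lines above this gives roughly 137 FPS for pipeline 0 and 25 FPS for pipeline 1.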

From the 5 channels processed on the GPU I get very good output .mp4 files with correct detection/tracking, as shown in
“PeopleNet_channel_1__good.PNG”

but from the DLA pipeline I get nothing.

This could be caused by an incomplete DLA pipeline, as shown in
pipeline_GPU.png
pipeline_DLA.png
The DLA pipeline ends at the demuxer with no sink-to-file branches,
whereas the GPU pipeline ends at the demuxer followed by 5 sink-to-file branches.

What am I doing wrong?
Are the GPU and DLA files attached below correct?
If not, could you change them and let me know?

thank you very much for your support
Maurizio

run_peoplenet_V2_3_2__GPU__DLA.sh (457 Bytes)
deepstream_app_source1_peoplenet_V2_3__2__gpu.txt (5.7 KB)
config_infer_primary_peoplenet_V2_3__2__gpu.txt (2.6 KB)
deepstream_app_source1_peoplenet_V2_3__dla.txt (5.1 KB)
config_infer_primary_peoplenet_V2_3_2__dla.txt (2.5 KB)


[Image: Pipeline_DLA_incomplete]

In the deepstream_app_source1_peoplenet_V2_3__dla.txt file you input 5 streams, and these sources are automatically named “source 0”, “source 1”, … “source 4”, because the deepstream-app tool does not use the number in the brackets of the [sourceX] group (please read the deepstream-app source code). So with “source-id=5” in [sink5] you get no output. You should set “source-id=0” in [sink5], “source-id=1” in [sink6], …
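That rule can be checked mechanically before launching. A sketch, assuming (per the answer above) that enabled [sourceN] groups are numbered 0, 1, 2, … in file order regardless of N, and that every enabled [sinkN] with a source-id must point inside that range. Treating a missing enable key as disabled is also an assumption; implicit_source_ids and orphan_sinks are hypothetical helpers, not part of deepstream-app.

```python
import configparser
import re
from typing import Dict, List

SOURCE_RE = re.compile(r"source\d+$")   # [source0], [source5], ...
SINK_RE = re.compile(r"sink\d+$")       # [sink0], [sink5], ...


def load(text: str) -> configparser.ConfigParser:
    """Parse a deepstream-app config (key=value groups, '#' comments)."""
    cfg = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    cfg.read_string(text)
    return cfg


def implicit_source_ids(cfg: configparser.ConfigParser) -> Dict[str, int]:
    """Number enabled [sourceN] groups 0, 1, 2, ... in file order, ignoring N."""
    enabled = [s for s in cfg.sections()
               if SOURCE_RE.match(s) and cfg.getint(s, "enable", fallback=0) == 1]
    return {name: i for i, name in enumerate(enabled)}


def orphan_sinks(cfg: configparser.ConfigParser) -> List[str]:
    """Enabled [sinkN] groups whose source-id points past the last source."""
    n_sources = len(implicit_source_ids(cfg))
    return [s for s in cfg.sections()
            if SINK_RE.match(s)
            and cfg.getint(s, "enable", fallback=0) == 1
            and cfg.getint(s, "source-id", fallback=0) >= n_sources]


if __name__ == "__main__":
    example = "\n".join([
        "[source5]", "enable=1",
        "[source6]", "enable=1",
        "[sink5]", "enable=1", "type=3", "source-id=5",
        "[sink6]", "enable=1", "type=3", "source-id=1",
    ])
    cfg = load(example)
    print(implicit_source_ids(cfg))
    print("sinks with no matching source:", orphan_sinks(cfg))
```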

Wow… declaring “source-id=0” in [sink5], “source-id=1” in [sink6], …
works, and the output .mp4 files from the DLA pipeline were created correctly.
Very good,
many thanks.

I have other questions, but that is enough for the moment.

One note: I had to declare [source5], [source6], …
because if I reused [source0], [source1], … deepstream-app stopped with an error saying that the context “[source0]” was already present … 

So I probably just need to give each [sourceX] group a name different from the others, and remember that deepstream-app always assigns the ids 0, 1, 2, 3, 4, … automatically.
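Based on that rule, the file-sink groups can be generated from the list of source group names so that source-id always follows the implicit index rather than the group number. A sketch; file_sink_groups is hypothetical, and the sink values (type=3 file sink, container=1 MP4, codec=1 H.264) are the usual deepstream-app sample settings, which should be checked against the DS 6.0 configuration reference.

```python
from typing import List


def file_sink_groups(source_groups: List[str], first_sink: int = 0,
                     prefix: str = "out") -> str:
    """Emit one file-sink group per source group, using the implicit index
    (0, 1, 2, ...) as source-id instead of the number in [sourceN]."""
    groups = []
    for idx, name in enumerate(source_groups):
        groups.append("\n".join([
            f"# output for [{name}]",
            f"[sink{first_sink + idx}]",
            "enable=1",
            "type=3",        # assumed: 3 = file sink
            "container=1",   # assumed: 1 = MP4
            "codec=1",       # assumed: 1 = H.264
            "sync=0",
            f"source-id={idx}",
            f"output-file={prefix}_{idx}.mp4",
        ]))
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    names = [f"source{n}" for n in range(5, 10)]   # [source5] ... [source9]
    print(file_sink_groups(names, first_sink=5, prefix="dla_out"))
```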

thanks again,
Maurizio


@mgalimberti
Also check out the DLA GitHub page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ