Jetson AGX Orin: how to use DLA for yolov2_tiny

1. Test yolov2_tiny with trtexec:
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov2_tiny_b32_gpu0_int8.engine --useDLACore=0 --int8
Throughput: 350.678 qps
2. Test yolov2_tiny with deepstream-app:
Run deepstream-app -c deepstream_app_config_yoloV2_tiny.txt with:
enable-dla=1
use-dla-core=0
batch-size=1


gst_nvinfer_parse_props
Unknown or legacy key specified 'is-classifier' for group [property]
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Initialized
hubin trace generateBackendContext : 1991
Now entering deserializeEngineAndBackend, loading /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine

WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:05.788872488 2356797 0x107bd80 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1905> [UID = 1]: deserialized trt engine from :/home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT data 3x416x416
1 OUTPUT kFLOAT region_16 425x13x13

hubin trace generateBackendContext : 2009
0:00:06.050293970 2356797 0x107bd80 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2010> [UID = 1]: Use deserialized engine model: /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
0:00:06.065220614 2356797 0x107bd80 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/wd/deepstream-6.1/sources/objectDetector_Yolo/config_infer_primary_yoloV2_tiny.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

    p: Pause
    r: Resume

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

**PERF: 58.76 (58.76)
**PERF: 59.02 (58.89)
**PERF: 58.99 (58.86)
**PERF: 58.85 (58.84)
**PERF: 58.90 (58.83)
**PERF: 58.68 (58.86)
**PERF: 58.87 (58.85)


GPU usage: 99%

Confused! Why is inference running on the GPU, and with such low FPS? Is anything wrong?

Hi,

Have you also enabled the DLA flag in the YOLOv2 configuration?
Please find the details below:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Quickstart.html#using-dla-for-inference
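For reference, these keys go in the nvinfer configuration's [property] group (a minimal sketch; the same keys appear in the working diff later in this thread):

[property]
enable-dla=1
use-dla-core=0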

Thanks.

I have enabled the DLA flags as below:
enable-dla=1
use-dla-core=0
Now I also ran "export CUDA_DEVICE_MAX_CONNECTIONS=1" first, but it did not help.

Hi,

Could you explain more about the result?
Is it working after setting the environment variable?

Thanks.

After setting the environment variable, the result does not change; it is just as before:
wd@aidget:~/deepstream-6.1/sources/objectDetector_Yolo$ echo $CUDA_DEVICE_MAX_CONNECTIONS
1
wd@aidget:~/deepstream-6.1/sources/objectDetector_Yolo$ deepstream-app -c deepstream_app_config_yoloV2_tiny.txt
gst_nvinfer_parse_props
Unknown or legacy key specified 'is-classifier' for group [property]
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Initialized
hubin trace generateBackendContext : 1991
Now entering deserializeEngineAndBackend, loading /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine

WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:05.939856639 2215895 0x107d380 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1905> [UID = 1]: deserialized trt engine from :/home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT data 3x416x416
1 OUTPUT kFLOAT region_16 425x13x13

hubin trace generateBackendContext : 2009
0:00:06.200726151 2215895 0x107d380 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2010> [UID = 1]: Use deserialized engine model: /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
0:00:06.249899242 2215895 0x107d380 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/wd/deepstream-6.1/sources/objectDetector_Yolo/config_infer_primary_yoloV2_tiny.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

    p: Pause
    r: Resume

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

**PERF: 56.47 (56.23)
**PERF: 55.11 (55.66)
**PERF: 55.67 (55.64)
**PERF: 55.96 (55.73)
**PERF: 56.13 (55.83)
**PERF: 55.50 (55.72)
**PERF: 55.30 (55.67)

Thanks for your update.

Please note that some functions in DeepStream also use the GPU.
Could you check if the DLA is active when the pipeline is executed?

$ watch -n 1 cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status
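(If you target the second DLA core instead, check nvdla1; the base address below is our assumption for Orin and may differ by platform.)

$ watch -n 1 cat /sys/devices/platform/host1x/158c0000.nvdla1/power/runtime_status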

Thanks.

Every 1.0s: cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status aidget: Thu Jul 28 10:50:45 2022

suspended

Thanks for the feedback.

We will give it a try and share more information with you.

Thank you!

Hi,

Could you share how you create the TensorRT engine?

Please note that layer placement is decided at build time.
Based on your file name, the engine was created for GPU.
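For context, nvinfer names cached engines after the device they were built for, roughly as below; the dla0 variant only appears if enable-dla=1 is set before the engine is generated:

model_b1_gpu0_int8.engine   (built for GPU 0)
model_b1_dla0_int8.engine   (built for DLA core 0)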

Could you help to confirm this?
Thanks.

Hi, I build the TensorRT engine from config_infer_primary_yoloV2_tiny.txt and deepstream_app_config_yoloV2_tiny.txt, as follows:
config_infer_primary_yoloV2_tiny.txt
[property]
gpu-id=0
#enable-dla=1
#use-dla-core=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
custom-network-config=yolov2-tiny.cfg
model-file=yolov2-tiny.weights
#model-engine-file=model_b1_gpu0_fp32.engine
labelfile-path=labels.txt
##0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
#1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV2Tiny
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
#scaling-filter=0
#scaling-compute-hw=0

deepstream_app_config_yoloV2_tiny.txt
…………
[primary-gie]
enable=1
gpu-id=0
#model-engine-file=yolov2_tiny_b32_gpu0_int8.engine
#model-engine-file=yolov2_tiny_b1_gpu0_int8.engine
#model-engine-file=yolov2_tiny_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=2
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2_tiny.txt
…………
Then I run
deepstream-app -c deepstream_app_config_yoloV2_tiny.txt
and get the engine
model_b1_gpu0_int8.engine

Next, I change the config file to use DLA as below:
enable-dla=1
use-dla-core=0
model-engine-file=model_b1_gpu0_int8.engine
#custom-network-config=yolov2-tiny.cfg
#model-file=yolov2-tiny.weights
Then I run
deepstream-app -c deepstream_app_config_yoloV2_tiny.txt

Is there another way to build a DLA engine, or should I change the C++ code to run inference on the DLA?

Hi,

Did you meet any errors when creating the engine file for DLA?
Since device placement is decided at build time, you should add the DLA configuration before generating the engine.
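A minimal sketch of that workflow (assuming the cached GPU engine is removed first so nvinfer rebuilds):

$ rm model_b1_gpu0_int8.engine
# in config_infer_primary_yoloV2_tiny.txt [property], before the run:
#   enable-dla=1
#   use-dla-core=0
#   network-mode=2   (FP16; DLA INT8 needs a calibration cache)
$ deepstream-app -c deepstream_app_config_yoloV2_tiny.txt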

Thanks.

If I set
enable-dla=1
use-dla-core=0
when building the engine, I get errors.

Hi,

That should be the problem.

You need to select DLA at build time to generate a DLA-based engine.
But since YOLO requires some extra layers, it might not work by default.
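For a plain ONNX model, selecting DLA at build time can also be sketched with trtexec (model.onnx and the engine name are placeholders; this does not cover the custom darknet-based builder used in this thread):

$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=model_dla0_fp16.engine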

Let us check this internally and share the information with you.

Hi,

Here is some information about this issue.

To compile YOLOv2 Tiny with DeepStream, please make the following change to the config_infer_primary_yoloV2_tiny.txt file:

diff --git a/config_infer_primary_yoloV2_tiny.txt b/config_infer_primary_yoloV2_tiny.txt
index 60b4a6b..4cc8326 100644
--- a/config_infer_primary_yoloV2_tiny.txt
+++ b/config_infer_primary_yoloV2_tiny.txt
@@ -67,7 +67,9 @@ model-file=yolov2-tiny.weights
 #model-engine-file=yolov2-tiny_b1_gpu0_fp32.engine
 labelfile-path=labels.txt
 ## 0=FP32, 1=INT8, 2=FP16 mode
-network-mode=0
+network-mode=2
+enable-dla=1
+use-dla-core=0
 num-detected-classes=80
 gie-unique-id=1
 network-type=0

However, when compiling the TensorRT engine with DLA, we got the following output log:

...
Total number of yolo layers: 32
Building yolo network complete!
Building the TensorRT Engine...
WARNING: [TRT]: Default DLA is enabled but layer (Unnamed Layer* 31) [PluginV2Ext] is not supported on DLA, falling back to GPU.
WARNING: [TRT]: {ForeignNode[conv_1...conv_15]} cannot be compiled by DLA, fallback to GPU
Building complete!
...

The log indicates that all the YOLO layers fall back to the GPU since they are not supported by DLA.
So the model can only run on the GPU and will occupy GPU resources.

Thanks.



@youchenhit
Also check out the DLA GitHub page for samples and resources or to report issues: https://github.com/NVIDIA/Deep-Learning-Accelerator-SW (recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications).

We also have an FAQ page that addresses some common questions that developers run into: Deep-Learning-Accelerator-SW/FAQ