Jetson AGX Orin: how to use DLA for yolov2_tiny

1. Test yolov2_tiny with trtexec:
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov2_tiny_b32_gpu0_int8.engine --useDLACore=0 --int8
Throughput: 350.678 qps
2. Test yolov2_tiny with deepstream-app:
Run deepstream-app -c deepstream_app_config_yoloV2_tiny.txt with:
enable-dla=1
use-dla-core=0
batch-size=1


gst_nvinfer_parse_props
Unknown or legacy key specified 'is-classifier' for group [property]
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Initialized
hubin trace generateBackendContext : 1991
Now entering deserializeEngineAndBackend, loading /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine

WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:05.788872488 2356797 0x107bd80 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1905> [UID = 1]: deserialized trt engine from :/home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT data 3x416x416
1 OUTPUT kFLOAT region_16 425x13x13

hubin trace generateBackendContext : 2009
0:00:06.050293970 2356797 0x107bd80 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2010> [UID = 1]: Use deserialized engine model: /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
0:00:06.065220614 2356797 0x107bd80 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/wd/deepstream-6.1/sources/objectDetector_Yolo/config_infer_primary_yoloV2_tiny.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

    p: Pause
    r: Resume

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

**PERF: 58.76 (58.76)
**PERF: 59.02 (58.89)
**PERF: 58.99 (58.86)
**PERF: 58.85 (58.84)
**PERF: 58.90 (58.83)
**PERF: 58.68 (58.86)
**PERF: 58.87 (58.85)


GPU usage: 99%

Confused! Why is inference running on the GPU, and with such low FPS? Is anything wrong?

Hi,

Have you also enabled the DLA flag in the YOLOv2 configuration?
Please find the details below:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Quickstart.html#using-dla-for-inference
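For reference, these keys go in the nvinfer configuration's [property] group (a minimal sketch; the same keys appear in the working diff later in this thread):

[property]
enable-dla=1
use-dla-core=0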

Thanks.

I have enabled the DLA flags as below:
enable-dla=1
use-dla-core=0
Now I also ran "export CUDA_DEVICE_MAX_CONNECTIONS=1" first, but it did not help.

Hi,

Could you explain more about the result?
Is it working after setting the environment variable?

Thanks.

After setting the environment variable, the result does not change; it is just as before:
wd@aidget:~/deepstream-6.1/sources/objectDetector_Yolo$ echo $CUDA_DEVICE_MAX_CONNECTIONS
1
wd@aidget:~/deepstream-6.1/sources/objectDetector_Yolo$ deepstream-app -c deepstream_app_config_yoloV2_tiny.txt
gst_nvinfer_parse_props
Unknown or legacy key specified 'is-classifier' for group [property]
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Initialized
hubin trace generateBackendContext : 1991
Now entering deserializeEngineAndBackend, loading /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine

WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:05.939856639 2215895 0x107d380 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1905> [UID = 1]: deserialized trt engine from :/home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT data 3x416x416
1 OUTPUT kFLOAT region_16 425x13x13

hubin trace generateBackendContext : 2009
0:00:06.200726151 2215895 0x107d380 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2010> [UID = 1]: Use deserialized engine model: /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
0:00:06.249899242 2215895 0x107d380 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/wd/deepstream-6.1/sources/objectDetector_Yolo/config_infer_primary_yoloV2_tiny.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

    p: Pause
    r: Resume

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

**PERF: 56.47 (56.23)
**PERF: 55.11 (55.66)
**PERF: 55.67 (55.64)
**PERF: 55.96 (55.73)
**PERF: 56.13 (55.83)
**PERF: 55.50 (55.72)
**PERF: 55.30 (55.67)

Thanks for your update.

Please note that some functions in DeepStream also use the GPU.
Could you check if the DLA is active when the pipeline is executed?

$ watch -n 1 cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status
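(If you target the second DLA core instead, check nvdla1; the base address below is our assumption for Orin and may differ by platform.)

$ watch -n 1 cat /sys/devices/platform/host1x/158c0000.nvdla1/power/runtime_status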

Thanks.

Every 1.0s: cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status aidget: Thu Jul 28 10:50:45 2022

suspended

Thanks for the feedback.

We will give it a try and share more information with you.

Thank you!

Hi,

Could you share how you create the TensorRT engine?

Please note that layer placement is decided at build time.
Based on your file name, the engine was created for GPU.
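For context, nvinfer names cached engines after the device they were built for, roughly as below; the dla0 variant only appears if enable-dla=1 is set before the engine is generated:

model_b1_gpu0_int8.engine   (built for GPU 0)
model_b1_dla0_int8.engine   (built for DLA core 0)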

Could you help to confirm this?
Thanks.

Hi, I build the TensorRT engine from config_infer_primary_yoloV2_tiny.txt and deepstream_app_config_yoloV2_tiny.txt, as follows:
config_infer_primary_yoloV2_tiny.txt
[property]
gpu-id=0
#enable-dla=1
#use-dla-core=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
custom-network-config=yolov2-tiny.cfg
model-file=yolov2-tiny.weights
#model-engine-file=model_b1_gpu0_fp32.engine
labelfile-path=labels.txt
##0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
#1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV2Tiny
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
#scaling-filter=0
#scaling-compute-hw=0

deepstream_app_config_yoloV2_tiny.txt
…………
[primary-gie]
enable=1
gpu-id=0
#model-engine-file=yolov2_tiny_b32_gpu0_int8.engine
#model-engine-file=yolov2_tiny_b1_gpu0_int8.engine
#model-engine-file=yolov2_tiny_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=2
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2_tiny.txt
…………
Then I run
deepstream-app -c deepstream_app_config_yoloV2_tiny.txt
and get the engine
model_b1_gpu0_int8.engine

Next, I change the config file to use DLA as below:
enable-dla=1
use-dla-core=0
model-engine-file=model_b1_gpu0_int8.engine
#custom-network-config=yolov2-tiny.cfg
#model-file=yolov2-tiny.weights
Then I run
deepstream-app -c deepstream_app_config_yoloV2_tiny.txt

Is there another way to build a DLA engine, or should I change the C++ code to run inference on the DLA?

Hi,

Did you meet any errors when creating the engine file for DLA?
Since device placement is decided at build time, you should add the DLA configuration before generating the engine.
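A minimal sketch of that workflow (assuming the cached GPU engine is removed first so nvinfer rebuilds):

$ rm model_b1_gpu0_int8.engine
# in config_infer_primary_yoloV2_tiny.txt [property], before the run:
#   enable-dla=1
#   use-dla-core=0
#   network-mode=2   (FP16; DLA INT8 needs a calibration cache)
$ deepstream-app -c deepstream_app_config_yoloV2_tiny.txt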

Thanks.

If I set
enable-dla=1
use-dla-core=0
when building the engine, I get errors.

Hi,

That should be the problem.

You need to select DLA at build time to generate a DLA-based engine.
But since YOLO requires some extra layers, it might not work by default.
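For a plain ONNX model, selecting DLA at build time can also be sketched with trtexec (model.onnx and the engine name are placeholders; this does not cover the custom darknet-based builder used in this thread):

$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=model_dla0_fp16.engine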

Let us check this internally and share the information with you.

Hi,

Here is some information about this issue.

To compile YOLOv2 Tiny with DeepStream, please make the following change to the config_infer_primary_yoloV2_tiny.txt file:

diff --git a/config_infer_primary_yoloV2_tiny.txt b/config_infer_primary_yoloV2_tiny.txt
index 60b4a6b..4cc8326 100644
--- a/config_infer_primary_yoloV2_tiny.txt
+++ b/config_infer_primary_yoloV2_tiny.txt
@@ -67,7 +67,9 @@ model-file=yolov2-tiny.weights
 #model-engine-file=yolov2-tiny_b1_gpu0_fp32.engine
 labelfile-path=labels.txt
 ## 0=FP32, 1=INT8, 2=FP16 mode
-network-mode=0
+network-mode=2
+enable-dla=1
+use-dla-core=0
 num-detected-classes=80
 gie-unique-id=1
 network-type=0

However, when compiling the TensorRT engine with DLA, we got the following output log:

...
Total number of yolo layers: 32
Building yolo network complete!
Building the TensorRT Engine...
WARNING: [TRT]: Default DLA is enabled but layer (Unnamed Layer* 31) [PluginV2Ext] is not supported on DLA, falling back to GPU.
WARNING: [TRT]: {ForeignNode[conv_1...conv_15]} cannot be compiled by DLA, fallback to GPU
Building complete!
...

The log indicates that all the YOLO layers fall back to the GPU since they are not supported by DLA.
So the model can only run on the GPU and will occupy GPU resources.

Thanks.



@youchenhit
Also check out the DLA GitHub page for samples and resources or to report issues: https://github.com/NVIDIA/Deep-Learning-Accelerator-SW (recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications).

We also have an FAQ page that addresses some common questions that developers run into: Deep-Learning-Accelerator-SW/FAQ