1. Test yolov2_tiny with trtexec
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov2_tiny_b32_gpu0_int8.engine --useDLACore=0 --int8
Throughput: 350.678 qps
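Note that with --loadEngine, the --useDLACore flag only selects which DLA core the runtime uses; whether any layer actually runs on DLA is fixed when the engine is built. Building a DLA engine directly with trtexec would look roughly like the line below (a sketch only: it assumes an ONNX export of the model, which the darknet-based sample does not provide, and the file names are placeholders):
/usr/src/tensorrt/bin/trtexec --onnx=yolov2_tiny.onnx --int8 --useDLACore=0 --allowGPUFallback --saveEngine=yolov2_tiny_dla0_int8.engine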
2. Test yolov2_tiny with deepstream-app
Run deepstream-app -c deepstream_app_config_yoloV2_tiny.txt with the following settings:
enable-dla=1
use-dla-core=0
batch-size=1
gst_nvinfer_parse_props
Unknown or legacy key specified 'is-classifier' for group [property]
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Initialized
hubin trace generateBackendContext : 1991
Now entering deserializeEngineAndBackend, loading /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:05.788872488 2356797 0x107bd80 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1905> [UID = 1]: deserialized trt engine from :/home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT data 3x416x416
1 OUTPUT kFLOAT region_16 425x13x13
hubin trace generateBackendContext : 2009
0:00:06.050293970 2356797 0x107bd80 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2010> [UID = 1]: Use deserialized engine model: /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
0:00:06.065220614 2356797 0x107bd80 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/wd/deepstream-6.1/sources/objectDetector_Yolo/config_infer_primary_yoloV2_tiny.txt sucessfully
After setting the environment variable, the result does not change; it is just as before:
wd@aidget:~/deepstream-6.1/sources/objectDetector_Yolo$ echo $CUDA_DEVICE_MAX_CONNECTIONS
1
wd@aidget:~/deepstream-6.1/sources/objectDetector_Yolo$ deepstream-app -c deepstream_app_config_yoloV2_tiny.txt
hubin hubin hubin
hubin hubin hubin
gst_nvinfer_parse_props
Unknown or legacy key specified 'is-classifier' for group [property]
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Initialized
hubin trace generateBackendContext : 1991
Now entering deserializeEngineAndBackend, loading /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:05.939856639 2215895 0x107d380 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1905> [UID = 1]: deserialized trt engine from :/home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT data 3x416x416
1 OUTPUT kFLOAT region_16 425x13x13
hubin trace generateBackendContext : 2009
0:00:06.200726151 2215895 0x107d380 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2010> [UID = 1]: Use deserialized engine model: /home/wd/deepstream-6.1/sources/objectDetector_Yolo/yolov2_tiny_b32_gpu0_int8.engine
0:00:06.249899242 2215895 0x107d380 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/wd/deepstream-6.1/sources/objectDetector_Yolo/config_infer_primary_yoloV2_tiny.txt sucessfully
The relevant section of deepstream_app_config_yoloV2_tiny.txt:
…………
[primary-gie]
enable=1
gpu-id=0
#model-engine-file=yolov2_tiny_b32_gpu0_int8.engine
#model-engine-file=yolov2_tiny_b1_gpu0_int8.engine
#model-engine-file=yolov2_tiny_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=2
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2_tiny.txt
…………
Then run
deepstream-app -c deepstream_app_config_yoloV2_tiny.txt
and get the engine
model_b1_gpu0_int8.engine
Next, change the config file to use DLA as below:
enable-dla=1
use-dla-core=0
model-engine-file=model_b1_gpu0_int8.engine
#custom-network-config=yolov2-tiny.cfg
#model-file=yolov2-tiny.weights
Then run
deepstream-app -c deepstream_app_config_yoloV2_tiny.txt
Is there another way to build a DLA engine, or should the C++ code be changed to use DLA for inference?
Did you encounter any errors when creating the engine file for DLA?
Since the device placement is decided at build time, you should add the DLA configuration before generating the engine; that is, you need to select DLA at build time to generate a DLA-based engine (see the build-time sketch after this reply).
But since YOLO requires some extra layers, it might not work by default.
Let us check this internally and share the information with you.
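For reference, selecting DLA at build time through the TensorRT C++ API looks roughly like the sketch below. This is a minimal illustration, not the sample's actual code: in objectDetector_Yolo the engine is built by the custom library (nvdsinfer_custom_impl_Yolo), so the equivalent calls would have to be added to its builder code, and the builder, network, and INT8 calibrator objects are assumed to exist already.

#include "NvInfer.h"

// Minimal sketch (TensorRT 8.x API): DLA must be requested before the engine is built.
nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
config->setFlag(nvinfer1::BuilderFlag::kINT8);            // DLA requires INT8 or FP16 precision
config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA); // place layers on DLA by default
config->setDLACore(0);                                    // same core as use-dla-core=0
config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);    // let unsupported layers run on the GPU
nvinfer1::IHostMemory* plan = builder->buildSerializedNetwork(*network, *config);

Without kGPU_FALLBACK the build fails as soon as a layer cannot run on DLA; with it, unsupported layers are assigned back to the GPU, which is the fallback behavior visible in the log below.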
However, when compiling the TensorRT engine with DLA, we got the following output log:
...
Total number of yolo layers: 32
Building yolo network complete!
Building the TensorRT Engine...
WARNING: [TRT]: Default DLA is enabled but layer (Unnamed Layer* 31) [PluginV2Ext] is not supported on DLA, falling back to GPU.
WARNING: [TRT]: {ForeignNode[conv_1...conv_15]} cannot be compiled by DLA, fallback to GPU
Building complete!
...
The log indicates that all of the YOLO layers fall back to the GPU, since they are not supported by DLA.
So the model can only run on the GPU and will occupy GPU resources.
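To see exactly which layers end up on DLA versus the GPU, the builder log can be made verbose; with trtexec this would look roughly like the following (same ONNX assumption and placeholder file name as above):
/usr/src/tensorrt/bin/trtexec --onnx=yolov2_tiny.onnx --int8 --useDLACore=0 --allowGPUFallback --verbose
The verbose build log lists the layers assigned to DLA and those moved back to the GPU.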