I am trying to port my ANPR (Automatic Number Plate Recognition) application to Jetson. It consists of three stages (three models):
- Plate detection in the full frame (modified Tiny-YOLOv3-based darknet model)
- Character detection within the plates found in the first stage (modified Tiny-YOLOv3-based darknet model with a single YOLO layer)
- Classification of the characters detected in the second stage
Each model runs fine on its own, but when I run all three together in a single pipeline with deepstream-app I get errors:
0:00:20.568228378 10565 0x5597ca5d90 ERROR nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]:queueInputBatch(): cudaMemcpyAsync for output buffers failed (cudaErrorLaunchFailure)
0:00:20.568442857 10565 0x5597ca5d90 WARN nvinfer gstnvinfer.cpp:1098:gst_nvinfer_input_queue_loop:<secondary_gie_0> error: Failed to queue input batch for inferencing
ERROR from secondary_gie_0: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1098): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:secondary_gie_bin/GstNvInfer:secondary_gie_0
0:00:20.568754471 10565 0x5597ca5d90 ERROR nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]:queueInputBatch(): Failed to make stream wait on event(cudaErrorLaunchFailure)
0:00:20.568869315 10565 0x5597ca5d90 WARN nvinfer gstnvinfer.cpp:1098:gst_nvinfer_input_queue_loop:<secondary_gie_0> error: Failed to queue input batch for inferencing
0:00:20.568754888 10565 0x5597ca5a80 ERROR nvinfer gstnvinfer.cpp:976:get_converted_buffer:<secondary_gie_0> cudaMemset2DAsync failed with error cudaErrorLaunchFailure while converting buffer
0:00:20.568978325 10565 0x5597ca5a80 WARN nvinfer gstnvinfer.cpp:1536:gst_nvinfer_process_objects:<secondary_gie_0> error: Buffer conversion failed
ERROR from secondary_gie_0: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1098): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:secondary_gie_bin/GstNvInfer:secondary_gie_0
ERROR from secondary_gie_0: Buffer conversion failed
Debug info: gstnvinfer.cpp(1536): gst_nvinfer_process_objects (): /GstPipeline:pipeline/GstBin:secondary_gie_bin/GstNvInfer:secondary_gie_0
Sometimes I get this instead:
0:00:13.787260776 15050 0x5594aa4d90 ERROR nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:log(): cuda/cudaPoolingLayer.cpp (249) - Cuda Error in execute: 4 (unspecified launch failure)
0:00:13.790589265 15050 0x5594aa4a80 ERROR nvinfer gstnvinfer.cpp:976:get_converted_buffer:<secondary_gie_0> cudaMemset2DAsync failed with error cudaErrorLaunchFailure while converting buffer
0:00:13.790663640 15050 0x5594aa4a80 WARN nvinfer gstnvinfer.cpp:1536:gst_nvinfer_process_objects:<secondary_gie_0> error: Buffer conversion failed
ERROR from secondary_gie_0: Buffer conversion failed
Debug info: gstnvinfer.cpp(1536): gst_nvinfer_process_objects (): /GstPipeline:pipeline/GstBin:secondary_gie_bin/GstNvInfer:secondary_gie_0
Quitting
0:00:13.814560984 15050 0x5594aa4d90 ERROR nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:log(): cuda/cudaPoolingLayer.cpp (249) - Cuda Error in execute: 4 (unspecified launch failure)
0:00:13.814720463 15050 0x5594aa4d90 ERROR nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<secondary_gie_0> NvDsInferContext[UID 2]:queueInputBatch(): Failed to enqueue inference batch
0:00:13.814777911 15050 0x5594aa4d90 WARN nvinfer gstnvinfer.cpp:1098:gst_nvinfer_input_queue_loop:<secondary_gie_0> error: Failed to queue input batch for inferencing
ERROR from secondary_gie_0: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1098): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:secondary_gie_bin/GstNvInfer:secondary_gie_0
Very rarely I also get this:
unspecified launch failure in file yoloPlugins.cpp at line 107
Line 107 of yoloPlugins.cpp is the CHECK call inside this function:
int YoloLayerV3::enqueue(
    int batchSize, const void* const* inputs, void** outputs, void* workspace,
    cudaStream_t stream)
{
    CHECK(cudaYoloLayerV3(
        inputs[0], outputs[0], batchSize, m_GridSizeX, m_GridSizeY, m_NumClasses, m_NumBoxes,
        m_OutputSize, stream)); // ---> line 107
    return 0;
}
The deepstream-app config file looks like this:
# Copyright (c) 2019 NVIDIA Corporation. All rights reserved.
#
# NVIDIA Corporation and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA Corporation is strictly prohibited.
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl
[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file:///home/user/Desktop/numberplate_1.mp4
num-sources=1
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2 #5
sync=1
source-id=0
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1
[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=anpr_plate_det_gie_config.txt
[tracker]
enable=1
tracker-width=480
tracker-height=272
#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
#ll-config-file required for IOU only
#ll-config-file=iou_config.txt
gpu-id=0
[secondary-gie0]
enable=1
gpu-id=0
gie-unique-id=2
operate-on-gie-id=1
operate-on-class-ids=0
config-file=anpr_char_det_gie_config.txt
[secondary-gie1]
enable=1
gpu-id=0
gie-unique-id=3
operate-on-gie-id=2
operate-on-class-ids=0
config-file=anpr_char_rec_gie_config.txt
[tests]
file-loop=0
I am stuck on what is causing these errors.
Platform details:
- NVIDIA Jetson NANO/TX1
- Jetpack 4.2.1 [L4T 32.2.0]
- CUDA GPU architecture 5.3
- Libraries:
- CUDA 10.0.326
- cuDNN 7.5.0.56-1+cuda10.0
- TensorRT 5.1.6.1-1+cuda10.0
- Visionworks 1.6.0.500n
- OpenCV 4.1.1 compiled with CUDA: YES
- Jetson Performance: active
deepstream-app version 4.0.1
DeepStreamSDK 4.0.1
Plate Detector config (anpr_plate_det_gie_config.txt):
[property]
gpu-id=0
net-scale-factor=1
model-color-format=0
custom-network-config=plate_det.cfg
model-file=plate_det.weights
model-engine-file=model.engine
labelfile-path=plate_det.names
batch-size=1
#0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
process-mode=1
network-type=0
num-detected-classes=1
gie-unique-id=1
maintain-aspect-ratio=1
interval=0
parse-bbox-func-name=NvDsInferParseCustomYoloV3Tiny
custom-lib-path=/opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo_plate_det/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
threshold=0.4
Character Detector config (anpr_char_det_gie_config.txt):
[property]
gpu-id=0
net-scale-factor=1
#0=RGB, 1=BGR, 2=GRAY
model-color-format=0
custom-network-config=char_det.cfg
model-file=char_det.weights
model-engine-file=model_b16_fp32.engine
labelfile-path=char_det.names
batch-size=16
#0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
process-mode=2
network-type=0
num-detected-classes=1
gie-unique-id=2
operate-on-gie-id=1
operate-on-class-ids=0
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV3Tiny
custom-lib-path=/opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo_char_det/libnvdsinfer_custom_impl_Yolo.so
[class-attrs-all]
threshold=0.4
Character recognition config (anpr_char_rec_gie_config.txt):
[property]
gpu-id=0
net-scale-factor=1
uff-file=char_rec_model.uff
model-engine-file=char_rec_model.uff_b64_fp32.engine
labelfile-path=char_rec_labels.txt
batch-size=64
# 0=FP32 and 1=INT8 mode
network-mode=0
process-mode=2
network-type=1
#0=RGB, 1=BGR, 2=GRAY
model-color-format=2
gpu-id=0
gie-unique-id=3
operate-on-gie-id=2
operate-on-class-ids=0
is-classifier=1
uff-input-dims=1;32;32;0
uff-input-blob-name=conv2d_input
output-blob-names=dense_1/Softmax
classifier-async-mode=0
classifier-threshold=0.50
If more info is required, please ask.