Deepstream-app 5 Seg Fault

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 5.0.1
• TensorRT Version: 7.0.0
• NVIDIA GPU Driver Version (valid for GPU only): R450.51
• Issue Type (questions, new requirements, bugs): bug
• How to reproduce the issue? (for bugs: which sample app is used, the configuration file contents, the command line, and other details): see below

################################################################################
#
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=600
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=4
columns=3
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

sources removed
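(The [source] sections were removed here; as confirmed below, they were all RTSP streams. For reference, a deepstream-app RTSP source section typically looks like the sketch below, with placeholder values rather than the originals:)

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
uri=rtsp://<camera-address>
gpu-id=0
#Decoder output memory on dGPU: 0=device 1=pinned 2=unified
cudadec-memtype=0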

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
output-file=yolov4.mp4

[sink1]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvoverlaysink 6=MsgConvBroker
type=6
#msg-conv-config=dstest5_msgconv_sample_config.txt
#(0): PAYLOAD_DEEPSTREAM - Deepstream schema payload
#(1): PAYLOAD_DEEPSTREAM_MINIMAL - Deepstream schema payload minimal
#(256): PAYLOAD_RESERVED - Reserved type
#(257): PAYLOAD_CUSTOM   - Custom schema payload
msg-conv-payload-type=0
msg-conv-config=/opt/nvidia/deepstream/deepstream-5.0/sources/deepstream_yolov4/msgconv.txt
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_kafka_proto.so
#msg-broker-conn-str=
msg-broker-conn-str=
topic=Raw_Data
#Optional:
msg-broker-config=/opt/nvidia/deepstream/deepstream-5.0/sources/libs/kafka_protocol_adaptor/cfg_kafka.txt
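(The broker connection string above is redacted; for libnvds_kafka_proto it takes the form <host>;<port>, with the topic supplied separately as done here. A placeholder example, not the actual broker address:)

msg-broker-conn-str=<broker-host>;9092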

[sink2]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=4
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
codec=2
bitrate=4000000
iframeinterval=10
rtsp-port=8554
profile=0
udp-buffer-size=100000
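(With this type=4 RTSP sink, deepstream-app serves the encoded output at the fixed ds-test mount point on the configured port, so the stream can be spot-checked remotely, for example:)

ffplay rtsp://<machine-ip>:8554/ds-test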

[osd]
enable=1
gpu-id=0
border-width=1
text-size=12
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=10
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
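(As a sanity check on these values: batched-push-timeout=40000 µs = 40 ms, i.e. one frame interval at 25 fps, so with live-source=1 the muxer pushes a partial batch after at most one frame time rather than stalling on a slow RTSP source. batch-size=10 should match the number of connected sources.)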

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
#model-engine-file=yolov4_1_3_320_512_fp16.engine
labelfile-path=labels.txt
batch-size=10
force-implicit-batch-dim=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV4.txt

[tracker]
enable=1
tracker-width=640
tracker-height=384
gpu-id=0
#ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_nvdcf.so
ll-config-file=tracker_config.yml
enable-batch-process=1
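(Note the tracker resolution here: 640 = 20 × 32 and 384 = 12 × 32, satisfying the tracker's requirement that width and height be multiples of 32.)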

After running for 10+ hours we get the segfault below; it has happened three times now over a few weeks of running.

ERROR: nvdsinfer_context_impl.cpp:1572 Failed to synchronize on cuda copy-coplete-event, cuda err_no:719, err_str:cudaErrorLaunchFailure
12:25:37.866070983 17263 0x55db8604b370 WARN                 nvinfer gstnvinfer.cpp:2012:gst_nvinfer_output_loop:<primary_gie> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
ERROR from primary_gie: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Debug info: gstnvinfer.cpp(2012): gst_nvinfer_output_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
12:25:37.866296196 17263 0x55db8604b370 WARN                 nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1606> [UID = 1]: Tried to release an outputBatchID which is already with the context
Cuda failure: status=719 in CreateTextureObj at line 2555
nvbufsurftransform.cpp:2624: => Transformation Failed -2

Segmentation fault

I’m confused that there is no [source] section in your pasted configuration file. Does the error happen with our pretrained models inside the DeepStream SDK? Can you share your YOLOv4 model?

Yeah, sorry, I removed the sources for security; they were all RTSP type.

Engine:

Will the error happen with our pretrained models inside the DeepStream SDK? How did you generate the engine? Can you share the model file, but not the engine file?

I have used the PeopleNet pretrained model at other sites with no issues over many months.

I followed the instructions here: yolov4_deepstream/deepstream_yolov4 at master · NVIDIA-AI-IOT/yolov4_deepstream · GitHub

I have generated YOLOv4 engines for both Jetson (ARM) and x64. No issues so far on Jetson.

How about YOLOv3 on your platform?

I have used YOLOv3 INT8 before and never seen that segfault issue.

Hi @gabe_ddi,
Could you try removing all “nvbuf-memory-type=0” lines and check if it helps?

Thanks!

I have removed all nvbuf-memory-type=0 lines; I will monitor and report back.

After 1 hour, with nvbuf-memory-type=0 removed, I now get the error below:

Cuda failure: status=719
ERROR from nvv4l2decoder1740: Failed to allocate required memory.
Debug info: gstv4l2videodec.c(1563): gst_v4l2_video_dec_handle_frame (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin9/GstDecodeBin:decodebin_elem9/nvv4l2decoder:nvv4l2decoder1740:
Buffer pool activation failed

I think you may have run into a decoder memory leak issue. Could you use a CUDA driver < 450?

I removed cudadec-memtype=0 from the source sections and the above error went away.

But I still get:

_output_loop:<primary_gie> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Cuda failure: status=716 in CreateTextureObj at line 2555
nvbufsurftransform.cpp:2624: => Transformation Failed -2

Cuda failure: status=716 in CreateTextureObj at line 2555
nvbufsurftransform.cpp:2624: => Cuda failure: status=716 in CreateTextureObj at line 2555
Transformation Failed -2

Segmentation fault

This was after 2 days of running.
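(For reference: CUDA status 719 is cudaErrorLaunchFailure and 716 is cudaErrorMisalignedAddress. Both are "sticky" errors that invalidate the CUDA context, which is why subsequent transforms fail and the process eventually segfaults. If the failure can be reproduced on an accessible machine, running the app under cuda-memcheck may help localize the faulting kernel, at a large speed cost. A hypothetical invocation:)

cuda-memcheck --error-exitcode 1 deepstream-app -c <config-file>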

As mentioned above, could you try a CUDA driver < 450, e.g. 440?

Is there a proper way to downgrade drivers? I ask because the machine is in the field and inaccessible (7 m above the ground), so we cannot have display or output issues if we downgrade the drivers.

First, you need to know how the CUDA driver was installed on the machine: by deb package, or by run package? The uninstall/install steps differ according to the installation method.

For reference, these are the steps I have used in the past to update a driver that was installed via the run package. You can also refer to the guidance in Installation Guide Linux :: CUDA Toolkit Documentation.

Uninstall:
$ sudo service lightdm stop
$ sudo /usr/NX/bin/nxserver --shutdown
$ sudo systemctl stop docker
$ sudo ./NVIDIA-Linux-x86_64-390.30.run --uninstall
“If you plan to no longer use the NVIDIA driver, you should make sure that no X screens are configured to use the NVIDIA X driver in your X configuration file. If you used nvidia-xconfig to configure X, it may have created a backup of your original configuration. Would you like to run nvidia-xconfig --restore-original-backup to attempt restoration of the original X configuration file?” ==> Yes

Re-install:
$ sudo ./NVIDIA-Linux-x86_64-418.67.run
“Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later.” ==> No
“WARNING: Ignoring CC version mismatch: The kernel was built with gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3), but the current compiler version is cc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0.” ==> No
“Install NVIDIA’s 32-bit compatibility libraries?” ==> Yes
“An incomplete installation of libglvnd was found. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries.” ==> No
$ sudo systemctl start docker
$ sudo service lightdm start
$ nvidia-smi    # check the driver version
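(If the driver on the field machine was installed from deb packages instead, which is common on Ubuntu, the downgrade goes through apt rather than the run file. A rough sketch, assuming Ubuntu with a 440-series driver package available in the configured repositories; package names are illustrative:)

$ dpkg -l | grep nvidia-driver        # confirm the driver came from a deb package
$ sudo service lightdm stop
$ sudo apt-get purge 'nvidia-driver-450*'
$ sudo apt-get install nvidia-driver-440
$ sudo reboot
$ nvidia-smi                          # verify the new driver version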