NvInfer Element Seg Fault

We are currently running a custom TLT model through DeepStream on a live stream, and we consistently hit a segmentation fault after some period of time. The segfault appears to happen in the dequeueOutputBatch call inside the gst_nvinfer_output_loop function of gstnvinfer.cpp. The call stack for the fault is given below:

<unknown> 0x00007fdeabbdccff
std::__copy_move<true, true, std::random_access_iterator_tag>::__copy_m<float> stl_algobase.h:368
std::__copy_move_a<true, float*, float*> stl_algobase.h:386
std::__copy_move_a2<true, float*, float*> stl_algobase.h:424
std::copy<std::move_iterator<float*>, float*> stl_algobase.h:456
std::__uninitialized_copy<true>::__uninit_copy<std::move_iterator<float*>, float*> stl_uninitialized.h:101
std::uninitialized_copy<std::move_iterator<float*>, float*> stl_uninitialized.h:134
std::__uninitialized_copy_a<std::move_iterator<float*>, float*, float> stl_uninitialized.h:289
std::__uninitialized_move_if_noexcept_a<float*, float*, std::allocator<float> > stl_uninitialized.h:312
std::vector<float, std::allocator<float> >::_M_realloc_insert<float> vector.tcc:431
std::vector<float, std::allocator<float> >::emplace_back<float> vector.tcc:105
std::vector<float, std::allocator<float> >::push_back stl_vector.h:954
NvDsInferParseCustomFrcnnTLT nvdsinfer_custombboxparser_frcnn_tlt.cpp:325
nvdsinfer::DetectPostprocessor::fillDetectionOutput nvdsinfer_context_impl_output_parsing.cpp:721
nvdsinfer::DetectPostprocessor::parseEachBatch nvdsinfer_context_impl.cpp:711
nvdsinfer::InferPostprocessor::postProcessHost nvdsinfer_context_impl.cpp:584
nvdsinfer::NvDsInferContextImpl::dequeueOutputBatch nvdsinfer_context_impl.cpp:1577
gst_nvinfer_output_loop gstnvinfer.cpp:2014
<unknown> 0x00007fdeaddfd2a5
start_thread 0x00007fdeace176db
clone 0x00007fdeabb6f71f

Sometimes this occurs after a few minutes, sometimes after several hours, but it occurs regularly during stream processing.
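
For context, the top frames show the crash inside the push_back into the objectList vector in the custom FRCNN parser, which usually suggests memory being corrupted somewhere before the reallocation that finally faults. The following is a minimal sketch of a custom bounding-box parser with the same prototype, with defensive bounds checks added before every read and push_back. The layer order, the box layout, and the single-class handling are assumptions for illustration only; this is not the actual nvdsinfer_custombboxparser_frcnn_tlt.cpp source.

#include <algorithm>
#include <vector>
#include "nvdsinfer_custom_impl.h"

/* Hypothetical parser with the same prototype as NvDsInferParseCustomFrcnnTLT.
 * Layer order, box layout and class handling are assumed for illustration. */
extern "C" bool NvDsInferParseCustomFrcnnTLTChecked(
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
    if (outputLayersInfo.size() < 2)
        return false;                                        /* unexpected model outputs */

    const NvDsInferLayerInfo &scores = outputLayersInfo[0];  /* assumed order */
    const NvDsInferLayerInfo &boxes  = outputLayersInfo[1];  /* assumed order */
    if (!scores.buffer || !boxes.buffer)
        return false;                                        /* host buffers not bound */

    const float *score = static_cast<const float *>(scores.buffer);
    const float *box   = static_cast<const float *>(boxes.buffer);

    /* Never trust a detection count taken from a tensor: clamp it to what the
     * bound buffers can actually hold, so the loop cannot read past the end
     * of either allocation. */
    unsigned int maxDets = std::min(scores.inferDims.numElements,
                                    boxes.inferDims.numElements / 4);

    const float threshold = detectionParams.numClassesConfigured > 0
        ? detectionParams.perClassPreclusterThreshold[0]
        : 0.5f;                                              /* assumed fallback */

    objectList.reserve(objectList.size() + maxDets);
    for (unsigned int i = 0; i < maxDets; ++i) {
        if (score[i] < threshold)
            continue;

        NvDsInferObjectDetectionInfo obj{};
        obj.classId = 0;                                     /* single class assumed */
        obj.detectionConfidence = score[i];
        /* Assumed x1,y1,x2,y2 layout, clamped to the network input size. */
        obj.left   = std::max(0.0f, box[4 * i + 0]);
        obj.top    = std::max(0.0f, box[4 * i + 1]);
        obj.width  = std::min((float) networkInfo.width,  box[4 * i + 2]) - obj.left;
        obj.height = std::min((float) networkInfo.height, box[4 * i + 3]) - obj.top;
        if (obj.width <= 0 || obj.height <= 0)
            continue;
        objectList.push_back(obj);                           /* frame where the reported crash occurs */
    }
    return true;
}

The same kind of checks could be retrofitted into the existing parser to rule out an out-of-bounds read or write when the model occasionally reports a detection count larger than the bound output buffers.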

The previously suggested solution was to test with an mp4 file instead of a live stream. However, we were hoping for another solution because:

  1. The fault can occur several hours or days after startup, and we do not have an mp4 file of that length available.
  2. Our only use case is live streaming, so even if using an mp4 file avoids the issue, that would not help our use case.

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs: include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements: include the module name, i.e. which plugin or which sample application, and the function description.)

Driver Version: 460.91.03
CUDA Version: 11.2
DeepStream SDK 5.1
To reproduce the issue, we run DeepStream on a multi-GPU system with 2 or more cameras streaming live, inferring every 3 frames with TLT models. The segfault can occur anywhere from 5 minutes to 12 hours or more after startup.
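
For reference, a minimal sketch of how the inference element is configured for that cadence is below; "interval", "gpu-id" and "config-file-path" are the standard Gst-nvinfer properties, while the config file name and the rest of the pipeline (sources, nvstreammux, sinks) are omitted and assumed.

#include <gst/gst.h>

int main (int argc, char *argv[])
{
    gst_init (&argc, &argv);

    /* Primary inference element, one instance per GPU in the multi-GPU setup. */
    GstElement *pgie = gst_element_factory_make ("nvinfer", "primary-inference");
    if (!pgie)
        return -1;

    g_object_set (G_OBJECT (pgie),
        "config-file-path", "frcnn_tlt_pgie_config.txt",  /* assumed file name */
        "interval", 2,   /* skip 2 frames between inferences, i.e. infer every 3rd frame */
        "gpu-id", 0,     /* GPU assignment on the multi-GPU system */
        NULL);

    gst_object_unref (pgie);
    return 0;
}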

Are you using NVIDIA-AI-IOT/deepstream_tao_apps (sample apps demonstrating how to deploy models trained with TAO on DeepStream)? Please update to the latest version.

Our current application is only capable of using TLT-2.0 models. We plan to integrate TAO models in the future, but would it be possible to troubleshoot the issue with TLT-2.0 models?
