deepstream-app stopped with an error

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson AGX Xavier
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: TensorRT 7.1.3
• Issue Type (questions, new requirements, bugs): questions/bugs

Hi, I am running a deepstream-app pipeline with the config file attached here.
ds_app_config_4ch_yoloV3.txt (5.2 KB)
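
In case the attachment doesn't open, the file follows the standard deepstream-app layout; a minimal sketch of the groups that matter for this setup (the URIs and values below are placeholders, not my exact config):

```ini
# Hypothetical sketch of the relevant groups -- placeholders, not the attached file.
[source0]
enable=1
type=3                          # 3 = multi-URI file source
uri=file:///media/extdrive/video0.mp4
num-sources=1
# ...three more [sourceN] groups for the other channels...

[tests]
file-loop=1                     # restart file sources when they reach EOF
```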

I was streaming four 10-hour video sources from an external hard drive and enabled ‘file-loop=1’ at the bottom of the config (the [tests] group in the sketch above) so that playback starts over once the videos reach their end. It had been running normally for the past 7 days, but today the pipeline threw an exception and my external hard drive unmounted itself. It stopped as follows:

ERROR: Failed to synchronize on cuda copy-coplete-event, cuda err_no:6, err_str:cudaErrorLaunchTimeout
180:59:10.814824064 17840      0xcc4e680 WARN                 nvinfer gstnvinfer.cpp:2012:gst_nvinfer_output_loop:<primary_gie> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
180:59:10.815804352 17840      0xcc4e680 WARN                 nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1606> [UID = 1]: Tried to release an outputBatchID which is already with the context
ERROR from primary_gie: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(2012): gst_nvinfer_output_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
ERROR: Failed to synchronize on cuda copy-coplete-event, cuda err_no:6, err_str:cudaErrorLaunchTimeout
180:59:10.823969120 17840      0xcc4e680 WARN                 nvinfer gstnvinfer.cpp:2012:gst_nvinfer_output_loop:<primary_gie> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
180:59:10.824070368 17840      0xcc4e680 WARN                 nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1606> [UID = 1]: Tried to release an outputBatchID which is already with the context
ERROR from primary_gie: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(2012): gst_nvinfer_output_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
180:59:10.844127968 17840      0xcc4e6d0 ERROR                nvinfer gstnvinfer.cpp:1103:get_converted_buffer:<primary_gie> cudaMemset2DAsync failed with error cudaErrorLaunchTimeout while converting buffer
180:59:10.844200192 17840      0xcc4e6d0 WARN                 nvinfer gstnvinfer.cpp:1363:gst_nvinfer_process_full_frame:<primary_gie> error: Buffer conversion failed
ERROR from primary_gie: Buffer conversion failed
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1363): gst_nvinfer_process_full_frame (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
Quitting
ERROR: [TRT]: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
ERROR: [TRT]: FAILED_EXECUTION: std::exception
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
180:59:10.873089984 17840      0xcc4e590 WARN                 nvinfer gstnvinfer.cpp:1216:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR: Failed to make stream wait on event, cuda err_no:6, err_str:cudaErrorLaunchTimeout
ERROR: Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
180:59:10.874032896 17840      0xcc4e590 WARN                 nvinfer gstnvinfer.cpp:1216:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1216): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1216): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
NvMOT_DeInit - Context Handle Size:8
ERROR: [TRT]: engine.cpp (179) - Cuda Error in ~ExecutionContext: 702 (the launch timed out and was terminated)
ERROR: [TRT]: INTERNAL_ERROR: std::exception
ERROR: [TRT]: Parameter check failed at: ../rtSafe/safeContext.cpp::terminateCommonContext::155, condition: cudnnDestroy(context.cudnn) failure.
ERROR: [TRT]: Parameter check failed at: ../rtSafe/safeContext.cpp::terminateCommonContext::165, condition: cudaEventDestroy(context.start) failure.
ERROR: [TRT]: Parameter check failed at: ../rtSafe/safeContext.cpp::terminateCommonContext::170, condition: cudaEventDestroy(context.stop) failure.
ERROR: [TRT]: ../rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 702 (the launch timed out and was terminated)
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception
./run.sh: line 1: 17840 Aborted                 (core dumped) deepstream-app -c trafficYolo_deepstream/ds_app_config_4ch_yoloV3.txt -t
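
For reference, run.sh is just the one-line launcher named in the abort message; reconstructed from that line, it looks like this:

```bash
#!/usr/bin/env bash
# run.sh -- one-line launcher, reconstructed from the abort message above
deepstream-app -c trafficYolo_deepstream/ds_app_config_4ch_yoloV3.txt -t
```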

It looks like these errors are coming from inside CUDA. Would you mind telling me what’s going on here? Thank you!

Can you share the log generated with “sudo nvidia-bug-report-tegra.sh” when the issue happens?
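
Roughly like this (the exact output file name may vary by JetPack release):

```bash
# Run right after the failure occurs, from a writable directory.
cd ~
sudo nvidia-bug-report-tegra.sh
# The script writes a compressed log (typically nvidia-bug-report-tegra.log.gz)
# into the current directory -- please attach that file to this topic.
ls -lh nvidia-bug-report-tegra.log.gz
```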

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.