DeepStream API error

Please provide complete information as applicable to your setup.

Hardware Platform (Jetson / GPU) : GPU Tesla T4
DeepStream Version : 6.0.1
TensorRT Version : 8.0.1.6
NVIDIA GPU Driver Version : 470.129.06
Issue Type : bugs
How to reproduce the issue ?
I have a program built on DeepStream, running on a cloud server with 4 Tesla T4 GPUs, and I have run into some issues. When I create 40 pipelines, I assign them to the GPUs in a round-robin fashion and set the GPU ID on every plugin that supports it, so that each pipeline is processed entirely on its assigned GPU. This avoids memory transfers between GPUs and improves the efficiency of the program. At the end of each pipeline there is a custom plugin that I built based on the official example. After the program has run successfully with 40 pipelines for a certain period of time, this plugin starts throwing errors. On investigation, I found that most of the errors are related to data transfers, including DeepStream API errors and errors while copying data from the GPU to the CPU. The error messages are as follows:

error1:

#0 0x00007ffff3bdd6a3 in g_list_first ()
at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#1 0x00007ffff464830a in nvds_acquire_meta_from_pool ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#2 0x00007ffff46459ce in nvds_acquire_user_meta_from_pool ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#3 0x00007fff9fd73574 in gst_tep_transform_ip(_GstBaseTransform*, _GstBuffer*) ()
at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_TrafficEventProcess.so

error2:

#0 0x00007ffff1ce9c3e in _int_free (have_lock=0, p=0x7ff96dfe4100, av=0x7ff96c000020) at malloc.c:4310
#1 0x00007ffff1ce9c3e in __GI___libc_free (mem=0x7ff96dfe4110)
at malloc.c:3134
#2 0x00007ffff46477af in release_obj_meta ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#3 0x00007ffff4648b0b in nvds_clear_meta_list ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#4 0x00007ffff464748f in release_frame_meta ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#5 0x00007ffff46481b6 in nvds_destroy_meta_pool ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#6 0x00007ffff4647c83 in nvds_destroy_frame_meta_pool ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#7 0x00007ffff46457bb in nvds_destroy_batch_meta ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#8 0x00007ffff4645fca in nvds_batch_meta_release_func ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_meta.so
#9 0x00007ffff484dff5 in gst_nvds_meta_free ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvdsgst_meta.so
#10 0x00007ffff41405ef in gst_buffer_foreach_meta ()
at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#11 0x00007ffff414361e in gst_buffer_pool_release_buffer ()

error3:

Calls to the cudaMemcpy() API also trigger this error.

Requirement details :
This error had not occurred before. This time I optimized the program by assigning the corresponding GPU to every plugin in the pipeline that accepts a GPU ID, which did improve GPU utilization. Since I use YOLOv7 for inference in the middle of the pipeline, my target is an inference time of less than 80 ms. Before the optimization, with 4 T4 GPUs, only 30 pipelines could keep inference time within 80 ms. After the optimization, 44 pipelines can keep inference time under 80 ms. Below is the diagram of my pipeline construction.

In a complete pipeline, I set the GPU ID for the following plugins: nvvideoconvert, nvv4l2decoder, nvinfer, nvstreammux, nvvideoconvert, nvdsosd, and nveglglessink. However, my custom plugin does not have a GPU ID property, so it copies data from GPU memory to the CPU for processing.
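
For reference, the GPU ID is applied to those official plugins roughly as in the sketch below; the round-robin helper, constant, and element variable names are placeholders for what my actual code does:

/* Illustrative sketch: pin one pipeline to one GPU. The stream index is
 * mapped round-robin onto the 4 T4s, and every GPU-capable element of that
 * pipeline gets the same gpu-id. Variable names are placeholders. */
static const int kNumGpus = 4;
guint gpu_id = stream_id % kNumGpus;                           /* round-robin    */

g_object_set (G_OBJECT (decoder),   "gpu-id", gpu_id, NULL);   /* nvv4l2decoder  */
g_object_set (G_OBJECT (streammux), "gpu-id", gpu_id, NULL);   /* nvstreammux    */
g_object_set (G_OBJECT (pgie),      "gpu-id", gpu_id, NULL);   /* nvinfer        */
g_object_set (G_OBJECT (nvvidconv), "gpu-id", gpu_id, NULL);   /* nvvideoconvert */
g_object_set (G_OBJECT (nvosd),     "gpu-id", gpu_id, NULL);   /* nvdsosd        */
g_object_set (G_OBJECT (sink),      "gpu-id", gpu_id, NULL);   /* nveglglessink  */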

I believe the resources are allocated evenly across the GPUs. You can see the memory usage of the 44 pipelines in the following image.

error4:

ERROR: nvdsinfer_context_impl.cpp:617 postprocessing cudaMemcpyAsync for output buffers failed, cuda err_no:700, err_str:cudaErrorIllegalAddress
CUDA failure: CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:1658 post cuda process failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:10:34.022086886 5800 0x7ffac801ff20 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 264
an illegal memory access was encountered in file yoloPlugins.cpp at line 264
CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:525 Failed to record cuda event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
CUDA_ERROR_ILLEGAL_ADDRESS
0:10:34.022284517 5800 0x7ffd10a46800 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
CUDA failure: CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line CUDA failure: 230
an illegal memory access was encounteredCUDA_ERROR_ILLEGAL_ADDRESS
nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:518 Failed to add cudaStream callback for returning input buffers, cuda err_no:700, err_str:cudaErrorIllegalAddress
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 264
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:1655 postprocessing cuda waiting event failed , cuda err_no:700, err_str:cudaErrorIllegalAddress
0:10:34.022512841 5800 0x7ffd000256d0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:10:34.022529339 5800 0x7ff7fc035850 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 230
[ ERROR: CUDA Runtime ] an illegal memory access was encountered
CUDA_ERROR_ILLEGAL_ADDRESS
[ ERROR: CUDA Runtime ] an illegal memory access was encountered
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 261
in file yoloPlugins.cpp at line 230
CUDA_ERROR_ILLEGAL_ADDRESS
an illegal memory access was encounteredCUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 261
CUDA failure: CUDA_ERROR_ILLEGAL_ADDRESS
in file CUDA_ERROR_ILLEGAL_ADDRESS
CUDA failure: yoloPlugins.cppan illegal memory access was encountered in file yoloPlugins.cpp at line 264
at line CUDA_ERROR_ILLEGAL_ADDRESS
261
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 261
CUDA_ERROR_ILLEGAL_ADDRESS
an illegal memory access was encountered in file yoloPlugins.cpp at line 230
0:10:34.027862661 5800 0x7ffb3403b540 ERROR nvinfer gstnvinfer.cpp:1192:get_converted_buffer: cudaMemset2DAsync failed with error cudaErrorIllegalAddress while converting buffer
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:10:34.030304733 5800 0x7ffb3403b540 WARN nvinfer gstnvinfer.cpp:1472:gst_nvinfer_process_full_frame: error: Buffer conversion failed
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:10:34.030389027 5800 0x7ff7fc035850 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:10:34.030411288 5800 0x7ffac801ff20 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:10:34.030477701 5800 0x7ffac801ff20 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:10:34.030531543 5800 0x7ffac801ff20 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
Cuda failure: status=700
Cuda failure: status=700 in cuResData at line 316
Cuda failure: status=700 in cuResData at line 337
Cuda failure: status=700 in cuResData at line 316
Cuda failure: status=700 in cuResData at line 337

These errors occur randomly, and only one type of error appears each time.

I conducted another test using four command windows, each set to use one of the four GPUs. I then ran four instances of the program, 44 pipelines in total, and no errors occurred, as shown in the following image.

Has no one replied to me? Please help me, thank you very much!

According to your description, the problem seems to be related to your customized plugin. Have you debugged your plugin? Have you tried 44 pipelines without your customized plugin?

Indeed, as you said, after I disable the custom plugin it no longer crashes. I suspect it is because I didn't specify a GPU ID for that plugin. So how do I add a gpu-id property to the custom plugin? Is there a reference example?

I have added a gpu-id property to my custom plugin following the plugin specification, so that data is copied from GPU memory to the CPU on the GPU where the pipeline is located and processed there. However, after running for a period of time, a similar error is still reported. From my own inspection, the GPU assignment was applied successfully. Do you have any thoughts on this phenomenon? My custom plugin post-processes the YOLO detection results: everything is copied from GPU memory to system memory, processed, and then passed to the main program as standardized data.
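
Roughly, the copy inside the plugin looks like the fragment below; the buffer names and size are placeholders, the point is that cudaSetDevice() is called with the pipeline's GPU before the device-to-host copy:

/* Illustrative fragment: select the pipeline's GPU, then copy the data the
 * plugin needs from device memory to host memory for CPU post-processing.
 * host_buf, dev_ptr and num_bytes are placeholders. */
if (cudaSetDevice (tep->gpu_id) != cudaSuccess) {
    GST_ERROR_OBJECT (tep, "Unable to set device %d", tep->gpu_id);
    return GST_FLOW_ERROR;
}

cudaError_t err = cudaMemcpy (host_buf, dev_ptr, num_bytes, cudaMemcpyDeviceToHost);
if (err != cudaSuccess) {
    GST_ERROR_OBJECT (tep, "cudaMemcpy D2H failed: %s", cudaGetErrorString (err));
    return GST_FLOW_ERROR;
}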

How did you do that? With CUDA APIs?

I imitated the gst-nvdsosd plugin in the DeepStream folder and made the gpu-id modifications in the custom plugin. I modified the plugin's gst_tep_start, gst_tep_stop, gst_tep_set_caps, gst_tep_class_init, gst_tep_set_property, gst_tep_get_property, and gst_tep_transform_ip. In gst_tep_start, gst_tep_stop, gst_tep_set_caps, and gst_tep_transform_ip I use the following code:

cudaError_t CUerr = cudaSuccess;
CUerr = cudaSetDevice (tep->gpu_id);
if (CUerr != cudaSuccess) {
    std::cerr << "Unable to set device" << std::endl;
    return GST_FLOW_ERROR;
}
GST_LOG_OBJECT (tep, "SETTING CUDA DEVICE = %d in tep func=%s\n",
    tep->gpu_id, __func__);
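
For completeness, the gpu-id property itself is exposed in gst_tep_class_init and handled in gst_tep_set_property, following the same pattern as gst-nvdsosd / gst-dsexample; roughly like the fragment below (the GstTep / GST_TEP types and the PROP_GPU_DEVICE_ID id come from my plugin and are shown here only as an illustration):

/* Fragment: expose a "gpu-id" property on the custom element, modeled on the
 * gst-nvdsosd / gst-dsexample sources. Details are illustrative. */
enum
{
  PROP_0,
  PROP_GPU_DEVICE_ID
};

static void gst_tep_set_property (GObject * object, guint prop_id,
    const GValue * value, GParamSpec * pspec);

static void
gst_tep_class_init (GstTepClass * klass)
{
  GObjectClass *gobject_class = G_OBJECT_CLASS (klass);

  gobject_class->set_property = gst_tep_set_property;

  g_object_class_install_property (gobject_class, PROP_GPU_DEVICE_ID,
      g_param_spec_uint ("gpu-id", "Set GPU Device ID",
          "Set GPU Device ID", 0, G_MAXUINT, 0,
          (GParamFlags) (G_PARAM_READWRITE | G_PARAM_STATIC_STRINGS)));
}

static void
gst_tep_set_property (GObject * object, guint prop_id,
    const GValue * value, GParamSpec * pspec)
{
  GstTep *tep = GST_TEP (object);

  switch (prop_id) {
    case PROP_GPU_DEVICE_ID:
      tep->gpu_id = g_value_get_uint (value);
      break;
    default:
      G_OBJECT_WARN_INVALID_PROPERTY_ID (object, prop_id, pspec);
      break;
  }
}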

Another strange phenomenon: the video sources for the pipelines I am currently testing are all local video files. I found that when I run 44 pipelines, after all the pipelines are correctly established and have been running for a while, there are always a few pipelines that cannot pull the stream for a period of time. My program has a watchdog-like function: if a pipeline cannot pull the stream within 20 seconds, it disconnects that pipeline and creates a new one for the same stream, keyed by the same pipeline/stream ID (the stream_id drives the round-robin GPU allocation, so the rebuilt pipeline goes back to the original GPU). After closing and recreating pipelines several times, the previous errors occur. In other words, the errors only appear after closing a pipeline that cannot pull the stream and recreating it.

But in a new test with only 4 pipelines, allocated to the 4 GPUs, no pipeline failed to pull the stream; the watchdog mechanism only triggered after the local video finished playing, and the previous errors were not reported.

Have you debugged your plugin to find out the root cause of the crash?

What does "pull the stream" mean?
Have you tested only with simple gst-launch pipelines?

First question:

As I mentioned earlier, the errors I encounter each time are different, but upon reviewing the errors, I noticed they are related to CUDA and malloc. Here are the recent error messages from my crashes:

error5:
#0 0x00007ffff1c90e87 in __GI_raise (sig=sig@entry=6)
at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff1c927f1 in __GI_abort () at abort.c:79
#2 0x00007ffff1cdb837 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff1e08a7b "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff1ce28ba in malloc_printerr (str=str@entry=0x7ffff1e06cfc "malloc(): memory corruption") at malloc.c:5342
#4 0x00007ffff1ce6a04 in _int_malloc (av=av@entry=0x7ffe50000020, bytes=bytes@entry=4280) at malloc.c:3748
#5 0x00007ffff1ce92ad in __GI___libc_malloc (bytes=4280) at malloc.c:3075
#6 0x00007ffeec6b5f9e in () at /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#7 0x00007ffeec70c7ae in () at /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#8 0x00007ffeec70d078 in () at /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#9 0x00007ffeec6b5adb in () at /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#10 0x00007ffeed011b0e in cuvidv4l2_dec_enqueue_instream_buffers(v4l2_decoder_context_rec*) () at /opt/nvidia/deepstream/deepstream-6.0/lib/libcuvidv4l2.so
#11 0x00007ffeed011e44 in cuvidv4l2_dec_thread_func(void*) ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libcuvidv4l2.so
#12 0x00007ffeecffeaea in thread_wrapper(void*) ()
at /opt/nvidia/deepstream/deepstream-6.0/lib/libcuvidv4l2.so
#13 0x00007ffff25eb6db in start_thread (arg=0x7ff8d57fa700)
at pthread_create.c:463
#14 0x00007ffff1d7361f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

error6:
CUDA failure: CUDA failure: CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA failure: CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
an illegal memory access was encountered in file yoloPlugins.cpp at line CUDA_ERROR_ILLEGAL_ADDRESS
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 234
an illegal memory access was encounterednvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
CUDA_ERROR_ILLEGAL_ADDRESS
0:09:46.950747120 30752 0x7ffe9c05d720 ERROR nvinfer gstnvinfer.cpp:1192:get_converted_buffer: cudaMemset2DAsync failed with error cudaErrorIllegalAddress while converting buffer
0:09:46.950938567 30752 0x7ffe9c05d720 WARN nvinfer gstnvinfer.cpp:1472:gst_nvinfer_process_full_frame: error: Buffer conversion failed
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:636 Failed to record batch cuda copy-complete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
in file yoloPlugins.cpp at line 261
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
234
0:09:46.951156910 30752 0x7ffd7c07ef20 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:1658 post cuda process failed., nvinfer error:NVDSINFER_CUDA_ERROR
nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
0:09:46.951211583 30752 0x7ffd7c018c50 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
nvbufsurface: NvBufSurfaceSysToHWCopy: failed in mem copy
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
an illegal memory access was encountered in file yoloPlugins.cpp at line 229
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 234
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
CUDA failure: an illegal memory access was encountered in file yoloPlugins.cpp at line 261
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
CUDA_ERROR_ILLEGAL_ADDRESS
0:09:46.953793231 30752 0x7ffd04043d40 ERROR nvinfer gstnvinfer.cpp:1192:get_converted_buffer: cudaMemset2DAsync failed with error cudaErrorIllegalAddress while converting buffer
0:09:46.954618212 30752 0x7ffd04043d40 WARN nvinfer gstnvinfer.cpp:1472:gst_nvinfer_process_full_frame: error: Buffer conversion failed
ERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
[ ERROR: CUDA Runtime ] an illegal memory access was encountered
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.959701818 30752 0x7ffe9c01c2d0 WARN nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop: error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.959837564 30752 0x7ffd7c02b2d0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.960815621 30752 0x7ffd7c07ef70 WARN nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop: error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.961351290 30752 0x7ffe9c01c2d0 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1797> [UID = 1]: Tried to release an outputBatchID which is already with the context
0:09:46.963847260 30752 0x7ff8b40312d0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:09:46.963959287 30752 0x7ffd7c02b320 WARN nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop: error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Cuda failure: status=700
0:09:46.964016306 30752 0x7ffd7c07ef70 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1797> [UID = 1]: Tried to release an outputBatchID which is already with the context
0:09:46.964087206 30752 0x7ffe9c05d450 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:09:46.968565610 30752 0x7ffd7c07ef20 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:09:46.964121969 30752 0x7ffd7c02b320 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1797> [UID = 1]: Tried to release an outputBatchID which is already with the context
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.969084826 30752 0x7ffd7c07ef20 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.974592641 30752 0x7ffe9c05d4a0 WARN nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop: error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
0:09:46.975133265 30752 0x7ffe9c05d4a0 WARN nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1797> [UID = 1]: Tried to release an outputBatchID which is already with the context
0:09:46.976389671 30752 0x7ffe9c05d450 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.976494905 30752 0x7ffe9c05d450 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [pointWiseV2Helpers.h::launchPwgenKernel::532] Error Code 1: Cuda Driver (an illegal memory access was encountered)
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
0:09:46.993861446 30752 0x7ffc1c032850 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
0:09:46.994036495 30752 0x7ffe9c017850 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: [TRT]: 1: [reformat.cpp::executeCutensor::385] Error Code 1: CuTensor (Internal cuTensor permutate execute failed)
0:09:46.994088868 30752 0x7ffc74021720 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: [TRT]: 1: [apiCheck.cpp::apiCatchCudaError::17] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:09:46.994146035 30752 0x7ff9dc09eed0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.994192859 30752 0x7ff90400a280 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.994234185 30752 0x7ffc1c032850 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
0:09:46.994326952 30752 0x7ffd7c07ed90 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.994395920 30752 0x7ffc74021720 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: [TRT]: 1: Unexpected exception CUDA_ERROR_ILLEGAL_ADDRESS
0:09:46.994434956 30752 0x7ff9dc0188a0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: [TRT]: 1: [hardwareContext.cpp::configure::92] Error Code 1: Cudnn (CUDNN_STATUS_INTERNAL_ERROR)
0:09:46.994494845 30752 0x7ffb6c079a80 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.994552063 30752 0x7ff90400a280 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.994588477 30752 0x7ffe9c017850 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.994648045 30752 0x7ffb6c079a80 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.994730436 30752 0x7ff848086450 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
0:09:46.994786832 30752 0x7ffd7c07ed90 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.994842873 30752 0x7ff848086450 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:09:46.994911191 30752 0x7ffa94016000 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:09:46.994931300 30752 0x7ffd7c0188f0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:09:46.994964900 30752 0x7ff9dc018a30 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995033947 30752 0x7ffa94016000 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.995072393 30752 0x7ff9dc0188a0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.995110341 30752 0x7ffd7c07ed90 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995158783 30752 0x7ff90400a280 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.995231552 30752 0x7ff98c0b2590 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:09:46.995272890 30752 0x7ff848086450 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995301953 30752 0x7ffa9403c0f0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:09:46.995313767 30752 0x7ff9dc09eed0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995349434 30752 0x7ff90400a280 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995377450 30752 0x7ffa94016000 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995409319 30752 0x7ff9dc018a30 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.995464196 30752 0x7ff9dc0188a0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.995498178 30752 0x7ff98c0b2590 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:09:46.995541242 30752 0x7ff9dc018a30 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995584136 30752 0x7ff98c0b2590 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
0:09:46.995603543 30752 0x7ffa9403c0f0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:09:46.995671534 30752 0x7ffa9403c0f0 WARN nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing

error7:
#0 0x00007fffd0afdc95 in ?? ()
#1 0x00007fffc00ca170 in ?? ()
#2 0x00007ff4b4712270 in ?? ()
#3 0x0000000000000000 in ?? ()

Second question:

What I mean is that I have implemented an internal monitoring thread to check whether each pipeline is operating properly. "Cannot pull the stream" means the pipeline is not functioning correctly, i.e. there is likely a problem retrieving the video stream. The monitoring principle is similar to a watchdog mechanism: once a pipeline stops operating properly, I close it and establish a new one. The new pipeline still uses the same video stream, identified by the same stream_id, to ensure that the GPU plugins in this pipeline are assigned to the previously allocated GPU.

I have noticed a recurring pattern before encountering errors. Certain pipelines exhibit abnormalities, and due to the monitoring measures in place, these pipelines are rebuilt. However, after repeating the pipeline rebuilding operation several times or even a dozen times, the program crashes.

Third question:

When I tested with 44 pipelines after removing my custom plugin so that only official plugins were used, no issues were observed: there were no problems or abnormalities in the pipelines that would have required rebuilding them.

Additional information:

When I was still using the custom plugins, I conducted a test by creating only four pipelines and assigning them to four different GPUs. In this case, there were no pipeline abnormalities that required rebuilding, and as a result, the program did not experience any crashes.

I wonder whether a CPU scheduling issue leads to the pipeline abnormalities. My program then detects the anomalies and rebuilds the problematic pipelines, and it is the repeated execution of this operation that eventually causes the crash, but I cannot fully explain it. To help you understand how I monitor and rebuild the pipeline, below are the two functions I use for monitoring and rebuilding:

void decodeBin::start()
{
    // Set the whole pipeline to PLAYING.
    GstStateChangeReturn ret = gst_element_set_state(pipeline_, GST_STATE_PLAYING);

    if (ret != GST_STATE_CHANGE_SUCCESS)
    {
        g_print("Start %s 's pipeline wrong! ---> %d", this->channel_name_.c_str(), (int)ret);
        // return;
    }

    decodeProtery->feed();

    cur_state = NetState::ns_normal;
    is_stop_ = false;

    // Watchdog thread: if the source has not fed any data for two consecutive
    // checks (normal -> warn -> error), rebuild the source bin via reconnect().
    std::thread t([this]() mutable
    {
        while (!this->is_stop_)
        {
            bool has_feed = this->decodeProtery->is_feed();
            // bool has_feed = true;

            std::cout << "Has feed ? " << (has_feed ? "YES" : "No") << std::endl;

            if (!has_feed && cur_state != NetState::ns_null)
            {
                cur_state = (cur_state == NetState::ns_normal) ? NetState::ns_warn : NetState::ns_error;
                if (cur_state == NetState::ns_error)
                {
                    cur_state = NetState::ns_normal;
                    this->reconnect();
                }
            }
            else if (cur_state != NetState::ns_null)
            {
                cur_state = NetState::ns_normal;
            }
            sleep(watch_time);
        }
    });
    t.detach();
}

void decodeBin::reconnect()
{
    // GstElement *srcpad = gst_element_get_static_pad(cur_effect, "src");

    std::cout << "Reconnect" << std::endl;

    // Build a replacement source bin for the same URI on the same GPU.
    GstElement *newBin = create_source_bin(&this->m_int_gpu_id_, this->uri_);

    // Tear down the old uridecodebin and remove it from the pipeline.
    gst_element_set_state(this->uridecodebin, GST_STATE_NULL);
    gst_bin_remove(GST_BIN(this->bin), this->uridecodebin);

    // Start the new source bin, add it to the pipeline and link it to the tee.
    gst_element_set_state(newBin, GST_STATE_PLAYING);

    if (!gst_bin_add(GST_BIN(this->bin), newBin))
    {
        g_print("Add Element to the pipeline fail ! \n");
    }

    // if (!gst_element_link_many(newBin, this->que, NULL))
    // {
    //     g_print("link Decoder and tee fail ! \n");
    // }

    if (!gst_element_link_many(newBin, this->tee, NULL))
    {
        g_print("link Decoder and tee fail ! \n");
    }

    std::swap(newBin, this->uridecodebin);
}

Have you checked the memory usage when there are 44 pipelines? Is there a kernel log from when the crash happened?

To identify whether the issue is related to DeepStream APIs or your own plugin, please try the 44 pipelines without your customized plugin; we need to narrow down the scope first.

Have you checked the memory usage when there are 44 pipelines? Is there a kernel log from when the crash happened?

When using 44 pipelines, I checked the memory usage. My server has 256 GB of memory, and at the time of the crash the memory usage was 125.6 GB.

To identify whether the issue is related to DeepStream APIs or your own plugin, please try the 44 pipelines without your customized plugin; we need to narrow down the scope first.

I have previously mentioned that when I remove my own plugin from the pipeline and only use official plugins, there are no pipeline exceptions and no need to rebuild the pipeline.
Additionally, the program does not crash. When I assign GPUs to 4 windows and run 11 pipelines in each window, I also don’t experience these issues, and the program doesn’t crash.
Similarly, when I assign GPUs to 4 windows and run only 1 pipeline in each window, I don’t encounter these issues either, and the program doesn’t crash.
It is worth noting that the reconstruction of the pipeline does not cause an increase in GPU memory usage, so I don’t understand the cause of these problems.

In which circumstance does the error happen?

None of those three situations results in a crash. The first situation is when the program allocates the GPUs itself, runs 44 pipelines, and the custom plugin is removed from the pipelines so that only official plugins are used. The second and third situations also do not crash, so there is no need to elaborate further. The crash occurs when the pipelines use the custom plugin and the program assigns GPUs to the pipelines: after all 44 pipelines are set up and have run for a while, some of the pipelines are rebuilt, and after multiple rebuilds the program crashes. The code for rebuilding the pipeline is shown above; it stops the source and replaces the uridecodebin element instead of completely destroying and rebuilding the pipeline.