gst_element_set_state() hangs when setting the element to NULL

Please provide complete information as applicable to your setup.

• Hardware Platform ( Tesla T4)
• CUDA Version: 11.4
• Ubuntu Version: 18.04
• DeepStream Version: 6.0.1
• gstreamer Version: 1.14.5
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version: 470.82.01
• Issue Type(bugs)
• OpenCV Version: 4.6.0

I have a program that dynamically adds and deletes sources. The pipeline looks like this:

nvurisrcbin ! nvstreammux ! queue ! nvpreprocess ! nvinfer ! queue ! nvtracker ! queue ! tee ! queue ! nvmultistreamtiler ! nvvideoconvert ! nvdsosd ! nvvideoconvert ! fakesink

I used this demo as a reference: https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/runtime_source_add_delete/deepstream_test_rt_src_add_del.c

Some of the code looks like this:

    while (1) {
        for(size_t i = 0; i < 30; i++) {
            addDeviceScheduler(ctx, (char*)std::to_string(i).c_str(), "rtsp://admin:123456@192.168.26.201/mpeg4");
            sleep(5);
        }

        for(size_t i = 0; i < 30; i++) {
            deleteDeviceScheduler(ctx, (char*)std::to_string(i).c_str());
            sleep(5);
        }
        sleep(2);
    }

bool NVGstPipeline::remove_source(guint index, std::string url)
{
    LOG(INFO) << "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
    g_mutex_lock (&app_ctx.lock);
    GstElement *src_bin = NULL;
    GstElement *streammux = NULL;
    GstPad *pad = NULL;
    gchar bin_name[NAME_LENGTH] = {};
    gchar pad_name[NAME_LENGTH] = {};
    GstStateChangeReturn state_ret;

    LOG(INFO) << "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB";
    // Get the source bin by index
    g_snprintf(bin_name, NAME_LENGTH, SOURCE_BIN_NAME_V, index);
    src_bin = gst_bin_get_by_name(GST_BIN(app_ctx.pipeline), bin_name);
    LOG(INFO) << "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC";
    // Get the streammux element
    streammux = gst_bin_get_by_name(GST_BIN(app_ctx.pipeline), STREAM_MUXER_NAME);
    if (!src_bin || !streammux) {
        LOG(ERROR) << "Can't find some element when deleting source.";
        g_mutex_unlock (&app_ctx.lock);
        return false;
    }
    LOG(INFO) << "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD";

    gst_element_send_event(src_bin, gst_event_new_eos());
    gst_element_send_event(src_bin, gst_event_new_flush_stop (FALSE));
    usleep(50000);
    state_ret = gst_element_set_state(src_bin, GST_STATE_NULL);
    LOG(INFO) << "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE";
    switch(state_ret) {
    case GST_STATE_CHANGE_ASYNC:
        gst_element_get_state(src_bin, NULL, NULL, GST_CLOCK_TIME_NONE);
        LOG(INFO) << "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF";
    case GST_STATE_CHANGE_SUCCESS:
        g_snprintf(pad_name, NAME_LENGTH, "sink_%u", index);
        pad = gst_element_get_static_pad(streammux, pad_name);
        LOG(INFO) << "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG";
        // EOS gets stuck and the pad cannot be released, because the pipeline state has not been set to NULL
        if(pad) {
            gst_pad_send_event(pad, gst_event_new_eos());
            LOG(INFO) << "HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH";
            gst_pad_send_event(pad, gst_event_new_flush_stop (FALSE));
            LOG(INFO) << "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII";
            gst_element_release_request_pad(streammux, pad);
            LOG(INFO) << "JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ";
            gst_object_unref(pad);
            LOG(INFO) << "KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK";
            usleep(500);
            gst_bin_remove(GST_BIN(app_ctx.pipeline), src_bin);
            LOG(INFO) << "SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS";
            gst_object_unref(src_bin);
            LOG(INFO) << "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT";
        }
        break;
    case GST_STATE_CHANGE_FAILURE:
        LOG(ERROR) << "STATE CHANGE FAILURE";
        break;
    case GST_STATE_CHANGE_NO_PREROLL:
        LOG(ERROR) << "STATE CHANGE NO PREROLL";
        break;
    default:
        LOG(ERROR) << "UNKNOWED";
        break;
    }

    app_ctx.source_bin[index].index = -1;
    app_ctx.source_bin[index].url.clear();
    app_ctx.source_bin[index].device_id.clear();
    app_ctx.source_bin[index].srcBin = NULL;
    app_ctx.source_bin[index].alive = FALSE;
    app_ctx.source_bin[index].reconfiguring = FALSE;
    app_ctx.connect_count --;
    LOG(INFO) << "Source removed, index = " << index << ", url = " << url;

    g_mutex_unlock (&app_ctx.lock);
    return true;
}

When I dynamically delete a source, it gets stuck on this line: state_ret = gst_element_set_state(src_bin, GST_STATE_NULL);
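
One way to confirm where a removal blocks, without relying on marker log lines, is to run the suspect call on a helper thread and wait for it with a timeout. The sketch below uses only the C++ standard library. `run_with_timeout` is a hypothetical helper name, not part of GStreamer or DeepStream, and the callable passed to it stands in for the call to gst_element_set_state(src_bin, GST_STATE_NULL). It only detects a hang: a call that never returns keeps its thread, and any lock held by the caller stays held.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper (not part of GStreamer or DeepStream).
// Runs `call` on a detached thread and returns true if it finished within
// `timeout`, false if it is still running when the timeout expires.
bool run_with_timeout(std::function<void()> call,
                      std::chrono::milliseconds timeout)
{
    assert(call && "run_with_timeout needs a callable");
    // The promise is shared so that it outlives this function if the call hangs.
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();

    std::thread([done, call = std::move(call)]() {
        call();             // the potentially blocking call
        done->set_value();  // reached only if the call returned
    }).detach();

    return finished.wait_for(timeout) == std::future_status::ready;
}
```

In remove_source() the state change could be passed in as a lambda and an error logged when the helper returns false. Anything the lambda captures must stay valid for as long as the call may still be running, so capturing locals by reference is only safe if the function does not return while the call is pending.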

logs:

0:03:25.919650477 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:3051:gst_v4l2_object_get_nearest_size:<nvv4l2decoder29:sink> Unable to try format: Unknown error -1
0:03:25.919665162 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:2937:gst_v4l2_object_probe_caps_for_format:<nvv4l2decoder29:sink> Could not probe minimum capture size for pixelformat VP80
0:03:25.919679666 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:3051:gst_v4l2_object_get_nearest_size:<nvv4l2decoder29:sink> Unable to try format: Unknown error -1
0:03:25.919694451 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:2943:gst_v4l2_object_probe_caps_for_format:<nvv4l2decoder29:sink> Could not probe maximum capture size for pixelformat VP80
0:03:25.919725768 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:3051:gst_v4l2_object_get_nearest_size:<nvv4l2decoder29:sink> Unable to try format: Unknown error -1
0:03:25.919740877 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:2937:gst_v4l2_object_probe_caps_for_format:<nvv4l2decoder29:sink> Could not probe minimum capture size for pixelformat H264
0:03:25.919755280 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:3051:gst_v4l2_object_get_nearest_size:<nvv4l2decoder29:sink> Unable to try format: Unknown error -1
0:03:25.919770400 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:2943:gst_v4l2_object_probe_caps_for_format:<nvv4l2decoder29:sink> Could not probe maximum capture size for pixelformat H264
0:03:25.919862135 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:3051:gst_v4l2_object_get_nearest_size:<nvv4l2decoder29:src> Unable to try format: Unknown error -1
0:03:25.919878643 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:2937:gst_v4l2_object_probe_caps_for_format:<nvv4l2decoder29:src> Could not probe minimum capture size for pixelformat NM12
0:03:25.919893453 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:3051:gst_v4l2_object_get_nearest_size:<nvv4l2decoder29:src> Unable to try format: Unknown error -1
0:03:25.919908025 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:2943:gst_v4l2_object_probe_caps_for_format:<nvv4l2decoder29:src> Could not probe maximum capture size for pixelformat NM12
0:03:25.919931141 30071 0x7f2c680026d0 WARN                    v4l2 gstv4l2object.c:2388:gst_v4l2_object_add_interlace_mode:0x7f2c58080020 Failed to determine interlace mode
0:03:26.244276652 30071 0x7f2c680026d0 WARN            v4l2videodec gstv4l2videodec.c:1685:gst_v4l2_video_dec_decide_allocation:<nvv4l2decoder29> Duration invalid, not setting latency
0:03:26.247351155 30071 0x7f2c680026d0 WARN          v4l2bufferpool gstv4l2bufferpool.c:1065:gst_v4l2_buffer_pool_start:<nvv4l2decoder29:pool:src> Uncertain or not enough buffers, enabling copy threshold
0:03:26.252909052 30071 0x7f2c580034a0 WARN          v4l2bufferpool gstv4l2bufferpool.c:1512:gst_v4l2_buffer_pool_dqbuf:<nvv4l2decoder29:pool:src> Driver should never set v4l2_buffer.field to ANY
I0809 13:57:33.946566 30071 captureenginescheduler.cpp:212] Enter deleteDeviceScheduler()
I0809 13:57:33.946648 30071 captureenginecontext.cpp:152] 1111111111111111111111111111111111111111
I0809 13:57:33.946681 30071 nvgstpipeline.cpp:469] AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
I0809 13:57:33.946694 30071 nvgstpipeline.cpp:478] BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
I0809 13:57:33.947454 30071 nvgstpipeline.cpp:482] CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
I0809 13:57:33.947772 30071 nvgstpipeline.cpp:490] DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
0:03:54.356493688 30071 0x7f34bc02b630 WARN                 rtspsrc gstrtspsrc.c:3155:on_timeout:<src> source e4faeeec, stream e4faeeec in session 0 timed out

Can you provide a simple sample to reproduce the hang based on our demo?

OK, I will try to reproduce this problem. I also found some topics that describe the same problem, but I don't know whether it is the same bug.

Your DeepStream version is too old. We may have fixed some hang problems in the latest version. Could you update to the latest DeepStream version first?

Hi, I want to deploy on a Tesla P4. I checked this page:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Quickstart.html
but I don't know whether the latest version is compatible with it.

OK. Since you have a T4 server, you can first try to reproduce the problem with the latest DeepStream version on the T4. Thanks

I have the same problem and described it here:

function gst_element_set_state sometimes hangs when source bin is being set to GST_STATE_NULL

I have tried DeepStream versions 5.1 → 6.2

Hi, I have tested the app with the Docker image nvcr.io/nvidia/deepstream:6.3-gc-triton-devel:
• Hardware Platform ( Tesla T4)
• Ubuntu Version: 18.04
• NVIDIA GPU Driver Version: 530.30.02

Now I have run into some problems. When I tested with one GPU (CUDA_VISIBLE_DEVICES=0 ./app), it seemed to work.
But when I used 3 GPUs, only one GPU works. The other 2 GPUs produce this error: nvvideoconvert gstnvvideoconvert.c:4095:gst_nvvideoconvert_transform: buffer transform failed

I created 3 pipelines, one per GPU. Each pipeline looks like this:
nvurisrcbin ! nvstreammux ! queue ! nvpreprocess ! nvinfer ! queue ! nvtracker ! queue ! tee ! queue ! nvmultistreamtiler ! nvvideoconvert ! nvdsosd ! nvvideoconvert ! fakesink

logs:

I0811 09:07:38.628474  1818 captureenginescheduler.cpp:189] Enter addDeviceScheduler()
I0811 09:07:38.630127  1818 deviceinfo.cpp:56] Add Device to database success.
I0811 09:07:38.658967  1819 nvgstpipeline.cpp:1242] Pipeline ready.
I0811 09:07:38.659286  1818 nvgstpipeline.cpp:613] resume pipeline to playing state
I0811 09:07:38.659446  1818 nvgstpipeline.cpp:461] Add source sucess, index = 0, url = rtsp://admin:123456@192.168.26.201/mpeg4
I0811 09:07:39.096009  1865 nvgstpipeline.cpp:1591] new compatible video pad vsrc_0 added on source-bin-00
I0811 09:07:39.331456  1819 nvgstpipeline.cpp:1232] Pipeline running.
I0811 09:07:39.664829  1818 captureenginecontext.cpp:146] add device status: 0, ID: 0
I0811 09:07:39.664976  1818 captureenginescheduler.cpp:196] Exit addDeviceScheduler()
I0811 09:07:41.665096  1818 captureenginescheduler.cpp:189] Enter addDeviceScheduler()
I0811 09:07:41.666858  1818 deviceinfo.cpp:56] Add Device to database success.
I0811 09:07:41.667536  1819 nvgstpipeline.cpp:1242] Pipeline ready.
I0811 09:07:41.667650  1818 nvgstpipeline.cpp:613] resume pipeline to playing state
I0811 09:07:41.667686  1818 nvgstpipeline.cpp:461] Add source sucess, index = 0, url = rtsp://admin:123456@192.168.26.201/mpeg4
I0811 09:07:42.042872  1878 nvgstpipeline.cpp:1591] new compatible video pad vsrc_0 added on source-bin-00
0:00:36.187609754  1818 0x7f98d80112a0 ERROR         nvvideoconvert gstnvvideoconvert.c:4095:gst_nvvideoconvert_transform: buffer transform failed
I0811 09:07:42.673116  1818 captureenginecontext.cpp:146] add device status: 0, ID: 1
I0811 09:07:42.673204  1818 captureenginescheduler.cpp:196] Exit addDeviceScheduler()
I0811 09:07:44.673372  1818 captureenginescheduler.cpp:189] Enter addDeviceScheduler()
I0811 09:07:44.674957  1818 deviceinfo.cpp:56] Add Device to database success.
I0811 09:07:44.675477  1819 nvgstpipeline.cpp:1242] Pipeline ready.
I0811 09:07:44.675622  1818 nvgstpipeline.cpp:613] resume pipeline to playing state
I0811 09:07:44.675655  1818 nvgstpipeline.cpp:461] Add source sucess, index = 0, url = rtsp://admin:123456@192.168.26.201/mpeg4
I0811 09:07:45.083824  1891 nvgstpipeline.cpp:1591] new compatible video pad vsrc_0 added on source-bin-00
0:00:39.145889916  1818 0x7f98a0011a40 ERROR         nvvideoconvert gstnvvideoconvert.c:4095:gst_nvvideoconvert_transform: buffer transform failed
I0811 09:07:45.680912  1818 captureenginecontext.cpp:146] add device status: 0, ID: 2
I0811 09:07:45.681000  1818 captureenginescheduler.cpp:196] Exit addDeviceScheduler()
I0811 09:07:47.681164  1818 captureenginescheduler.cpp:189] Enter addDeviceScheduler()
I0811 09:07:47.682541  1818 deviceinfo.cpp:56] Add Device to database success.
I0811 09:07:47.682873  1818 nvgstpipeline.cpp:461] Add source sucess, index = 1, url = rtsp://admin:123456@192.168.26.201/mpeg4
I0811 09:07:48.065145  1904 nvgstpipeline.cpp:1591] new compatible video pad vsrc_0 added on source-bin-01
I0811 09:07:48.683140  1818 captureenginecontext.cpp:146] add device status: 0, ID: 3
I0811 09:07:48.683230  1818 captureenginescheduler.cpp:196] Exit addDeviceScheduler()
I0811 09:07:50.683337  1818 captureenginescheduler.cpp:189] Enter addDeviceScheduler()
I0811 09:07:50.684543  1818 deviceinfo.cpp:56] Add Device to database success.
I0811 09:07:50.690039  1818 nvgstpipeline.cpp:461] Add source sucess, index = 1, url = rtsp://admin:123456@192.168.26.201/mpeg4
I0811 09:07:51.070062  1916 nvgstpipeline.cpp:1591] new compatible video pad vsrc_0 added on source-bin-01
0:00:45.309839870  1818 0x7f9828012400 ERROR         nvvideoconvert gstnvvideoconvert.c:4095:gst_nvvideoconvert_transform: buffer transform failed
I0811 09:07:51.695436  1818 captureenginecontext.cpp:146] add device status: 0, ID: 4
I0811 09:07:51.695523  1818 captureenginescheduler.cpp:196] Exit addDeviceScheduler()

What is the difference between DS 6.3 and DS 6.0.1? With DS 6.0.1 it works with 3 GPUs, but it does not work with DS 6.3.

I have also tested source add and delete with only one GPU, and it does not seem to block in gst_element_set_state(src_bin, GST_STATE_NULL);

I set 10 sources per pipeline. I will test 30 sources on one GPU next.

OK. The block problem is resolved on DeepStream 6.3, is that right? As for the new problem with GPU IDs, could you open a new topic so that other customers can refer to it more easily?

Oh, no!

I have been testing the demo for about 3 days, and it blocked again. I started on Aug 10, and it blocked on Aug 13 at 00:32:00 am. I tested on only one GPU, on a physical machine, not in Docker.

• Hardware Platform ( A40 )
• CUDA Version: 12.1
• Ubuntu Version: 20.04
• DeepStream Version: 6.3
• gstreamer Version: 1.16.3
• TensorRT Version: 8.6.1
• NVIDIA GPU Driver Version: 530.30.02
• OpenCV Version: 4.5.4

This time, it does not seem to block in gst_element_set_state().

Source code:

bool NVGstPipeline::remove_source(guint index, std::string url)
{
    g_mutex_lock (&app_ctx.lock);
    GstElement *src_bin = NULL;
    GstElement *streammux = NULL;
    GstPad *pad = NULL;
    gchar bin_name[NAME_LENGTH] = {};
    gchar pad_name[NAME_LENGTH] = {};
    GstStateChangeReturn state_ret;

    // Get the source bin by index
    g_snprintf(bin_name, NAME_LENGTH, SOURCE_BIN_NAME_V, index);
    src_bin = gst_bin_get_by_name(GST_BIN(app_ctx.pipeline), bin_name);
    // Get the streammux element
    streammux = gst_bin_get_by_name(GST_BIN(app_ctx.pipeline), STREAM_MUXER_NAME);
    if (!src_bin || !streammux) {
        LOG(ERROR) << "Can't find some element when deleting source.";
        g_mutex_unlock (&app_ctx.lock);
        return false;
    }
    // Log the current and pending state without waiting (timeout 0)
    GstState state, pending;
    GstStateChangeReturn ret = gst_element_get_state(src_bin, &state, &pending, 0);
    LOG(INFO) << "RET: " << ret << " state: " << state << " pending: " << pending;

    gst_element_send_event(src_bin, gst_event_new_eos());
    //gst_element_send_event(src_bin, gst_event_new_flush_stop (FALSE));
    usleep(50000);
    //https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/704#note_947201
    LOG(INFO) << "SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS";
    //gst_element_sync_state_with_parent(src_bin);
    // Go to PAUSED first, then to NULL; only the second result is checked below
    state_ret = gst_element_set_state(src_bin, GST_STATE_PAUSED);
    LOG(INFO) << "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT";
    state_ret = gst_element_set_state(src_bin, GST_STATE_NULL);
    LOG(INFO) << "[PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP";
    switch(state_ret) {
    case GST_STATE_CHANGE_ASYNC:
        // Waits without a timeout, then falls through to the removal steps
        gst_element_get_state(src_bin, NULL, NULL, GST_CLOCK_TIME_NONE);
    case GST_STATE_CHANGE_SUCCESS:
        g_snprintf(pad_name, NAME_LENGTH, "sink_%u", index);
        pad = gst_element_get_static_pad(streammux, pad_name);
        // EOS gets stuck and the pad cannot be released, because the pipeline state has not been set to NULL
        if(pad) {
            gst_pad_send_event(pad, gst_event_new_eos());
            gst_pad_send_event(pad, gst_event_new_flush_stop (FALSE));
            gst_element_release_request_pad(streammux, pad);
            gst_object_unref(pad);
            usleep(500);
            gst_bin_remove(GST_BIN(app_ctx.pipeline), src_bin);
            //gst_object_unref(src_bin);
        }
        break;
    case GST_STATE_CHANGE_FAILURE:
        LOG(ERROR) << "STATE CHANGE FAILURE";
        break;
    case GST_STATE_CHANGE_NO_PREROLL:
        LOG(ERROR) << "STATE CHANGE NO PREROLL";
        break;
    default:
        LOG(ERROR) << "UNKNOWED";
        break;
    }

    app_ctx.source_bin[index].index = -1;
    app_ctx.source_bin[index].url.clear();
    app_ctx.source_bin[index].device_id.clear();
    app_ctx.source_bin[index].srcBin = NULL;
    app_ctx.source_bin[index].alive = FALSE;
    app_ctx.source_bin[index].reconfiguring = FALSE;
    app_ctx.connect_count --;
    LOG(INFO) << "Source removed, index = " << index << ", url = " << url;

    g_mutex_unlock (&app_ctx.lock);
    return true;
}


Log images:
1.zip (212.8 KB)

The picture was not successfully attached.
So on DeepStream 6.3 it takes 3 days to get stuck, but on DeepStream 6.0.1 it got stuck immediately?

1.zip (212.8 KB)

It may depend on the number of sources added. When I tested on DeepStream 6.0.1 with 30 sources on 3 GPUs, it got stuck immediately. With 10 sources on 1 GPU, it takes longer to hit the problem, and DS 6.3 takes even longer than DS 6.0.1 to get stuck. From the logs, DS 6.3 does not seem to be stuck in this function:

gst_element_set_state()

because the output of this line appears in the logs:

LOG(INFO) << "[PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP";

so it may be stuck somewhere in:

case GST_STATE_CHANGE_ASYNC:
    gst_element_get_state(src_bin, NULL, NULL, GST_CLOCK_TIME_NONE);
case GST_STATE_CHANGE_SUCCESS:
    g_snprintf(pad_name, NAME_LENGTH, "sink_%u", index);
    pad = gst_element_get_static_pad(streammux, pad_name);
    // EOS gets stuck and the pad cannot be released, because the pipeline state has not been set to NULL
    if(pad) {
        gst_pad_send_event(pad, gst_event_new_eos());
        gst_pad_send_event(pad, gst_event_new_flush_stop (FALSE));
        gst_element_release_request_pad(streammux, pad);
        gst_object_unref(pad);
        usleep(500);
        gst_bin_remove(GST_BIN(app_ctx.pipeline), src_bin);
        //gst_object_unref(src_bin);
    }
    break;

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks

If it gets stuck during a long run, it may be related to resource consumption.
1. You can try to reproduce this problem with our demo over a long run.
2. You can also monitor the use of some of your resources, such as memory, GPU …
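
As a rough illustration of the memory part of that monitoring, the application could log its own resident memory after each add or delete cycle and watch for steady growth. This is a minimal sketch that assumes Linux, where /proc/self/status contains a VmRSS line in kB. `parse_vm_rss_kb` and `log_rss` are hypothetical helper names, and GPU memory would need a separate tool such as nvidia-smi.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the VmRSS value in kB from a stream in
// /proc/<pid>/status format, or -1 if the line is missing or malformed.
long parse_vm_rss_kb(std::istream& status)
{
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) return kb;
            return -1;
        }
    }
    return -1;
}

// Hypothetical helper: writes one line with the current process RSS,
// prefixed by `tag`, for example after each source removal.
void log_rss(std::ostream& out, const char* tag)
{
    assert(tag != nullptr);
    std::ifstream status("/proc/self/status");
    out << tag << " VmRSS_kB=" << parse_vm_rss_kb(status) << '\n';
}
```

Calling log_rss(std::cerr, "after remove") at the end of each removal would give a time series that can be compared against the moment the hang occurs.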

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.