[NvTiler::Composite] ERROR: 349; NvBufSurfTransformComposite failed(-2)

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson Xavier NX
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): R32 Revision: 5.0 GCID: 25531747 Board: t186ref
• TensorRT Version: 7.1.3 + CUDA 10.2
• Issue Type (questions, new requirements, bugs): bugs/errors regarding NvBufSurfTransformComposite and gst-resource-error-quark
• How to reproduce the issue? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing): please see below
• Requirement details (This is for new requirement. Including the module name-for which plugin or for which sample application, the function description): please see below

My app is based on deepstream_test_3.py (the Python version, using an RTSP stream).
Please see below for the order of execution within the code. At the end, we call tiler_sink_pad_buffer_probe, in which I also handle other logic depending on the objects we detect.

When I start the app, it works fine. At some random point it stops working; when that happened, I looked at the log and found this:

[2021-03-09T16:46:58.328-08:00][INFO]-[NvTiler::Composite] ERROR: 349; NvBufSurfTransformComposite failed(-2)
[2021-03-09T16:46:58.328-08:00][ERROR]-SYNC_IOC_FENCE_INFO ioctl failed with 9
[2021-03-09T16:46:58.334-08:00][ERROR]-SYNC_IOC_FENCE_INFO ioctl failed with 9
[2021-03-09T16:46:58.334-08:00][ERROR]-SYNC_IOC_FENCE_INFO ioctl failed with 9

Then, immediately after (sometimes before; I'm not sure whether this is related to the above):

[2021-03-09T16:46:58.392-08:00][ERROR]-bus_call.py:37,Error: gst-resource-error-quark: GstNvTiler:
FATAL ERROR; NvTiler::Composite failed (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvtiler/gstnvtiler.cpp(657): gst_nvmultistreamtiler_transform (): /GstPipeline:pipeline0/GstNvMultiStreamTiler:nvtiler
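To settle whether the bus error arrives before or after the composite failure, it can help to order the log lines by their own timestamps rather than by position in the file. A minimal sketch, assuming every line follows the `[ISO-8601 timestamp][LEVEL]-message` layout shown above; `parse_log_line` and `first_match_time` are hypothetical helpers, not part of DeepStream:

```python
import re
from datetime import datetime

# Matches lines like: [2021-03-09T16:46:58.328-08:00][ERROR]-message text
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\]-(?P<msg>.*)$")


def parse_log_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if not m:
        return None
    # fromisoformat (Python 3.7+) accepts millisecond fractions and a -08:00 offset
    return datetime.fromisoformat(m.group("ts")), m.group("level"), m.group("msg")


def first_match_time(lines, needle):
    """Timestamp of the first parsed line whose message contains `needle`."""
    for line in lines:
        parsed = parse_log_line(line)
        if parsed and needle in parsed[2]:
            return parsed[0]
    return None


# Lines copied from the log excerpts above
log = [
    "[2021-03-09T16:46:58.328-08:00][INFO]-[NvTiler::Composite] ERROR: 349; NvBufSurfTransformComposite failed(-2)",
    "[2021-03-09T16:46:58.328-08:00][ERROR]-SYNC_IOC_FENCE_INFO ioctl failed with 9",
    "[2021-03-09T16:46:58.392-08:00][ERROR]-bus_call.py:37,Error: gst-resource-error-quark: GstNvTiler:",
]

composite = first_match_time(log, "NvBufSurfTransformComposite")
bus_error = first_match_time(log, "gst-resource-error-quark")
gap_ms = (bus_error - composite).total_seconds() * 1000
print(f"bus error arrived {gap_ms:.0f} ms after the composite failure")
```

For these lines it reports a 64 ms gap. A message that spans several lines, such as the GstNvTiler FATAL ERROR, is only matched on its first line.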

When I turn the app off and back on, it works fine for about a day. After that, this error comes back and causes the app to freeze. I checked whether the camera had any issues, but I was still able to stream from the RTSP source.

I couldn’t figure out where the issue was coming from. When I searched for NvBufSurfTransformComposite, the documentation described what it does, but that didn’t help me troubleshoot, and searching the forum turned up no similar cases. I would appreciate any help. The code is below.

Thanks

1. Create streammux and add to pipeline

streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
pipeline.add(streammux)

2. For all sources, create source_bin

for i in range(number_sources):
    # os.mkdir(folder_name+"/stream_"+str(i))
    frame_count["stream_"+str(i)] = 0
    saved_count["stream_"+str(i)] = 0

    print("Creating source_bin ", i, " \n ")
    uri_name = args[i+1]
    # str.find returns 0 only when the URI starts with "rtsp://" (-1 if it is absent)
    if uri_name.find("rtsp://") == 0:
        is_live = True  # mark the pipeline as having a live source
    # Create the bin for this stream, and make a source pad for its output
    source_bin = create_source_bin(i, uri_name)
    if not source_bin:
        sys.stderr.write("Unable to create source bin \n")
    # Add this bin to the pipeline
    pipeline.add(source_bin)

    # Get a sink pad in the streammux element
    padname = "sink_%u" % i
    sinkpad = streammux.get_request_pad(padname)
    if not sinkpad:
        sys.stderr.write("Unable to create sink pad bin \n")

    # Link the source pad on this bin to the sink pad in streammux
    srcpad = source_bin.get_static_pad("src")
    if not srcpad:
        sys.stderr.write("Unable to create src pad bin \n")
    srcpad.link(sinkpad)
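The loop above only ever sets `is_live` to True, so it has to be initialised to False somewhere earlier in the script (that part is not shown); otherwise the later `if is_live:` check raises NameError when no URI is RTSP. A compact equivalent of the check, sketched as a hypothetical helper `any_live_source`, always yields a defined boolean:

```python
def any_live_source(uris):
    """True if at least one URI is an RTSP stream (case-sensitive scheme check)."""
    # str.startswith is equivalent to uri.find("rtsp://") == 0 but reads more clearly
    return any(uri.startswith("rtsp://") for uri in uris)


# Hypothetical URIs, for illustration only
uris = ["rtsp://192.168.1.10:554/stream1", "file:///opt/videos/sample.mp4"]
print(any_live_source(uris))  # True
```

In the script above this would be `is_live = any_live_source(args[1:])`, assuming, as the loop does, that the URIs are `args[1]` through `args[number_sources]`.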

3. Create the queues and add them to the pipeline

queue1 = Gst.ElementFactory.make("queue", "queue1")
queue2 = Gst.ElementFactory.make("queue", "queue2")
queue3 = Gst.ElementFactory.make("queue", "queue3")
queue4 = Gst.ElementFactory.make("queue", "queue4")
queue5 = Gst.ElementFactory.make("queue", "queue5")
queue6 = Gst.ElementFactory.make("queue", "queue6")
queue7 = Gst.ElementFactory.make("queue", "queue7")
queue8 = Gst.ElementFactory.make("queue", "queue8")
pipeline.add(queue1)
pipeline.add(queue2)
pipeline.add(queue3)
pipeline.add(queue4)
pipeline.add(queue5)
pipeline.add(queue6)
pipeline.add(queue7)
pipeline.add(queue8)

4. Create pgie and tracker, then nvvidconv1 and filter1 to convert the frames to RGBA

print("Creating Pgie \n ")
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
if not pgie:
    sys.stderr.write(" Unable to create pgie \n")

tracker = Gst.ElementFactory.make("nvtracker", "tracker")
if not tracker:
    sys.stderr.write(" Unable to create tracker \n")

print("Creating nvvidconv1 \n ")
nvvidconv1 = Gst.ElementFactory.make("nvvideoconvert", "convertor1")
if not nvvidconv1:
    sys.stderr.write(" Unable to create nvvidconv1 \n")

print("Creating filter1 \n ")
caps1 = Gst.Caps.from_string("video/x-raw(memory:NVMM), format=RGBA")
filter1 = Gst.ElementFactory.make("capsfilter", "filter1")
if not filter1:
    sys.stderr.write(" Unable to get the caps filter1 \n")
filter1.set_property("caps", caps1)

5. Create tiler

print("Creating tiler \n ")
tiler = Gst.ElementFactory.make("nvmultistreamtiler", "nvtiler")
if not tiler:
    sys.stderr.write(" Unable to create tiler \n")

6. Create nvvidconv to convert the tiler output to RGBA, as required by nvosd, and then create nvosd

print("Creating nvvidconv \n ")
nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
if not nvvidconv:
    sys.stderr.write(" Unable to create nvvidconv \n")

# Create OSD to draw on the converted RGBA buffer
print("Creating nvosd \n ")
nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
if not nvosd:
    sys.stderr.write(" Unable to create nvosd \n")

nvosd.set_property('process-mode', OSD_PROCESS_MODE)
nvosd.set_property('display-text', OSD_DISPLAY_TEXT)

7. Create FakeSink since no display is connected

print("Creating FakeSink \n")
sink = Gst.ElementFactory.make("fakesink", "fakesink")
if not sink:
    sys.stderr.write(" Unable to create fakesink \n")

if is_aarch64():
    print("Creating transform \n ")
    transform = Gst.ElementFactory.make(
        "queue", "nvegl-transform")
    if not transform:
        sys.stderr.write(" Unable to create transform \n")


if is_live:
    print("At least one of the sources is live")
    streammux.set_property('live-source', 1)

streammux.set_property('width', 1280)
streammux.set_property('height', 720)
streammux.set_property('batch-size', number_sources)
streammux.set_property('batched-push-timeout', 4000000)
pgie.set_property('config-file-path', PGIE_CONFIG_FILE)

8. Set properties of tracker

config = configparser.ConfigParser()
config.read('tracker_config.txt') # see below for the content of the tracker_config.txt
config.sections()

for key in config['tracker']:
    if key == 'tracker-width':
        tracker_width = config.getint('tracker', key)
        tracker.set_property('tracker-width', tracker_width)
    if key == 'tracker-height':
        tracker_height = config.getint('tracker', key)
        tracker.set_property('tracker-height', tracker_height)
    if key == 'gpu-id':
        tracker_gpu_id = config.getint('tracker', key)
        tracker.set_property('gpu_id', tracker_gpu_id)
    if key == 'll-lib-file':
        tracker_ll_lib_file = config.get('tracker', key)
        tracker.set_property('ll-lib-file', tracker_ll_lib_file)
    if key == 'll-config-file':
        tracker_ll_config_file = config.get('tracker', key)
        tracker.set_property('ll-config-file', tracker_ll_config_file)
    if key == 'enable-batch-process':
        tracker_enable_batch_process = config.getint('tracker', key)
        tracker.set_property('enable_batch_process',
                             tracker_enable_batch_process)
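Two details of the tracker setup are easy to miss. `ConfigParser.read()` silently skips a file it cannot open and returns an empty list, so a wrong working directory only surfaces later as a `KeyError` on `config['tracker']`; and a misspelled key is silently ignored by the `if` chain. A table-driven sketch of the same mapping that fails loudly in both cases; `TRACKER_KEYS`, `tracker_properties` and `load_tracker_config` are hypothetical names, and the property names are copied from the loop above:

```python
import configparser

# Config key -> (nvtracker property name, value type), copied from the loop above
TRACKER_KEYS = {
    "tracker-width": ("tracker-width", int),
    "tracker-height": ("tracker-height", int),
    "gpu-id": ("gpu_id", int),
    "ll-lib-file": ("ll-lib-file", str),
    "ll-config-file": ("ll-config-file", str),
    "enable-batch-process": ("enable_batch_process", int),
}


def tracker_properties(config, section="tracker"):
    """Convert a parsed [tracker] section into {property_name: typed_value}."""
    props = {}
    for key, raw in config[section].items():
        if key not in TRACKER_KEYS:
            raise KeyError(f"Unknown tracker config key: {key}")
        prop, cast = TRACKER_KEYS[key]
        props[prop] = cast(raw)
    return props


def load_tracker_config(path):
    """Read the config file, failing loudly if it could not be opened."""
    config = configparser.ConfigParser()
    if not config.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    return tracker_properties(config)


# Same settings as tracker_config.txt; '#' lines are skipped by configparser
SAMPLE = """
[tracker]
tracker-width=640
tracker-height=384
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
#ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_nvdcf.so
#ll-config-file=tracker_config.yml
enable-batch-process=1
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
props = tracker_properties(parser)
# In the real script: for name, value in props.items(): tracker.set_property(name, value)
print(props)
```

Raising on an unknown key is a deliberate deviation from the original loop, which ignores such keys.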

pgie_batch_size = pgie.get_property("batch-size")
if(pgie_batch_size != number_sources):
    print("WARNING: Overriding infer-config batch-size",
          pgie_batch_size, " with number of sources ", number_sources, " \n")
    pgie.set_property("batch-size", number_sources)

tiler_rows = int(math.sqrt(number_sources))
tiler_columns = int(math.ceil((1.0*number_sources)/tiler_rows))
tiler.set_property("rows", tiler_rows)
tiler.set_property("columns", tiler_columns)
tiler.set_property("width", TILED_OUTPUT_WIDTH)
tiler.set_property("height", TILED_OUTPUT_HEIGHT)
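The rows/columns formula above picks a near-square grid that always has at least as many cells as sources. Wrapping it in a small function (`tiler_grid` is a hypothetical name) makes the resulting layouts easy to check:

```python
import math


def tiler_grid(number_sources):
    """Rows and columns for nvmultistreamtiler, using the same formula as above."""
    if number_sources < 1:
        raise ValueError("need at least one source")
    rows = int(math.sqrt(number_sources))
    columns = int(math.ceil(number_sources / rows))
    return rows, columns


for n in (1, 2, 3, 4, 5, 8):
    print(n, tiler_grid(n))
```

Because `columns` is rounded up, `rows * columns` is never smaller than the number of sources; for example, 5 sources give a 2 x 3 grid with one empty tile.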

# sink.set_property("qos", 0)
sink.set_property("sync", 0)

if not is_aarch64():
    # Use CUDA unified memory in the pipeline so frames
    # can be easily accessed on CPU in Python.
    mem_type = int(pyds.NVBUF_MEM_CUDA_UNIFIED)
    streammux.set_property("nvbuf-memory-type", mem_type)
    nvvidconv.set_property("nvbuf-memory-type", mem_type)
    nvvidconv1.set_property("nvbuf-memory-type", mem_type)
    tiler.set_property("nvbuf-memory-type", mem_type)

9. Add the elements to the pipeline

pipeline.add(pgie)
pipeline.add(tracker)
pipeline.add(tiler)
pipeline.add(nvvidconv)
pipeline.add(filter1)
pipeline.add(nvvidconv1)
pipeline.add(nvosd)
if is_aarch64():
    pipeline.add(transform)
pipeline.add(sink)

10. Link the elements in the pipeline

streammux.link(queue1)
queue1.link(pgie)
pgie.link(queue2)
queue2.link(tracker)
tracker.link(queue3)
queue3.link(nvvidconv1)
nvvidconv1.link(queue4)
queue4.link(filter1)
filter1.link(queue5)
queue5.link(tiler)
tiler.link(queue6)
queue6.link(nvvidconv)
nvvidconv.link(queue7)
queue7.link(nvosd)
if is_aarch64():
    nvosd.link(queue8)
    queue8.link(transform)
    transform.link(sink)
else:
    nvosd.link(queue8)
    queue8.link(sink)
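Every `link()` call above ignores its return value. `Gst.Element.link()` returns False when two elements cannot be linked, so a failed link goes unnoticed until the pipeline misbehaves. This will not explain a failure that appears after a day of running, but it rules out a silently broken link at startup. A minimal sketch of a checked alternative; `link_chain` is a hypothetical helper, and `StubElement` only stands in for `Gst.Element` so the snippet runs without GStreamer:

```python
def link_chain(elements):
    """Link elements pairwise in order and raise on the first failed link.

    Accepts anything with .link(other) -> bool and .get_name(), as Gst.Element has.
    """
    for upstream, downstream in zip(elements, elements[1:]):
        if not upstream.link(downstream):
            raise RuntimeError(
                f"Unable to link {upstream.get_name()} -> {downstream.get_name()}"
            )


class StubElement:
    """Stand-in for Gst.Element, for illustration only."""

    def __init__(self, name, linkable=True):
        self._name = name
        self._linkable = linkable
        self.linked_to = None

    def get_name(self):
        return self._name

    def link(self, other):
        if not self._linkable:
            return False
        self.linked_to = other
        return True


# In the real script the list would be the pipeline elements, e.g.
# [streammux, queue1, pgie, ..., nvosd, queue8] + ([transform, sink] if is_aarch64() else [sink])
a, b, c = StubElement("a"), StubElement("b"), StubElement("c")
link_chain([a, b, c])
print(a.linked_to.get_name(), b.linked_to.get_name())  # b c
```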

11. Create an event loop and feed GStreamer bus messages to it

loop = GObject.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message", bus_call, loop)

12. Add a probe to be informed of the generated metadata. We add the probe to the sink pad of the tiler element, since by that time the buffer has received all the metadata.

tiler_sink_pad = tiler.get_static_pad("sink")
if not tiler_sink_pad:
    sys.stderr.write(" Unable to get sink pad \n")
else:
    tiler_sink_pad.add_probe(
        Gst.PadProbeType.BUFFER, tiler_sink_pad_buffer_probe, 0)
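Because the app freezes rather than exits, one way to notice the stall is to record, from inside the probe, when the last buffer arrived and check that timestamp periodically. A minimal sketch; `BufferWatchdog` is a hypothetical helper, not a DeepStream API. In the real app, `tick()` would be called at the top of `tiler_sink_pad_buffer_probe`, and `stalled()` could be polled from the main loop, for example with `GLib.timeout_add_seconds`; what to do on a stall (log it, quit the loop) is left to the application:

```python
import threading
import time


class BufferWatchdog:
    """Records when the last buffer passed the probe (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._lock = threading.Lock()  # the probe runs on a streaming thread
        self._last_seen = clock()

    def tick(self):
        """Call once per buffer, e.g. at the top of the pad probe."""
        with self._lock:
            self._last_seen = self._clock()

    def stalled(self):
        """True if no buffer has arrived for more than timeout_s seconds."""
        with self._lock:
            return self._clock() - self._last_seen > self.timeout_s


# Demo with a fake clock so the example runs instantly
now = [0.0]
watchdog = BufferWatchdog(timeout_s=10, clock=lambda: now[0])
now[0] = 5.0
watchdog.tick()
now[0] = 14.0
print(watchdog.stalled())  # False: last buffer was 9 s ago
now[0] = 16.0
print(watchdog.stalled())  # True: last buffer was 11 s ago
```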

tracker_config.txt

[tracker]
tracker-width=640
tracker-height=384
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
#ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_nvdcf.so
#ll-config-file=tracker_config.yml
enable-batch-process=1

Can the original test3 script work with the same scenario?

@Fiona.Chen

I haven’t run the original test3 script long enough to know whether it does.
My script/app runs fine initially. It stopped 1-2 days after I started the app (this has happened twice).
Not sure if this information helps.

Can you try the original test3 script for the same scenario first? It is better to reproduce the problem with our sample code.

@Fiona.Chen

If I understood correctly, you want me to run the same code on the same hardware setup to see whether we get the same issue?

My current setup is deployed at the client’s location, so if I deployed the test3 script as is, that wouldn’t be an apples-to-apples comparison, would it?

Or are you asking me to just run the test3 script for days to see if anything changes?

Please let me know if I misunderstood your points.

Thanks

We need you to run test3 with the same scenario which will cause the failure.

@Fiona.Chen
Thank you for the quick response, and I apologize for asking again.

SAME SCENARIO -

  • I am having a hard time understanding what you mean by "same scenario". If the codebase changes, the setup is different, correct?
  • Are you referring to the same hardware setup?
  • Are you referring to the same software setup as well?
  • My app streams from RTSP and checks for an object; if it finds one, it sends it to AWS.

If you could please answer those in-line, I would really appreciate it.

To make things simple, we need you to reproduce the same failure with test3. You can choose how to do that as you like. And you need to tell us how to reproduce the failure with test3.


You need to gather enough information about how and where the problem occurs. We cannot debug for you. You need to figure out for yourself whether the failure is hardware-related or software-related.

@Fiona.Chen

Hmm… I do understand what you mean, but please know that I am not just asking you to debug for me. Please allow me to explain my intention.

My reasoning was that it isn’t hardware-related, because I have two devices deployed with the same settings and code, and both have this issue. So it is very unlikely that the hardware (Jetson Xavier NX Dev Kit) is causing it, would you agree?

I also didn’t start by taking my app apart, because it is deployed at a location where I can’t easily do that. I decided to start here because the error clearly states

[NvTiler::Composite] ERROR: 349; NvBufSurfTransformComposite failed(-2)

and around this error, other errors appear:

[2021-03-10T18:47:40.363-08:00][ERROR]-SYNC_IOC_FENCE_INFO ioctl failed with 9
[2021-03-10T18:47:40.363-08:00][ERROR]-SYNC_IOC_FENCE_INFO ioctl failed with 9

So I thought it was related to the NVIDIA components, since NvTiler showed up in the log. I searched for NvBufSurfTransformComposite (NVIDIA DeepStream SDK API Reference: Image Transformation and Compositing API), but that didn’t reveal much, since it looked like the call would fail from the very beginning if something were used incorrectly.

That’s when I turned to the forum. I posted the pipeline to check whether I had set something up incorrectly. Based on your response, should I assume the pipeline is set up correctly? I was hoping to learn what could cause an NvBufSurfTransformComposite failure. Would you be able to help with that?

Hopefully this clarifies that my intention wasn’t just to get the answer from you without debugging myself. I do understand and appreciate your pointer.

If you can reproduce the failure with the test3 sample, we can debug with test3 and tell you why NvBufSurfTransformComposite failed. NvBufSurfTransformComposite is inside nvmultistreamtiler, which is not open source.
If you think something is wrong with your code, you need to provide the complete code and configurations, not just pieces.


Understood!
As you suggested, let me think about:

  1. how I can replicate it using test3 sample
  2. if I can’t do that, share the code and configuration for better support