DS6.0.1 Segfault libnvds_opticalflow_dgpu: Setting GPU_ID = 0

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): dGPU (x86, 3x RTX 2080 Ti)
• DeepStream Version: 6.0.1-samples docker image
• JetPack Version (valid for Jetson only): N/A
• TensorRT Version: as shipped in the 6.0.1-samples docker image
• NVIDIA GPU Driver Version (valid for GPU only): 470.82.01 (see nvidia-smi output below)

willem@host:~$ docker run -it --gpus all nvidia/cuda:11.0-base nvidia-smi
Fri Mar 25 11:43:53 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
| 35%   30C    P0    66W / 250W |      0MiB / 11016MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 37%   25C    P0    61W / 250W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 33%   26C    P0    39W / 250W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

• Issue Type (questions, new requirements, bugs): bug

I have my DeepStream application running wonderfully on my Jetson Xavier NX with the latest JetPack. I can also run it inside the L4T DeepStream containers available on NGC. I am now trying to get the same application running on x86 with 3 discrete GPUs (RTX 2080 Ti) inside the 6.0.1-samples container for x86, also from NGC.

The container boots up just fine and the sample applications seem to run perfectly. But when I start my own application (which uses the optical flow gst-nvof plugin), it segfaults when the GStreamer pipeline starts…

This is the exact console output:

Ninox.py:376: Warning: value "0" of type 'guint' is invalid or out of range for property 'batch-size' of type 'guint'
  pgie.set_property("batch-size",0)
Device Number: 0
Device name: NVIDIA GeForce RTX 2080 Ti
Device Version 7.5
  Device Supports Optical Flow Functionality

0:00:00.326923973    83      0x26828c0 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:02.043054864    83      0x26828c0 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/ninox/deepstream_models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x368x640       
1   OUTPUT kFLOAT conv2d_bbox     16x23x40        
2   OUTPUT kFLOAT conv2d_cov/Sigmoid 4x23x40         

0:00:02.043164331    83      0x26828c0 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /ninox/deepstream_models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
0:00:02.044318762    83      0x26828c0 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:deepstream_config/ds_pgie_config.txt sucessfully
gst_ds_optical_flow_set_caps: Creating OpticalFlow Context for Source = 0
libnvds_opticalflow_dgpu: Setting GPU_ID = 0
Segmentation fault (core dumped)

What is going on? As said, it works fine on Jetson…

Can I give this a little bump? Does anyone know where to go looking for the cause of this problem?

  • Code runs fine on Jetson
  • Samples run fine on x86
  • Code segfaults on x86

Can you run gdb with your application and share the gdb log from the crash?
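For reference, one way to do this with a Python-based application is to launch the interpreter under gdb (the script name and arguments here are copied from the gdb session that follows; py-bt additionally needs the Python gdb helpers, e.g. from the python3-dbg package):

gdb --args python3 Ninox.py -s /config/DS3.1_Test1.json -n
(gdb) run
... wait for the SIGSEGV ...
(gdb) bt 25
(gdb) py-bt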

Apparently, it’s queue4 that’s segfaulting. Thanks for the clue. Not sure what to do next, but I’ll try to troubleshoot from here.

GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
Starting program: /usr/bin/python3 Ninox.py -s /config/DS3.1_Test1.json -n
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ff1270b3700 (LWP 496)]
[New Thread 0x7ff1268b2700 (LWP 497)]
[New Thread 0x7ff1240b1700 (LWP 498)]
[New Thread 0x7ff11f8b0700 (LWP 499)]
[New Thread 0x7ff11d0af700 (LWP 500)]
[New Thread 0x7ff11c8ae700 (LWP 501)]
[New Thread 0x7ff1180ad700 (LWP 502)]
[New Thread 0x7ff1178ac700 (LWP 503)]
[New Thread 0x7ff1130ab700 (LWP 504)]
[New Thread 0x7ff1108aa700 (LWP 505)]
[New Thread 0x7ff10e0a9700 (LWP 506)]
[New Thread 0x7ff10b8a8700 (LWP 507)]
[New Thread 0x7ff10b0a7700 (LWP 508)]
[New Thread 0x7ff1068a6700 (LWP 509)]
[New Thread 0x7ff1040a5700 (LWP 510)]
[New Thread 0x7ff1018a4700 (LWP 511)]
[New Thread 0x7ff1010a3700 (LWP 512)]
[New Thread 0x7ff0fe8a2700 (LWP 513)]
[New Thread 0x7ff0fa0a1700 (LWP 514)]
[New Thread 0x7ff0f98a0700 (LWP 515)]
[New Thread 0x7ff0f509f700 (LWP 516)]
[New Thread 0x7ff0f489e700 (LWP 517)]
[New Thread 0x7ff0f009d700 (LWP 518)]
[New Thread 0x7ff0ed89c700 (LWP 519)]
[New Thread 0x7ff0eb09b700 (LWP 520)]
[New Thread 0x7ff0e889a700 (LWP 521)]
[New Thread 0x7ff0e6099700 (LWP 522)]
[New Thread 0x7ff0e3898700 (LWP 523)]
[New Thread 0x7ff0e3097700 (LWP 524)]
[New Thread 0x7ff0e0896700 (LWP 525)]
[New Thread 0x7ff0e0095700 (LWP 526)]
[New Thread 0x7ff0d9894700 (LWP 527)]
[Thread 0x7ff0e0095700 (LWP 526) exited]
[Thread 0x7ff0e0896700 (LWP 525) exited]
[Thread 0x7ff0e3097700 (LWP 524) exited]
[Thread 0x7ff0e3898700 (LWP 523) exited]
[Thread 0x7ff0e6099700 (LWP 522) exited]
[Thread 0x7ff0e889a700 (LWP 521) exited]
[Thread 0x7ff0eb09b700 (LWP 520) exited]
[Thread 0x7ff0ed89c700 (LWP 519) exited]
[Thread 0x7ff0f009d700 (LWP 518) exited]
[Thread 0x7ff0f489e700 (LWP 517) exited]
[Thread 0x7ff0f509f700 (LWP 516) exited]
[Thread 0x7ff0f98a0700 (LWP 515) exited]
[Thread 0x7ff0fa0a1700 (LWP 514) exited]
[Thread 0x7ff0fe8a2700 (LWP 513) exited]
[Thread 0x7ff1010a3700 (LWP 512) exited]
[Thread 0x7ff1018a4700 (LWP 511) exited]
[Thread 0x7ff1040a5700 (LWP 510) exited]
[Thread 0x7ff1068a6700 (LWP 509) exited]
[Thread 0x7ff10b0a7700 (LWP 508) exited]
[Thread 0x7ff10b8a8700 (LWP 507) exited]
[Thread 0x7ff10e0a9700 (LWP 506) exited]
[Thread 0x7ff1108aa700 (LWP 505) exited]
[Thread 0x7ff1130ab700 (LWP 504) exited]
[Thread 0x7ff1178ac700 (LWP 503) exited]
[Thread 0x7ff1180ad700 (LWP 502) exited]
[Thread 0x7ff11c8ae700 (LWP 501) exited]
[Thread 0x7ff11d0af700 (LWP 500) exited]
[Thread 0x7ff11f8b0700 (LWP 499) exited]
[Thread 0x7ff1240b1700 (LWP 498) exited]
[Thread 0x7ff1268b2700 (LWP 497) exited]
[Thread 0x7ff1270b3700 (LWP 496) exited]

(gst-plugin-scanner:528): GStreamer-WARNING **: 15:02:05.888: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_inferserver.so': libtritonserver.so: cannot open shared object file: No such file or directory

(gst-plugin-scanner:528): GStreamer-WARNING **: 15:02:07.809: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_udp.so': librivermax.so.0: cannot open shared object file: No such file or directory
[New Thread 0x7ff0e0095700 (LWP 531)]
[New Thread 0x7ff0e0896700 (LWP 532)]
[New Thread 0x7ff0e3097700 (LWP 535)]
[New Thread 0x7ff0e3898700 (LWP 542)]
[New Thread 0x7ff10b268700 (LWP 543)]
[New Thread 0x7ff1040a5700 (LWP 544)]
[New Thread 0x7ff1018a4700 (LWP 545)]
[New Thread 0x7ff1010a3700 (LWP 546)]
Device Number: 0
Device name: NVIDIA GeForce RTX 2080 Ti
Device Version 7.5
  Device Supports Optical Flow Functionality

0:00:05.075127699   492      0x2cf86c0 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:06.739718177   492      0x2cf86c0 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/ninox/deepstream_models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x368x640       
1   OUTPUT kFLOAT conv2d_bbox     16x23x40        
2   OUTPUT kFLOAT conv2d_cov/Sigmoid 4x23x40         

0:00:06.739857025   492      0x2cf86c0 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /ninox/deepstream_models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
[New Thread 0x7ff0fa0a1700 (LWP 547)]
[New Thread 0x7ff0f98a0700 (LWP 548)]
[New Thread 0x7ff0f509f700 (LWP 549)]
0:00:06.741964642   492      0x2cf86c0 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:deepstream_config/ds_pgie_config.txt sucessfully
[New Thread 0x7ff0f489e700 (LWP 550)]
[New Thread 0x7ff0f009d700 (LWP 551)]
[New Thread 0x7ff0eb09b700 (LWP 552)]
[New Thread 0x7ff0e889a700 (LWP 553)]
[New Thread 0x7ff074c5e700 (LWP 554)]
gst_ds_optical_flow_set_caps: Creating OpticalFlow Context for Source = 0
libnvds_opticalflow_dgpu: Setting GPU_ID = 0
[New Thread 0x7ff05940b700 (LWP 555)]
[New Thread 0x7ff058c0a700 (LWP 556)]
[New Thread 0x7ff021fff700 (LWP 557)]
[New Thread 0x7ff0217fe700 (LWP 558)]
[New Thread 0x7ff020ffd700 (LWP 559)]
[New Thread 0x7ff019fff700 (LWP 560)]

Thread 55 "queue4:src" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff019fff700 (LWP 560)]
__memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:431
431	../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) 


Apparently, it wasn’t the GStreamer queue segfaulting. I’ve done some more research and narrowed it down to this exact call:

(gdb) py-bt
Traceback (most recent call first):
  <built-in method __deepcopy__ of numpy.ndarray object at remote 0x7f93d06c2850>
  File "/usr/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "Ninox.py", line 187, in buffer_probe
    detector_input_queue.put(DetectorInputCapsule(copy.deepcopy(current_frame), frame_number, casted_meta_list), block=False)

I do this call in my buffer probe function to process the frame and metadata in another process. This works fine on the Jetson platform, but it segfaults on x86. I believe this must have to do with the fact that the implementation of the pyds.get_nvds_buf_surface() method is different for x86 and Jetson: the returned numpy array is mapped to CPU memory on Jetson but to CUDA unified memory on x86. I do need to pass a copy of the frame to the other process via the multiprocessing queue, though. So the question is, how?

# Imports used by this excerpt; detector_input_queue (a multiprocessing.Queue)
# and DetectorInputCapsule are defined elsewhere in the module.
import copy
import logging
from queue import Empty, Full

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds


def buffer_probe(pad, info, u_data):
    frame_number = 0
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        logging.error("Unable to get GstBuffer.")
        return

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
           
    current_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), 0)

    frame_meta_list = batch_meta.frame_meta_list

    # Note that frame_meta_list.data needs a cast to pyds.NvDsFrameMeta
    # The casting is done by pyds.NvDsFrameMeta.cast()
    # The casting also keeps ownership of the underlying memory
    # in the C code, so the Python garbage collector will leave
    # it alone.
    frame_meta = pyds.NvDsFrameMeta.cast(frame_meta_list.data)
    
    frame_number = frame_meta.frame_num
    obj_meta_list = frame_meta.obj_meta_list
    user_meta_list = frame_meta.frame_user_meta_list

    casted_meta_list = []
    while obj_meta_list is not None:
        try: 
            # Casting obj_meta_list.data to pyds.NvDsObjectMeta
            obj_meta=pyds.NvDsObjectMeta.cast(obj_meta_list.data)
        except StopIteration:
            break
        

        # We have to get rid of all the pyds.NvDs* objects in the metadata because we can't pickle those, so we just get what we need for
        # the detectors to do their job down the line...
        x_left = int(obj_meta.rect_params.left)
        x_right = int(obj_meta.rect_params.left + obj_meta.rect_params.width)
        y_top = int(obj_meta.rect_params.top)
        y_bottom =  int(obj_meta.rect_params.top + obj_meta.rect_params.height)
        coordinates = (x_left, y_top, x_right, y_bottom)

        obj_meta_dict = {
            "type": "object",
            "class_id": obj_meta.class_id,
            "obj_label": obj_meta.obj_label,
            "confidence": obj_meta.confidence,
            "coordinates": coordinates,
            "x_left": x_left,
            "x_right": x_right,
            "y_top": y_top,
            "y_bottom": y_bottom
        }
        casted_meta_list.append(obj_meta_dict)

        try: 
            obj_meta_list = obj_meta_list.next
        except StopIteration:
            break
    
    while user_meta_list is not None:
        try:
            of_user_meta = pyds.NvDsUserMeta.cast(user_meta_list.data)
        except StopIteration:
            break
        try:
            # Casting of_user_meta.user_meta_data to pyds.NvDsOpticalFlowMeta
            of_meta = pyds.NvDsOpticalFlowMeta.cast(of_user_meta.user_meta_data)
            # Get Flow vectors
            flow_vectors = pyds.get_optical_flow_vectors(of_meta)
            # Reshape the obtained flow vectors into proper shape
            flow_vectors = flow_vectors.reshape(of_meta.rows, of_meta.cols, 2)

            user_meta_dict = {
                "type": "opticalflow",
                "flow_vectors": flow_vectors
            }
            casted_meta_list.append(user_meta_dict)

        except StopIteration:
            break
        try:
            user_meta_list = user_meta_list.next
        except StopIteration:
            break

    try:
        detector_input_queue.put(DetectorInputCapsule(copy.deepcopy(current_frame), frame_number, casted_meta_list), block=False)
    except Full:
        try:
            while True: # a Multiprocessing Queue does not have a clear() method or something similar, so we have to clear it the dirty way
                detector_input_queue.get_nowait()
        except Empty:
            pass    

    return Gst.PadProbeReturn.OK

Is there something I can do to avoid this segfault and force the framebuffer into CPU memory?

FYI: detector_input_queue is a multiprocessing.Queue.
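For illustration, the copy I am after boils down to something like this (just a sketch; np.array(..., copy=True) is an alternative spelling of the deepcopy and still assumes the mapped surface is actually readable from the CPU):

import numpy as np

# current_frame is the numpy view returned by
# pyds.get_nvds_buf_surface(hash(gst_buffer), 0); it references the mapped
# NvBufSurface rather than owning its own buffer. Take an explicit host-side
# copy so the object pushed into the multiprocessing queue no longer points
# at the surface memory.
frame_copy = np.array(current_frame, copy=True, order="C")
detector_input_queue.put(
    DetectorInputCapsule(frame_copy, frame_number, casted_meta_list),
    block=False,
)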

(gdb) bt 25
#0  0x00007f940089e384 in __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:431
#1  0x00007f9382a9ca9d in  () at /usr/lib/python3/dist-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#2  0x00007f9382a9d3ea in  () at /usr/lib/python3/dist-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#3  0x00007f9382aa721f in  () at /usr/lib/python3/dist-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#4  0x00007f9382b2b566 in  () at /usr/lib/python3/dist-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#5  0x000000000050abff in _PyCFunction_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>, func_obj=<built-in method __deepcopy__ of numpy.ndarray object at remote 0x7f93d06c2850>) at ../Objects/methodobject.c:234
#6  0x000000000050abff in _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized out>, stack=<optimized out>, func=<optimized out>) at ../Objects/methodobject.c:294
#7  0x000000000050abff in call_function.lto_priv (pp_stack=0x7f93addfced0, oparg=<optimized out>, kwnames=<optimized out>) at ../Python/ceval.c:4851
#8  0x000000000050c924 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3335
#9  0x0000000000508675 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7f92cc01d648, for file /usr/lib/python3.6/copy.py, line 161, in deepcopy (x=<numpy.ndarray at remote 0x7f93d06c2850>, memo={}, _nil=[], d=140272833669200, y=[...], cls=<type at remote 0x7f9382dccba0>, copier=<built-in method __deepcopy__ of numpy.ndarray object at remote 0x7f93d06c2850>, issc=False))
    at ../Python/ceval.c:754
#10 0x0000000000508675 in _PyEval_EvalCodeWithName.lto_priv.1836 (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7f92cc01cfe8, kwcount=<optimized out>, kwstep=1, defs=0x7f93ff926e20, defcount=2, kwdefs=0x0, closure=0x0, name='deepcopy', qualname='deepcopy') at ../Python/ceval.c:4166
#11 0x000000000050a3e0 in fast_function.lto_priv (func=<function at remote 0x7f93ff81a158>, stack=0x7f92cc01cfe0, nargs=1, kwnames=<optimized out>) at ../Python/ceval.c:4992
#12 0x000000000050adcd in call_function.lto_priv (pp_stack=0x7f93addfd1f0, oparg=<optimized out>, kwnames=<optimized out>) at ../Python/ceval.c:4872
#13 0x000000000050c924 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3335
#14 0x0000000000508675 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7f92cc01cd98, for file Ninox.py, line 187, in buffer_probe (pad=<Pad at remote 0x7f93d06b0bd0>, info=<PadProbeInfo at remote 0x7f93d06c24a8>, u_data=0, frame_number=0, gst_buffer=<Buffer at remote 0x7f937cac6e28>, batch_meta=<pyds.NvDsBatchMeta at remote 0x7f93d06c4490>, current_frame=<numpy.ndarray at remote 0x7f93d06c2850>, frame_meta_list=<pyds.GList at remote 0x7f93d06c6f48>, frame_meta=<pyds.NvDsFrameMeta at remote 0x7f93d06c63b0>, obj_meta_list=None, user_meta_list=None, casted_meta_list=[])) at ../Python/ceval.c:754
#15 0x0000000000508675 in _PyEval_EvalCodeWithName.lto_priv.1836 (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at ../Python/ceval.c:4166
#16 0x000000000058990b in PyEval_EvalCodeEx (closure=<optimized out>, kwdefs=<optimized out>, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=<optimized out>, args=0x7f937ca88c30, locals=0x0, globals=<optimized out>, _co=<optimized out>) at ../Python/ceval.c:4187
#17 0x000000000058990b in function_call.lto_priv (func=<function at remote 0x7f937e345b70>, arg=(<Pad at remote 0x7f93d06b0bd0>, <PadProbeInfo at remote 0x7f93d06c24a8>, 0), kw=0x0)
    at ../Objects/funcobject.c:604
#18 0x00000000005a07ce in PyObject_Call (func=<function at remote 0x7f937e345b70>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/abstract.c:2261
#19 0x00007f937e733c4b in  () at /usr/lib/python3/dist-packages/gi/_gi.cpython-36m-x86_64-linux-gnu.so
#20 0x00007f93edbd6b4f in ffi_closure_unix64_inner () at /usr/lib/x86_64-linux-gnu/libffi.so.6
#21 0x00007f93edbd6f16 in ffi_closure_unix64 () at /usr/lib/x86_64-linux-gnu/libffi.so.6
#22 0x00007f937d612288 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#23 0x00007f93f7b9e8e4 in g_hook_list_marshal () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#24 0x00007f937d610825 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#25 0x00007f937d6144c3 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0

Any insights would be very helpful. We’ve developed this application on the Jetson platform, but we need to deploy on x86 with dGPUs, and we’ve been hitting brick walls so far…

Please refer to gst_element_send_nvevent_new_stream_reset — DeepStream Version 6.1.1 documentation.

Thanks! For reasons I won’t go into further, I also need a complete copy of the video frame. So, looking at the documentation, it seems to me I’m going to need some combination of pyds.NvBufSurfaceMap(), pyds.NvBufSurfaceCopy() and pyds.NvBufSurfaceSyncForCpu(). Right?

Edit: ah, I see syncing is only for the Jetson platform. So that leaves me to figure out how to handle surface mapping and copying, I guess?

By the way, I’ve been looking far and wide for this exact documentation last week. All links I found led me to the Metropolis DeepStream main page, and even the links on that page led back to the main page. I couldn’t find anything. Was it me, or was there some kind of error/malfunction going on?

I found it! Apparently I had to set the nvbuf memory type of some elements to CUDA unified memory (which it now is). I found it by sheer luck in one of the examples. Weird how stuff like this isn’t properly documented; it seems quite basic.

mem_type = int(pyds.NVBUF_MEM_CUDA_UNIFIED)
streammux.set_property("nvbuf-memory-type", mem_type)
nvvidconv.set_property("nvbuf-memory-type", mem_type)

OK, I was too fast in my conclusions. I did get everything running, but sooner or later the application quits with another segfault in the GStreamer pipeline thread. It happens even sooner when I start multiple Docker containers with our application (which is the plan if everything works out), but it also happens when I run just one container with one instance of the application. This is the backtrace in GDB:

#0  0x00007fece9144565 in  () at /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007fece8ffe5e9 in  () at /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fece8ea6cb1 in  () at /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007fece8f402d5 in  () at /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007fecee35605c in  () at /usr/local/cuda-11.4/lib64/libcudart.so.11.0
#5  0x00007fecee3a9f06 in cudaLaunchKernel () at /usr/local/cuda-11.4/lib64/libcudart.so.11.0
#6  0x00007fecea7319eb in _Z16cudaLaunchKernelIcE9cudaErrorPKT_4dim3S4_PPvmP11CUstream_st () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#7  0x00007fecea6c342e in __device_stub__Z20NV12_ER_to_RGB_cutexyyyPvS_S_iiiiiiiiiii(unsigned long long, unsigned long long, unsigned long long, void*, void*, void*, int, int, int, int, int, int, int, int, int, int, int) () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#8  0x00007fecea6c34c2 in NV12_ER_to_RGB_cutex(unsigned long long, unsigned long long, unsigned long long, void*, void*, void*, int, int, int, int, int, int, int, int, int, int, int) () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#9  0x00007fecea69135b in handleRAWInput () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#10 0x00007fecea7428dc in NvBufSurfTransform_GPU_CuTex () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#11 0x00007fecea744546 in NvBufSurfTransform_GPU () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#12 0x00007fecea73f4ca in NvBufSurfTransformAsync () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#13 0x00007fecea73fab4 in NvBufSurfTransform () at ///opt/nvidia/deepstream/deepstream-6.0/lib/libnvbufsurftransform.so
#14 0x00007fee587a03a8 in convert_batch_and_push_to_input_thread(_GstNvInfer*, GstNvInferBatch*, GstNvInferMemory*) ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#15 0x00007fee587a100e in gst_nvinfer_process_full_frame(_GstNvInfer*, _GstBuffer*, NvBufSurface*) ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#16 0x00007fee587a3e80 in gst_nvinfer_submit_input_buffer(_GstBaseTransform*, int, _GstBuffer*) ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#17 0x00007fed598fcb61 in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#18 0x00007fed595f689b in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#19 0x00007fed595febc3 in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#20 0x00007fed598fccaf in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#21 0x00007fed595f689b in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#22 0x00007fed595febc3 in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#23 0x00007fee72b17593 in gst_nvstreammux_src_push_loop () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so
#24 0x00007fed5962b279 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#25 0x00007fed5a022c70 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#26 0x00007fed5a0222a5 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#27 0x00007fee76b626db in start_thread (arg=0x7fed7c522700) at pthread_create.c:463
#28 0x00007fee76e9b61f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I get essentially the same backtrace every single time. I can’t make a lot of sense out of it. I’m not much of a C developer, nor do I have a lot of experience with GDB. If anyone has a thought about what to do next, I’d be very thankful.

FYI: the application is running on a beefy GPU server with 2 octa-core CPUs and 96 GB of RAM. It has 3 RTX 2080 Ti cards with 11 GB of VRAM each, running Ubuntu 20.04 LTS with the NVIDIA container runtime and the 470 driver. The application we are developing has to do video processing/analysis on live (RTSP) streams, and the server has to be able to handle multiple streams. To do so, we run the application in a DeepStream (6.0.1) Docker container.

When I run one instance of this container, I get about 10% GPU utilization and about 8% memory usage on one RTX device. So theoretically there should be more than enough power to have at least about 15 instances running spread over these 3 devices, with ample overhead. But seeing all these memory issues and segfaults, I’m beginning to doubt whether it’s even technically possible AT ALL to run multiple instances of a GPU-intensive application on one device. What do I need to do to get this running smoothly and reliably?

Because the application does run, which is promising; sometimes for 5 minutes, sometimes for an hour, but eventually it segfaults. It seems to me like some sort of race condition or sync problem between host and device memory. But will I be able to influence this from Python with the limited tools I have?

There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

It seems the crash is in the preprocessing step of nvinfer. Can you share your pipeline? Can you disable some plugins in your pipeline to narrow down the issue?
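One low-effort way to capture and share the pipeline topology from a Python/GStreamer application is GStreamer’s built-in DOT graph dump (sketch; the stand-in pipeline and the dump name are placeholders for the real application pipeline):

import os
# GStreamer reads this directory for graph dumps; set it before Gst.init().
os.environ["GST_DEBUG_DUMP_DOT_DIR"] = "/tmp"

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Stand-in pipeline just to show the call; use the real application pipeline object.
pipeline = Gst.parse_launch("videotestsrc num-buffers=10 ! fakesink")
pipeline.set_state(Gst.State.PLAYING)

# Writes /tmp/ninox-pipeline.dot; render with graphviz:
#   dot -Tpng /tmp/ninox-pipeline.dot -o pipeline.png
Gst.debug_bin_to_dot_file(pipeline, Gst.DebugGraphDetails.ALL, "ninox-pipeline")

pipeline.set_state(Gst.State.NULL)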