Extra memory usage on GPU0 when testing deepstream-segmentation-app on GPU1

• Hardware Platform (GPU) RTX 3080 ×2
• DeepStream Version 6.0.1
• TensorRT Version 8.0.1.6
• NVIDIA GPU Driver Version 470.82.01

• Issue Type (bug)
We found extra memory usage on GPU0 (the default device) when running a segmentation model in a multi-GPU application.

• How to reproduce the issue?
/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-segmentation-test
We set all of the gpu-id properties to 1 (a rough sketch of the change is below).
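For reference, here is a hypothetical sketch of what "set all the gpu-id properties to 1" can look like in the sample app. It assumes the stock element variable names (streammux, decoder, seg, nvsegvisual) from deepstream_segmentation_app.c and is not the poster's exact diff; the nvinfer gpu-id can also be set in the [property] section of dstest_segmentation_config_industrial.txt.

```c
/* Hypothetical fragment (not the poster's exact change): inside main() of
 * deepstream_segmentation_app.c, point every element that exposes a
 * gpu-id property at GPU 1. Variable names follow the stock sample app. */
#define TARGET_GPU_ID 1

g_object_set (G_OBJECT (streammux),   "gpu-id", TARGET_GPU_ID, NULL);
g_object_set (G_OBJECT (decoder),     "gpu-id", TARGET_GPU_ID, NULL);   /* nvv4l2decoder */
g_object_set (G_OBJECT (seg),         "gpu-id", TARGET_GPU_ID, NULL);   /* nvinfer */
g_object_set (G_OBJECT (nvsegvisual), "gpu-id", TARGET_GPU_ID, NULL);
```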


./deepstream-segmentation-app dstest_segmentation_config_industrial.txt /opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_industrial.jpg


I am checking.

  1. After checking, deepstream_segmentation_app1.c can reproduce this issue while deepstream_segmentation_app2.c can’t; the difference is how nvinfer is used.
    deepstream_segmentation_app1.c (13.9 KB)
    deepstream_segmentation_app2.c (13.9 KB)

  2. Checked the result with gst-launch-1.0 commands:
    This command can’t reproduce the issue:
    gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.mjpeg ! jpegparse ! nvv4l2decoder gpu-id=1 ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 gpu-id=1 ! fakesink
    This command can reproduce it:
    gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.mjpeg ! jpegparse ! nvv4l2decoder gpu-id=1 ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 gpu-id=1 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt gpu-id=1 ! fakesink

Will continue to check.

It is a reproducible bug. The nvinfer plugin is open source; please use this workaround:

  1. Modify /opt/nvidia/deepstream/deepstream-6.1/sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp like this:
    static gpointer gst_nvinfer_input_queue_loop (gpointer data)
    {
      GstNvInfer *nvinfer = (GstNvInfer *) data;
      cudaSetDevice (nvinfer->gpu_id);   /* added: pin this thread to the configured GPU */
      /* ... rest of the function body unchanged ... */
    }
  2. Compile, then copy libnvdsgst_infer.so to /opt/nvidia/deepstream/deepstream/lib/gst-plugins; back up the old libnvdsgst_infer.so first. An optional check to confirm the rebuilt plugin is in use is sketched after these steps.
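Optionally, to confirm that the rebuilt plugin is really the one being loaded and that the input-queue thread is now pinned to the intended GPU, a small debug addition (my own hypothetical check, not part of the official patch) can be placed right after the cudaSetDevice call and removed once verified:

```c
/* Hypothetical debug lines, inserted just after the cudaSetDevice() call
 * above; prints which CUDA device this thread is actually using.
 * Not part of the official gstnvinfer.cpp source - remove after verifying. */
int current_dev = -1;
cudaError_t err = cudaGetDevice (&current_dev);
g_print ("gst_nvinfer_input_queue_loop: configured gpu-id=%u, current CUDA device=%d (%s)\n",
         nvinfer->gpu_id, current_dev, cudaGetErrorString (err));
```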

@wlzkobe Can you let us know if the above workaround works for your case? Thanks.

No, it still doesn’t work.

We’ve tested this workaround, but still the same result.

One correction:
2. Compile, then copy libnvdsgst_infer.so to /opt/nvidia/deepstream/deepstream/lib/gst-plugins; back up the old libnvdsgst_infer.so first.

We know the correct path, since it’s a GStreamer plugin lib.

The GPU 0 memory usage appears after the program has been running for a while, not right at the start.

  1. Using the workaround, is there any improvement?
  2. Using the workaround, I can’t reproduce this issue with ./deepstream-segmentation-app -t infer dstest_segmentation_config_industrial.txt /opt/nvidia/deepstream/deepstream/samples/streams/sample_industrial.jpg. Could you provide simplified code to reproduce it? Thanks!

  1. It seems there is no improvement.
  2. We also used this demo and the same command.

But our SDK version is 6.0.1, since the latest 6.1 requires Ubuntu 20.04,

and this is the source code we modified:
deepstream_segmentation_app.c (13.8 KB)



My code deepstream_segmentation_app1.c is similar to yours.
You can test in the DS 6.1 Docker container; here is the link: Docker Containers — DeepStream 6.1.1 Release documentation

This problem is confirmed to be a bug in the nvinfer plugin.
It can be reproduced simply with a gst-launch-1.0 command like this:

gst-launch-1.0 \
rtspsrc location=rtsp://RTSP_RESOURCE latency=200 drop-on-latency=1 ! rtph264depay ! \
nvv4l2decoder gpu-id=1 ! m.sink_0 \
nvstreammux gpu-id=1 name=m batch-size=1 width=1280 height=720 batched-push-timeout=40000 ! \
nvinfer gpu-id=1 config-file-path=INFER_CONFIG_FILE ! fakesink

This is annoying!

Yes, can you try the fix in comment 5?


I just tested the fix, and it works on a 2080 Ti / DeepStream 6.1-dev / NVIDIA driver 510.

Before the fix:
|    0   N/A  N/A     25172      C   gst-launch-1.0                    159MiB |
|    1   N/A  N/A     25172      C   gst-launch-1.0                    817MiB |

After the fix:
|    1   N/A  N/A     23992      C   gst-launch-1.0                    825MiB |

Would you give a brief explanation? I’ve read through most of the gstnvinfer and nvinfer code, but I can’t figure out how your one-line fix works.

Thanks for your update. Please refer to the cudaSetDevice explanation in the CUDA Runtime API :: CUDA Toolkit Documentation: the current device is per-thread state, so it needs to be set as the current device inside the gst_nvinfer_input_queue_loop thread; otherwise that thread keeps using device 0 by default.
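To make the per-thread behaviour concrete, here is a minimal standalone sketch (hypothetical demo code, not from DeepStream or the plugin): a worker thread that never calls cudaSetDevice allocates on GPU 0 even though the main thread selected GPU 1, which is why the unpatched input-queue thread shows up as extra memory usage on GPU 0. The build line assumes a typical CUDA install path.

```c
/* Minimal illustration (not DeepStream code): the current CUDA device is
 * per-thread state. A worker thread that never calls cudaSetDevice()
 * allocates on device 0, even if the main thread selected device 1.
 * Assumed build: gcc device_demo.c -I/usr/local/cuda/include \
 *                -L/usr/local/cuda/lib64 -lcudart -lpthread */
#include <stdio.h>
#include <pthread.h>
#include <cuda_runtime.h>

static void *worker (void *arg)
{
  int pin_device = *(int *) arg;   /* <0 means "do not pin", mimicking the unpatched loop */
  if (pin_device >= 0)
    cudaSetDevice (pin_device);    /* the one-line fix: pin this thread to the target GPU */

  void *buf = NULL;
  cudaMalloc (&buf, 64 << 20);     /* 64 MiB; counts against this thread's current device */

  int dev = -1;
  cudaGetDevice (&dev);
  printf ("worker (pin=%d): allocation went to device %d\n", pin_device, dev);

  cudaFree (buf);
  return NULL;
}

int main (void)
{
  cudaSetDevice (1);               /* main thread selects GPU 1 ... */

  pthread_t t;
  int no_pin = -1, pin_gpu1 = 1;

  pthread_create (&t, NULL, worker, &no_pin);    /* ... but this thread still uses GPU 0 */
  pthread_join (t, NULL);

  pthread_create (&t, NULL, worker, &pin_gpu1);  /* with cudaSetDevice(1) it uses GPU 1 */
  pthread_join (t, NULL);
  return 0;
}
```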
