Hardware decoding and multi-GPU through Docker

Hi all,

I was able to pinpoint an issue with our DeepStream pipeline: in a multi-GPU environment, some of the pipelines stall at startup.
The setup reserves a single GPU per container and runs several containers, one per GPU / DeepStream instance.

I was able to reproduce the problem without any of our code, using this docker-compose file:

services:
  cam0:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["GPU-64955b11-02d1-f07f-93a1-660a632e1a56"]
              capabilities: [compute, utility, video, graphics]
    entrypoint:
      - gst-launch-1.0
      - uridecodebin
      - uri=rtsp://alfred:alfred1326@10.22.56.200:554/cam/realmonitor?channel=1&subtype=0
      - name=srcbin
      - '!'
      - fakesink

  cam1:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["GPU-40253b63-70d6-fd7f-e0a0-a060dc869950"]
              capabilities: [compute, utility, video, graphics]
    entrypoint:
      - gst-launch-1.0
      - uridecodebin
      - uri=rtsp://alfred:alfred1326@10.22.56.200:554/cam/realmonitor?channel=1&subtype=0
      - name=srcbin
      - '!'
      - fakesink

Change the device IDs and the RTSP URLs, and the problem should appear.
nvidia-smi dmon confirms that only a single GPU uses its hardware decoder.
Using a different sink confirms that only one of the two pipelines is actually running.
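For reference, this is the kind of dmon invocation I mean for watching decoder activity per GPU (a sketch; the exact columns shown depend on the driver version):

```shell
# Show per-GPU utilization, including the hardware decoder ("dec" column).
# -s u selects the utilization metric group; with both pipelines healthy,
# every reserved GPU should show dec > 0 while streams are flowing.
nvidia-smi dmon -s u
```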

Doing the same thing in a single Docker container, however, does not cause any problem:
if I run that image and exec the following processes simultaneously inside it (note the quotes around the URI, since it contains a shell-special &):
CUDA_VISIBLE_DEVICES=0 gst-launch-1.0 uridecodebin uri='rtsp://alfred:alfred1326@10.22.56.200:554/cam/realmonitor?channel=1&subtype=0' ! fakesink
CUDA_VISIBLE_DEVICES=1 gst-launch-1.0 uridecodebin uri='rtsp://alfred:alfred1326@10.22.56.200:554/cam/realmonitor?channel=1&subtype=0' ! fakesink
everything runs correctly
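One subtlety when pasting such commands into a shell: an unquoted & in the RTSP URL makes the shell background the command at that point and silently drop everything after it, so the subtype=0 query parameter never reaches gst-launch. A quick way to check what the application actually receives (the URL here is a placeholder):

```shell
# Unquoted, the shell would split this at '&' and background the first half.
# Single quotes preserve the full URI, including 'subtype=0'.
url='rtsp://user:pass@camera.example:554/cam/realmonitor?channel=1&subtype=0'
printf '%s\n' "$url"
```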

For now we plan to just switch to a single container, but we find it slightly unsatisfying to have to do that, when Docker expressly encourages a single process per container…
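For the single-container workaround, a compose sketch along these lines (the GPU UUIDs and the wrapper script path are placeholders) reserves both GPUs in one service and leaves the per-GPU fan-out to a script:

```yaml
services:
  cams:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # Both GPU UUIDs go in one reservation (placeholders here).
              device_ids: ["GPU-<uuid-0>", "GPU-<uuid-1>"]
              capabilities: [compute, utility, video, graphics]
    # Hypothetical wrapper: starts one gst-launch-1.0 process per GPU with
    # CUDA_VISIBLE_DEVICES=0 / 1, as in the manual in-container test above.
    entrypoint: ["/bin/sh", "-c", "/opt/run-pipelines.sh"]
```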

Can you confirm the buggy behavior? Any chance it could be fixed, assuming the problem is somewhere in your IP?

1. Do you mean that after running docker-compose, the command hung in one container while it kept running in the other? Does the issue always happen on the same GPU?
2. About "nvidia-smi dmon will confirm that only a single GPU uses its hardware decoder": what are the GPU models? Inside the container, can you use 'top' to check whether the gst-launch command is running? The decoder utilization may read 0 simply because the decoder workload is light.
3. To narrow down this issue, could you provide some running logs? You can use "gst-launch-1.0 --gst-debug-level=6" to get more verbose logs.
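For point 3, one common way to capture those verbose logs to a file rather than the console (the URI is a placeholder; GST_DEBUG, GST_DEBUG_FILE and GST_DEBUG_NO_COLOR are standard GStreamer environment variables):

```shell
# GST_DEBUG=6 is equivalent to --gst-debug-level=6.
# GST_DEBUG_NO_COLOR=1 keeps the log file free of ANSI escape codes.
GST_DEBUG=6 GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=/tmp/cam0.log \
  gst-launch-1.0 uridecodebin uri='rtsp://<user>:<pass>@<camera-ip>:554/<path>' ! fakesink
```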