Cannot start nvcr.io/nvidia/deepstream:6.1-triton container in Docker Swarm

Hi Nvidia team,

I am trying to deploy the DeepStream Triton Docker image in Docker Swarm mode. However, the container exits right after it starts, and I do not know why.

Here are the steps I used to test whether the image can be deployed with Docker Swarm:

  1. Start a Docker Swarm service that runs nvidia-smi and then sleeps forever: docker service create --replicas 1 --name swarm-gpu-test-ds nvcr.io/nvidia/deepstream:6.1-triton bash -c "nvidia-smi && sleep infinity"

  2. Then, the logs of the running container show:

===============================
DeepStreamSDK 6.1.0

*** LICENSE AGREEMENT ***
By using this software you agree to fully comply with the terms and conditions
of the License Agreement. The License Agreement is located at
/opt/nvidia/deepstream/deepstream/LicenseAgreement.pdf. If you do not agree
to the terms and conditions of the License Agreement do not use the software.

=============================
== Triton Inference Server ==

NVIDIA Release 22.03 (build 33743047)
Triton Server Version 2.20.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

  3. The logs do not show any error. However, Docker Swarm reports "Detected task failure" and starts a replacement container.
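When Swarm reports "Detected task failure" while the container logs look clean, the task list of the service is another place to look: Swarm keeps the failed tasks it has replaced, together with their state and error message. A minimal sketch, where show_task_errors is a hypothetical helper name wrapping docker service ps:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every task of a Swarm service, including failed
# tasks that Swarm has already replaced, with the untruncated error message.
show_task_errors() {
  local service="$1"
  docker service ps --no-trunc \
    --format '{{.Name}} {{.CurrentState}} {{.Error}}' \
    "$service"
}
```

For example, show_task_errors swarm-gpu-test-ds, read alongside docker service logs swarm-gpu-test-ds, may show whether the task exited on its own or was rejected before it ran.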

However, when I use a different image in Docker Swarm, docker service create --replicas 1 --name swarm-gpu-test-ds nvcr.io/nvidia/deepstream:6.1-samples bash -c "nvidia-smi && sleep infinity", Docker Swarm starts the container without any error: nvidia-smi runs, and then the container sleeps.

I verified that the NVIDIA container runtime is installed correctly by running: sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

The GPU is also already enabled for Docker Swarm, which I verified by running: docker service create --replicas 1 --name swarm-gpu-test-default nvidia/cuda:11.6.2-base-ubuntu20.04 bash -c "nvidia-smi && sleep infinity"

Moreover, I can run the Triton Server image in Docker Swarm without any problem: docker service create --replicas 1 --name swarm-gpu-test-ds nvcr.io/nvidia/tritonserver:22.02-py3 bash -c "nvidia-smi && sleep infinity"

Furthermore, I can run nvcr.io/nvidia/deepstream:6.1-triton with docker run or docker-compose without any error.

I also tried nvcr.io/nvidia/deepstream:6.0.1-triton and nvcr.io/nvidia/deepstream:6.2-triton, but neither could be deployed.

What should I do to deploy the image nvcr.io/nvidia/deepstream:6.1-triton?

• Hardware Platform: GPU GTX 1080 Ti, Ubuntu 20.04.5 LTS
• DeepStream Version: 6.0.1, 6.1, 6.2
• NVIDIA GPU Driver Version (valid for GPU only): Driver Version: 530.30.02; CUDA Version: 12.1

Can you make sure the driver version matches the DeepStream compatibility requirements in the Quickstart Guide (DeepStream 6.2 Release documentation)? For example, to run the DeepStream 6.1 Docker image, the host driver version should be 515.65.01.
Driver 530.30.02 is just a beta version.

Thank you for your reply.

I also tried driver version 515.65.01 and rebooted the computer. However, exactly the same problem occurred.

In summary:

  • Commands that work:
  1. docker service create --replicas 1 --name swarm-gpu-test-ds nvcr.io/nvidia/deepstream:6.1-samples bash -c "nvidia-smi && sleep infinity"
  2. sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
  3. docker service create --replicas 1 --name swarm-gpu-test-default nvidia/cuda:11.6.2-base-ubuntu20.04 bash -c "nvidia-smi && sleep infinity"
  4. docker service create --replicas 1 --name swarm-gpu-test-ds nvcr.io/nvidia/tritonserver:22.02-py3 bash -c "nvidia-smi && sleep infinity"
  5. docker-compose running nvcr.io/nvidia/deepstream:6.1-triton, without any error
  • The only command that does not work is the one using the DeepStream Triton image: docker service create --replicas 1 --name swarm-gpu-test-ds nvcr.io/nvidia/deepstream:6.1-triton bash -c "nvidia-smi && sleep infinity"

Do you have any idea what the problem is?

We are investigating the issue and will get back to you when there is progress.

Hi @Fiona.Chen,

I figured out the problem.
I noticed that the container starts without any issue, but it does not accept the command given after the image name.
So I passed the command explicitly through --entrypoint instead, and it worked perfectly.
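Since the same trailing command works with the samples image but not with the Triton image, one way to compare them is to look at the ENTRYPOINT and CMD each image defines. To our understanding, a command given after the image name in docker service create replaces the image's CMD while its ENTRYPOINT still runs, whereas --entrypoint replaces the ENTRYPOINT itself. A minimal sketch, where show_entrypoint is a hypothetical helper name and the image must already be pulled locally:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ENTRYPOINT and CMD recorded in an image's
# config, so images that react differently to the same service command can
# be compared side by side.
show_entrypoint() {
  local image="$1"
  docker image inspect \
    --format 'Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' \
    "$image"
}
```

For example, run show_entrypoint nvcr.io/nvidia/deepstream:6.1-triton and show_entrypoint nvcr.io/nvidia/deepstream:6.1-samples and compare the two lines; we have not checked what these images actually report.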

Therefore, the solution should be:

  • Instead of using:

docker service create --replicas 1 --name swarm-gpu-test-ds nvcr.io/nvidia/deepstream:6.1-triton bash -c "nvidia-smi && sleep infinity"

  • Please use:

docker service create --replicas 1 --name swarm-gpu-test-ds --entrypoint "bash -c 'nvidia-smi && sleep infinity'" nvcr.io/nvidia/deepstream:6.1-triton

With this change, there should be no further errors.
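The nested quotes in the --entrypoint value matter: the Docker CLI is understood to split that single string into separate arguments using shell-like quoting rules. A minimal sketch of that splitting, using the shell's own parser rather than Docker's code (split_words is a hypothetical helper name):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: re-parse a string with shell quoting rules and print
# the resulting argument count and each argument on its own line.
# eval is only acceptable here because the input is a fixed literal,
# never untrusted data.
split_words() {
  eval "set -- $1"
  printf 'argc=%d\n' "$#"
  for arg in "$@"; do
    printf 'arg=[%s]\n' "$arg"
  done
}

split_words "bash -c 'nvidia-smi && sleep infinity'"
```

This prints argc=3 followed by arg=[bash], arg=[-c] and arg=[nvidia-smi && sleep infinity], so bash receives the whole nvidia-smi && sleep infinity string as the single argument to -c.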
Thank you so much for your help.
