Deadlock when using DeepStream in Docker on Jetson AGX Orin

DeepStream + Docker image deepstream:7.1-triton-multiarch: the application deadlocks.

We developed a video-stream inference / object-detection (OD) application based on:

and run it on a Jetson inside the deepstream:7.1-triton-multiarch Docker container.

Sometimes it deadlocks, or the FPS output suddenly stops.

Steps to reproduce the bug:

  1. Add 4 cameras, each with 4 prompts and 1 ROI.
  2. Rerun the application until the bug occurs.
Test data:
cURL.txt

Expected result:

  1. The application should reconnect when a camera's FPS drops to 0.
  2. The application should not stop logging camera FPS. It should log each camera's FPS continuously while that camera is running.
  3. The application should work normally after handling a camera FPS drop to 0: it should respond to user API calls, publish alerts to RabbitMQ and write output to the CSV file.
  4. No deadlock should occur.
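The symptom in expected result 2 (the FPS log stops) can be detected from outside the process, which helps when rerunning the application until the bug occurs. A minimal sketch, assuming the application writes its FPS lines to a log file; the function name `fps_log_stalled` and the file name `app_fps.log` are hypothetical, and it relies on GNU `stat`:

```shell
#!/usr/bin/env bash
# Hypothetical stall check: returns 0 if the FPS log has not been modified
# for more than MAX_AGE seconds, 1 if it is still being written, and 2 if
# the file cannot be read. Uses GNU stat (-c %Y = mtime in epoch seconds).
fps_log_stalled() {
    local log_file="$1" max_age="${2:-10}"
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$log_file" 2>/dev/null) || return 2
    [ $(( now - mtime )) -gt "$max_age" ]
}

# Example (hypothetical file name): poll every 5 s and report when output stops.
#   while ! fps_log_stalled app_fps.log 10; do sleep 5; done
#   echo "FPS output stalled" >&2
```

Checking the file's modification time rather than its contents keeps the sketch independent of the application's log format.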

Evidence: please check the Drive folder, because the attached documents could not be uploaded here:

OS: Linux/Other Unix

Product Name: Jetson AGX Orin

Driver Version: 540.4.0

What JetPack version are you running?

head -n 1 /etc/nv_tegra_release
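The first line of /etc/nv_tegra_release encodes the L4T (Jetson Linux) release. A small sketch that extracts it as major.revision, so it can be compared with the JetPack/L4T release listed in the DeepStream 7.1 release notes. It assumes the usual `# R36 (release), REVISION: 4.0, ...` layout, and `l4t_version` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the L4T version (e.g. 36.4.0) from the first line
# of nv_tegra_release. Assumes the "# R<major> (release), REVISION: <x.y>, ..."
# layout; prints nothing if the line does not match.
l4t_version() {
    local file="${1:-/etc/nv_tegra_release}"
    head -n 1 "$file" |
        sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p'
}
```

On the Jetson host, calling `l4t_version` with no argument reads /etc/nv_tegra_release directly.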

Is nvidia-container-toolkit installed?

sudo apt search nvidia-container-toolkit |grep install
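`apt search` output is meant for humans (apt itself warns that its CLI is not stable for scripts). A more script-friendly sketch asks dpkg for the package status directly; `pkg_installed` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if dpkg reports the package as fully installed.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Example:
#   pkg_installed nvidia-container-toolkit && echo "nvidia-container-toolkit is installed"
```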

cat /etc/docker/daemon.json
# Does yours have these contents? The formatting is lost here, so here's the file.
daemon.json.txt (150 Bytes)

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
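A quick check for the setting that matters most here, sketched as a plain text match rather than a full JSON parse (`has_nvidia_default_runtime` is a hypothetical helper). After editing daemon.json, the Docker daemon has to be restarted (e.g. `sudo systemctl restart docker`), and `docker info` then reports the active "Default Runtime":

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the given daemon.json sets "nvidia" as the
# default runtime. Plain regex match on the text, not a JSON parser.
has_nvidia_default_runtime() {
    local cfg="${1:-/etc/docker/daemon.json}"
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$cfg"
}

# Example:
#   has_nvidia_default_runtime || echo "nvidia is not the default runtime" >&2
```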

What is your command line to start the docker container? You could try this or variations of it.

# --shm-size can also be raised to 3g or 4g
docker run --rm -it \
--gpus all \
--runtime nvidia \
--ipc=host \
--shm-size=2g \
--ulimit memlock=-1 --ulimit stack=67108864 \
--env NVIDIA_VISIBLE_DEVICES=all \
--env NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,graphics \
--env TRITONSERVER_LOG_VERBOSE=1 \
nvcr.io/nvidia/deepstream:7.1-triton-multiarch


The following is from dusty-nv/jetson-containers/docs/setup.md:
"
If you’re building containers or working with large models, it’s advisable to mount SWAP (typically correlated with the amount of memory in the board). Run these commands to disable ZRAM and create a swap file:

sudo systemctl disable nvzramconfig
sudo fallocate -l 16G /mnt/16GB.swap
sudo mkswap /mnt/16GB.swap
sudo swapon /mnt/16GB.swap

Then add the following line to the end of /etc/fstab to make the change persistent:

/mnt/16GB.swap  none  swap  sw 0  0

If you have NVME storage available, it’s preferred to allocate the swap file on NVME.

Disabling the Desktop GUI

If you’re running low on memory, you may want to try disabling the Ubuntu desktop GUI. This will free up extra memory that the window manager and desktop use (~800MB for Unity/GNOME or ~250MB for LXDE).

You can disable the desktop temporarily, run commands in the console, and then re-start the desktop when desired:

$ sudo init 3     # stop the desktop
# log your user back into the console (Ctrl+Alt+F1, F2, etc.)
$ sudo init 5     # restart the desktop

If you wish to make this persistent across reboots, you can use the following commands to change the boot-up behavior:

sudo systemctl set-default multi-user.target     # disable desktop on boot
sudo systemctl set-default graphical.target      # enable desktop on boot

"
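The fstab step from the quoted setup guide can be made safe to re-run, so repeated setup attempts do not add duplicate swap lines. A minimal sketch; `add_swap_entry` is a hypothetical helper, and appending to /etc/fstab needs root (e.g. run it from a `sudo bash` shell):

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "<swapfile>  none  swap  sw 0  0" to an fstab
# file only if no line already has that swap file as its first field.
add_swap_entry() {
    local fstab="${1:-/etc/fstab}" swapfile="${2:-/mnt/16GB.swap}"
    # Already present? Then do nothing.
    if awk -v f="$swapfile" '$1 == f { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    # If the file does not end with a newline, add one before appending.
    if [ -s "$fstab" ] && [ -n "$(tail -c 1 "$fstab")" ]; then
        printf '\n' >> "$fstab"
    fi
    printf '%s  none  swap  sw 0  0\n' "$swapfile" >> "$fstab"
}
```

Afterwards, `swapon --show` and `free -h` show whether the swap file is active.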


Thank you very much, Whitesscott. We implemented it as you suggested and it ran well, but after one failed setting it got stuck, as shown in the attached log. Please help us. @whitesscott
deepstream_build.log (44.4 KB)

On this NVIDIA NGC web page, under Prerequisites, there's a .sh script they suggest running, and it shows two additional docker run options, shown below:

Run the container:

  1. Allow external applications to connect to the host’s X display:
     xhost +
  2. Run the docker container (use the desired container tag in the command line below).
     If using docker (recommended):
     docker run -it --rm --network=host --runtime nvidia -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-7.1 -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/deepstream:7.1-triton-multiarch
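Both sets of options (the earlier docker run suggestion and the NGC page's) can be combined into one invocation. The sketch below is a hypothetical wrapper: `run_deepstream` and the `DRY_RUN` switch are not from the NGC page or the earlier reply, it assumes `DISPLAY=:0` when DISPLAY is unset, and `xhost +` still has to be run on the host first:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper combining the earlier docker run options with the
# NGC page's extras (--network=host, DISPLAY, X11 socket mount, -w).
# DISPLAY falls back to :0 if unset (assumption).
# --shm-size can be raised to 3g or 4g.
# Set DRY_RUN=1 to print the command instead of running it.
run_deepstream() {
    local image="${1:-nvcr.io/nvidia/deepstream:7.1-triton-multiarch}"
    local -a args=(
        run --rm -it
        --gpus all
        --runtime nvidia
        --network=host
        --ipc=host
        --shm-size=2g
        --ulimit memlock=-1 --ulimit stack=67108864
        --env NVIDIA_VISIBLE_DEVICES=all
        --env NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,graphics
        --env TRITONSERVER_LOG_VERBOSE=1
        -e "DISPLAY=${DISPLAY:-:0}"
        -v /tmp/.X11-unix/:/tmp/.X11-unix
        -w /opt/nvidia/deepstream/deepstream-7.1
        "$image"
    )
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'docker %s\n' "${args[*]}"
    else
        docker "${args[@]}"
    fi
}
```

Building the arguments as a bash array avoids the line-continuation pitfalls of a long backslash-joined command, where a trailing comment or a stray space after `\` silently ends the command.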
WARNING: [TRT]: Tactic Device request: 160MB Available: 9MB. Device memory is insufficient to use tactic.
ERROR: [TRT]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node /vision_model/embeddings/patch_embedding/Conv.)

From your log, the device does not have enough memory. Could you try running it directly on the host, without Docker?
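On Jetson the integrated GPU shares system DRAM with the CPU, so the "Available: 9MB" reported by TensorRT reflects overall memory pressure on the board. A rough sketch for checking headroom before the engine build, assuming MemAvailable from /proc/meminfo is a reasonable proxy (`mem_available_mib` is a hypothetical helper); `tegrastats` can be used to watch usage live while the pipeline runs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MemAvailable in MiB from a meminfo-format file
# (defaults to /proc/meminfo). On Jetson this memory is shared by CPU and GPU.
mem_available_mib() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}

# Example: check headroom right before starting the TensorRT engine build.
#   echo "MemAvailable: $(mem_available_mib) MiB"
```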

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Thank you very much.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.