I have a system based on the Xavier NX platform (specifically, a Lenovo ThinkEdge SE70 running Ubuntu 20.04 and JetPack 5.0.1) and a Docker image built on top of the nvcr.io/nvidia/deepstream-l4t:6.1.1-base image.
When I install the nvidia-container meta package (per the instructions here) from the https://repo.download.nvidia.com/jetson/common r35.1 and https://repo.download.nvidia.com/jetson/t194 r35.1 repos, running the module with the following deployment manifest in IoT Hub works:
"inference": {
    "version": "0.0.1",
    "imagePullPolicy": "on-create",
    "restartPolicy": "always",
    "settings": {
        "image": "reg.azurecr.io/inference:latest",
        "createOptions": {
            "HostConfig": {
                "runtime": "nvidia",
                "Binds": [
                    "/tmp/argus_socket:/tmp/argus_socket"
                ]
            }
        }
    },
    "status": "running",
    "type": "docker"
},
However, the Azure IoT Edge production checklist explicitly states that: “Only moby-engine is supported in production.”
With that in mind, if on a fresh install I use the recommended moby-engine package instead of the nvidia-container package (and also install nvidia-docker2, since it would appear that this is needed), the container starts but seems to have no access to the GPU.
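One thing that can be checked on the moby-engine install is whether a runtime named nvidia is registered with the Docker daemon at all. Below is a minimal sketch of that check, assuming nvidia-docker2 writes its usual "runtimes" entry to /etc/docker/daemon.json. The helper name nvidia_runtime_registered and the sample config are hypothetical and for illustration only; this does not prove the GPU is reachable from inside the container, only that the runtime is declared.

```python
import json


def nvidia_runtime_registered(daemon_json_text: str) -> bool:
    """Return True if the daemon config text declares a runtime named 'nvidia'."""
    config = json.loads(daemon_json_text)
    # "runtimes" is the daemon.json key under which extra OCI runtimes are listed.
    runtimes = config.get("runtimes", {})
    return "nvidia" in runtimes


# Sample config in the shape nvidia-docker2 typically writes (assumed, not
# copied from the device in question).
sample = """
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
"""

if __name__ == "__main__":
    print(nvidia_runtime_registered(sample))  # runtime declared
    print(nvidia_runtime_registered("{}"))    # empty config, no runtime declared
```

On the device itself, the text passed in would be the contents of /etc/docker/daemon.json (assuming that is where the daemon config lives on this install).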
I can find no instructions for getting a DeepStream 6.1.1-based container running as a container/module via IoT Edge Hub on moby-engine. The sample from Azure shows getting a DeepStream 5.1 app running, and uses the old nvidia-docker method, which mounts a bunch of stuff off the host. I’m trying to avoid that, and it wouldn’t work anyway without me installing DeepStream on the devices.
Other relevant reading:
- This issue in the moby repo provides essentially no help.
- https://docs.nvidia.com/metropolis/deepstream/DeepStream_6.1.1_Release_Notes.pdf states that: “For the Jetson platform, omit installation of the Moby packages. Moby is currently incompatible with NVIDIA Container Runtime.” Is this true? Are we relegated to using an unsupported runtime?
- Neither this nor this recommendation on passing extra commands to the module/container startup works. I get either a container that doesn’t start at all and throws errors in the edgeAgent logs, or the same behaviour as above, where the container starts but seems to have no access to the GPU, so it crashes and restarts.
- This issue talks about much the same thing and recommends e.g.,
  sudo apt-get remove docker docker-engine docker.io containerd runc
  but that doesn’t work.
- This repo seems to be the closest, but adding all these extra lines in deployment.template.json doesn’t work either.
Any help here would be appreciated.
Thanks.