Description
How can I deploy the jetson-inference project on a Jetson Nano using Kubernetes?
Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered
Hi,
We are moving this post to the Jetson Nano forum to get better help.
Thank you.
Hi,
This looks like a Jetson issue. Please refer to the samples below in case they are useful.
For any further assistance, we will move this post to the Jetson-related forum.
Thanks!
Hello Aakanksha, yes, I have seen these tutorials and I have managed to get jetson-inference running on the Jetson Nano. But what I need is to deploy that same project using Kubernetes.
@celzambranom I haven’t used Kubernetes before and hence am unfamiliar with what’s entailed, but given the jetson-inference Docker container, what do you need to deploy it? Is there something unique about doing it with jetson-inference or special steps required? Do you have an existing Kubernetes install running on your Jetson and have deployed other containers?
The Jetson Nano in this case is a worker node that is part of a Kubernetes cluster. I am trying to deploy the jetson-inference container onto the Jetson Nano through a Kubernetes Deployment.
@celzambranom have you done that yet on your Nano with a generic container, or is there something particular to the jetson-inference container you are having an issue with? There are various resources/tutorials on google about setting Kubernetes up on Jetson Nano if that’s what you’re looking for.
Yes, I have already tested the jetson-inference project on the Jetson Nano in isolation and it works perfectly. The problem appears when trying to deploy the project from a Kubernetes cluster. Perhaps the issue is with access to the GPU.
Hi @celzambranom, what’s the specific issue or error that you are encountering? Have you been able to deploy other GPU-enabled containers via Kubernetes (like l4t-base, l4t-pytorch, etc.)?
The jetson-inference container has some additional directories and devices mounted by its docker/run.sh script, so I would inspect that script and the actual docker run command that it launches to make sure that your configuration matches.
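For example, from a local checkout you could do something like this to see the exact docker run invocation that the script assembles (a sketch; adjust the path to wherever you cloned the repo):
# Show the lines of run.sh that build the docker run command,
# so you can mirror the same volumes and devices in your Kubernetes spec
$ cd ~/jetson-inference
$ grep -n -A5 "docker run" docker/run.sh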
From what I understand, I would have to replicate all those mounts from the docker/run.sh script in the Kubernetes deployments.yaml template in order to create the pods with the jetson-inference project running. Is that correct?
@celzambranom yes, that would be ideal - if you don’t mount jetson-inference/data, then every time you exit/restart the container, it will need to re-download the models and re-build their TensorRT engines. The other mounts you can skip if you aren’t using them.
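For reference, a minimal Deployment sketch along those lines could look something like the one below. The image tag, host path, node name, and the use of a privileged pod are assumptions for illustration - adjust them to your L4T release, checkout location, and cluster policy - and it assumes the nvidia runtime is the node's default Docker runtime:
$ kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jetson-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jetson-inference
  template:
    metadata:
      labels:
        app: jetson-inference
    spec:
      nodeSelector:
        kubernetes.io/hostname: jetson-nano    # assumed node name - pin the pod to the Nano worker
      containers:
      - name: jetson-inference
        image: dustynv/jetson-inference:r32.7.1   # assumed tag - match your L4T/JetPack version
        command: ["sleep", "infinity"]            # keep the pod alive so you can kubectl exec into it
        securityContext:
          privileged: true                        # simplest way to expose the GPU device nodes; not a hardened setup
        volumeMounts:
        - name: data
          mountPath: /jetson-inference/data       # same data mount that docker/run.sh sets up
      volumes:
      - name: data
        hostPath:
          path: /home/nvidia/jetson-inference/data   # assumed location of your checkout on the node
          type: Directory
EOF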
Hi dusty_nv, I have finally managed to deploy the jetson-inference container on the Jetson Nano using Kubernetes. But now it’s giving me this error:
./imagenet: error while loading shared libraries: /usr/lib/aarch64-linux-gnu/libnvinfer.so.8: file too short
Thanks for your help.
Hi @celzambranom - is Kubernetes launching the Docker container with --runtime nvidia? On JetPack 4, the CUDA/cuDNN/TensorRT libraries get mounted into the containers from the host device by the NVIDIA Container Runtime.
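Since Kubernetes doesn't pass --runtime nvidia per container the way docker/run.sh does, a common approach on Jetson (assuming Docker is the container runtime backing your cluster) is to make nvidia the default runtime in /etc/docker/daemon.json, for example:
# Set nvidia as the default Docker runtime so every container -
# including ones launched by Kubernetes - gets the JetPack libraries mounted
$ sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
EOF
$ sudo systemctl restart docker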
Hello dusty_nv, the error occurs once I execute the following line (./detectnet images/peds_0.jpg images/test/peds_0.jpg) inside the container started directly on the Jetson Nano, without using Kubernetes. I also have the runtime configured in the daemon.json file.
@celzambranom I don’t think the error is related to jetson-inference; I think the CUDA/cuDNN/TensorRT libraries aren’t being mounted into the container correctly by the NVIDIA Runtime. Are you able to run any CUDA or TensorRT stuff? Try running /usr/src/tensorrt/bin/trtexec -h in the container. And try the l4t-base container too.
Outside of the container, do you have /usr/lib/aarch64-linux-gnu/libnvinfer.so.8 installed? Do you have CSV files under /etc/nvidia-container-runtime/host-files-for-container.d/?
ls /etc/nvidia-container-runtime/host-files-for-container.d/
cuda.csv cudnn.csv l4t.csv tensorrt.csv visionworks.csv
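For a quick sanity check of the runtime itself, independent of jetson-inference, you can also try something like the following (the r32.7.1 tag is an assumption; pick the one matching your L4T release):
# If the NVIDIA runtime is mounting the JetPack libraries correctly,
# trtexec should print its usage text inside the stock l4t-base container
$ sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.7.1 \
      /usr/src/tensorrt/bin/trtexec -h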
Hello, inside the container the directory /usr/src/tensorrt/ is empty, so I can’t execute /usr/src/tensorrt/bin/trtexec. However, outside of the container I do have /usr/lib/aarch64-linux-gnu/libnvinfer.so.8 installed, and I have the CSV files under /etc/nvidia-container-runtime/host-files-for-container.d/.
Greetings
OK gotcha - what’s the command you are using to start the container when you are testing it directly with docker run? It includes the --runtime nvidia flag, right? If so, I’m unsure what about your environment or Docker setup is leading to these libraries not being mounted in the container, and whether perhaps the installation of Kubernetes overwrote some configs or upgraded some packages. In that case, you might want to try a fresh SD card of JetPack to get the NVIDIA Container Runtime again (or re-install it).
I run the docker/run.sh script, so it already includes --runtime nvidia.
How can I reinstall the NVIDIA Container Runtime?
You can try reinstalling these packages:
$ apt-cache search nvidia-container
libnvidia-container-tools - NVIDIA container runtime library (command-line tools)
libnvidia-container0 - NVIDIA container runtime library
libnvidia-container1 - NVIDIA container runtime library
nvidia-container-csv-cuda - Jetpack CUDA CSV file
nvidia-container-csv-cudnn - Jetpack CUDNN CSV file
nvidia-container-csv-tensorrt - Jetpack TensorRT CSV file
nvidia-container-csv-visionworks - Jetpack VisionWorks CSV file
nvidia-container-runtime - NVIDIA container runtime
nvidia-container-toolkit - NVIDIA container runtime hook
nvidia-container - NVIDIA Container Meta Package
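For example (a sketch; this assumes the standard JetPack apt sources are still configured on the device):
# Reinstall the container runtime packages and restart Docker
# (package names taken from the apt-cache listing above)
$ sudo apt-get update
$ sudo apt-get install --reinstall nvidia-container-toolkit nvidia-container-runtime \
      nvidia-container-csv-cuda nvidia-container-csv-cudnn nvidia-container-csv-tensorrt
$ sudo systemctl restart docker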
If it doesn’t work, I would recommend flashing a fresh SD card with the JetPack image and confirming that the container runtime is working for you from the outset. You can also check it with the l4t-base container.
I’ll do that. Thank you very much, and I’ll let you know.