Thanks for the great hardware and tutorials.
I followed the “OpenPi π₀.₅ on Jetson Thor” tutorial on Jetson AI Lab, and I have two questions.
Note: My Jetson Thor will arrive in a few days. My current hardware is a DGX Spark, so I followed the tutorial on the DGX Spark for now.
Q1) Docker parameter
In Step 5: Launch the Docker Container, the Docker command uses the --runtime nvidia parameter.
However, my hardware does not seem to have that runtime available, so I used --gpus all instead.
Inference works well, and I do not see any obvious issues.
However, I am not sure what the nvidia runtime is used for. Is there any problem or performance degradation when using --gpus all instead of --runtime nvidia?
Q2) Missing serve_policy.py file for launching the inference server
In Step 14: (Optional) Launch Inference Server, the tutorial uses the following command to launch the inference server:
python openpi_on_thor/serve_policy.py \
...
However, I could not find serve_policy.py in the openpi_on_thor folder.
It would be very helpful if you could provide some guidance on how to build my own inference server Python file, possibly by referencing or forking the missing serve_policy.py.
Do you have this file?
cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
If not, run this to check whether the packages that are on Thor are installed, or available to install, on your DGX Spark:
apt search nvidia-container
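If you would rather check this from a script, here is a minimal Python sketch that parses the text of a daemon.json and reports whether the nvidia runtime is registered and whether it is the default. The function name describe_nvidia_runtime and the EXAMPLE string are my own, for illustration only; the keys are the ones shown in the file above.

```python
import json


def describe_nvidia_runtime(daemon_json_text):
    """Summarize how a Docker daemon.json registers the nvidia runtime."""
    config = json.loads(daemon_json_text)
    runtimes = config.get("runtimes", {})
    return {
        # True when an entry named "nvidia" exists under "runtimes".
        "registered": "nvidia" in runtimes,
        # True when containers use it even without --runtime nvidia.
        "is_default": config.get("default-runtime") == "nvidia",
        # Executable that Docker invokes for this runtime, if any.
        "path": runtimes.get("nvidia", {}).get("path"),
    }


# Illustrative input that mirrors the file contents quoted above.
EXAMPLE = """
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {"args": [], "path": "nvidia-container-runtime"}
    }
}
"""

print(describe_nvidia_runtime(EXAMPLE))
```

On a real machine you would pass the contents of /etc/docker/daemon.json instead of EXAMPLE. If that file does not exist, no runtime is registered through it.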
This file exists in the openpi repository:
openpi/scripts/serve_policy.py
You will have it if you run this:
git clone --recurse-submodules https://github.com/Physical-Intelligence/openpi.git
cd openpi
git checkout 175f89c3
Hi,
Do you mean you are facing this issue on DGX Spark?
If so, we can move your topic to the corresponding board for better support.
Thanks.
Q1) My DGX Spark doesn’t have an /etc/docker/daemon.json file.
In addition, I don’t know why, but nvidia-container1 (with the number 1) is on the DGX Spark.
I will check again when I get the Jetson Thor this week.
Q2) I will copy openpi/scripts/serve_policy.py and check whether the script runs properly in Docker!!
Thanks!
Yes, I’m trying the “Jetson AI Lab” tutorials on my DGX Spark, so Q1 may be caused by differences between DGX Spark and Thor.
However, Q2) does not seem to be related to whether the environment is DGX or Jetson.
The “openpi_on_thor” folder is downloaded using the following command, which is provided in the Jetson AI Lab tutorial:
wget -qO- https://www.jetson-ai-lab.com/code-samples/openpi_on_thor/download.sh | bash
However, there is no serve_policy.py file in that folder, even though I followed the tutorial properly.
In addition, the tutorial lists the contents of the downloaded folder as follows:
ls openpi_on_thor/
# thor.Dockerfile pyproject.toml pi05_inference.py pytorch_to_onnx.py
# build_engine.sh trt_model_forward.py trt_torch.py calibration_data.py
# patches/apply_gemma_fixes.py
I guess the tutorial is missing a step that copies the serve_policy.py file into the …/openpi_on_thor/ folder.
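For anyone who wants to verify their own download, here is a rough Python sketch that compares a folder against the file names quoted above. The list is copied from the tutorial listing, plus serve_policy.py from Step 14; the function name missing_files is my own and is not part of the tutorial.

```python
from pathlib import Path

# File names from the tutorial listing, plus serve_policy.py,
# which Step 14 expects to find in the same folder.
EXPECTED_FILES = [
    "thor.Dockerfile",
    "pyproject.toml",
    "pi05_inference.py",
    "pytorch_to_onnx.py",
    "build_engine.sh",
    "trt_model_forward.py",
    "trt_torch.py",
    "calibration_data.py",
    "patches/apply_gemma_fixes.py",
    "serve_policy.py",
]


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files in folder."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Prints the names that are absent; an empty list means nothing is missing.
    print(missing_files("openpi_on_thor"))
```

In my case this would report only serve_policy.py as missing.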
I tried using serve_policy.py by copying it into the openpi_on_thor folder and rebuilding the Docker image.
However, I got the following error. It seems that the serve_policy.py file used in the tutorial was modified to use TensorRT.
Why --runtime=nvidia on Jetson
Unlike a discrete-GPU host (where --gpus all alone is enough, because
the libnvidia-container shim auto-discovers /dev/nvidia* and the
matching driver libs), Jetson’s iGPU stack is bound to host kernel
drivers and is exposed to containers through a CSV-driven
mount mechanism owned by nvidia-container-runtime:
/etc/nvidia-container-runtime/host-files-for-container.d/
├── devices.csv # /dev/nvgpu, /dev/nvhost-*, /dev/nvmap, …
└── drivers.csv # /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.*, …
Passing --runtime=nvidia is what activates that runtime, which in
turn parses the two CSV files at container start and bind-mounts
every listed device node and driver library from the Tegra host
into the container. Without the flag the standard runc starts the
container without those mounts; the result is no /dev/nvgpu, no
libcuda.so, and torch.cuda.is_available() returns False even
though nvidia-smi works on the host.
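The symptoms described above can be checked from inside a running container with a short Python sketch. The device paths are the ones named in the explanation (the full set may differ between releases, so treat the list as an assumption), libcuda.so.1 is the usual soname of the CUDA driver library, and the helper names are my own.

```python
import ctypes
import os

# Paths named in the explanation above; an assumption, not a complete list.
TEGRA_DEVICE_PATHS = ["/dev/nvgpu", "/dev/nvmap"]


def can_load(library):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library)
        return True
    except OSError:
        return False


def check_gpu_mounts(device_paths=TEGRA_DEVICE_PATHS, library="libcuda.so.1"):
    """Report which device paths exist and whether the driver library loads."""
    report = {path: os.path.exists(path) for path in device_paths}
    report[library] = can_load(library)
    return report


def torch_cuda_available():
    """Return torch.cuda.is_available(), or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, ok in check_gpu_mounts().items():
        print(f"{name}: {'found' if ok else 'missing'}")
    print("torch.cuda.is_available():", torch_cuda_available())
```

If the device paths and the library are all missing, the container was most likely started without the runtime doing its mounts.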
For Q2, you need to set up the TensorRT (TRT) engine.
Hi,
Sorry, the file is on the server but has not yet been added to download.sh.
Please download it manually with the command below:
$ wget -q https://www.jetson-ai-lab.com/code-samples/openpi_on_thor/serve_policy.py
Thanks.
Thank you so much for your help !!
I will try it!!