I am launching the NGC container with Docker as instructed in the GitHub repo.
However, when I try to build the plugin libraries by running make -j$(nproc), I run into the error: fatal error: cuda_runtime_api.h: No such file or directory
CUDA seems to work, as the nvidia-smi output is as expected:
Are you able to find cuda_runtime_api.h on bare metal?
I just tried to follow the steps in TensorRT/demo/Diffusion at main · NVIDIA/TensorRT · GitHub, skipping the "(Optional) Install latest TensorRT release" step, and make -j$(nproc) finished without the error.
I find the file on bare metal at: /usr/local/cuda-11.6/targets/sbsa-linux/include/cuda_runtime_api.h on the Clara AGX devkit, and at /usr/local/cuda-11.8/targets/sbsa-linux/include/cuda_runtime_api.h within the launched container.
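For anyone else debugging this, a quick way to locate the header in either environment is a recursive search over the usual CUDA install roots (a stdlib-only sketch; the default root is an assumption, adjust for your CUDA version and target architecture):

```python
from pathlib import Path

def find_header(name, roots=("/usr/local",)):
    """Recursively search the given roots for a file by name."""
    hits = []
    for root in roots:
        base = Path(root)
        if base.is_dir():  # skip roots that do not exist
            hits.extend(str(p) for p in base.rglob(name))
    return hits

# On sbsa-linux devkits the header typically sits under
# /usr/local/cuda-<ver>/targets/sbsa-linux/include/.
print(find_header("cuda_runtime_api.h") or "not found under /usr/local")
```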
No worries—not sure what the issue was earlier, but I followed the steps again and I am able to build successfully. I am able to locate the necessary cuda files on bare metal.
However, when I try to launch the model after building (python3 demo-diffusion.py --help), I get the error: "ModuleNotFoundError: No module named 'cuda'", so it seems a CUDA-related issue persists.
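In case it helps others hitting the same traceback: my reading is that the demo's "cuda" import comes from the cuda-python wheel (the package name is an assumption on my part). A small stdlib-only check of which imports are missing before launching:

```python
import importlib.util

def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# 'cuda' is provided by the cuda-python wheel (assumed); anything
# printed here would need a pip install before the demo can run.
print(missing_modules(["cuda", "tensorrt", "torch"]))
```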
Thanks, I’ll raise an issue there.
Prior to launching the model, I'm getting the following error when trying to install the packages in requirements.txt in the container:
ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu116 (from versions: 1.8.0, 1.8.1, 1.9.0, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1)
ERROR: No matching distribution found for torch==1.12.1+cu116
I've tried installing this requirement directly in the container, as instructed by the PyTorch website, like so:
However, I get the same "No matching distribution found" error. I'll raise an issue on the CUDA forum about this, but thought I'd mention it here too. Thanks!
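For what it's worth, the +cu116 suffix is a PEP 440 "local version identifier", which is why the error only lists bare versions like 1.12.1: plain PyPI does not host wheels with local version tags, so pip only finds the CUDA-tagged builds when PyTorch's own package index is supplied. A minimal sketch of how such a requirement string splits:

```python
# PEP 440: everything after '+' is the local version segment.
requirement = "torch==1.12.1+cu116"
name, _, version = requirement.partition("==")
public, _, local = version.partition("+")
print(name, public, local)  # torch 1.12.1 cu116
```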
Oh I see. The torch installation on Arm + dGPU can be a little tricky; I will ask around and see if someone has the install recipe. In the meantime, one thing you could try is using the PyTorch base image nvcr.io/nvidia/pytorch:22.10-py3 instead of the TRT base image nvcr.io/nvidia/tensorrt:22.10-py3. The PyTorch base image supports both x86 and arm64; you can see the details here: PyTorch | NVIDIA NGC
@rchand18 Following up on the previous message: the existing pip wheels will not work for the Clara AGX devkit, since their Arm builds target Mac M1/M2. To use PyTorch on the devkit, you could build PyTorch from source or use the NGC PyTorch container.
That error is likely because the 22.10 image ships higher CUDA/PyTorch versions than 11.6/1.12.1. Please see PyTorch Release 22.10 for the software versions in each Docker image release. You could try an earlier version of the PyTorch Docker image, although that may require using an earlier version of the TensorRT diffusion demo repo. Building from source can also be a good option.
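To confirm which versions a given container actually ships, a quick check like this inside the container can help (guarded so it also runs where torch is absent; compare the reported values against the demo's requirements pins):

```python
# Report the installed torch build and the CUDA version it was
# compiled against, or note that torch is missing.
try:
    import torch
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
except ImportError:
    print("torch is not installed in this environment")
```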