Hi,
I have a Jetson AGX Orin flashed with JetPack 5.0.2 (L4T R35.1.0) and I’m using it to train a deep learning model with PyTorch. I used a base Docker image from NVIDIA’s official repository: nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.11-py3
I also need the NVIDIA DALI Python package, so I followed the instructions in the README.md of the GitHub repository to install it via pip.
Once the image is built, I run it with the following command:
docker run -it --runtime nvidia --net host <image_name>
When I run my Python training script, the following error appears:
dlopen libnvidia-ml.so failed!. Please install GPU dirver[/opt/dali/dali/util/nvml_wrap.cc:69] nvmlInitChecked failed:
Traceback (most recent call last):
  File "src/train.py", line 286, in <module>
    train(train_loop_func, logger, args)
  File "src/train.py", line 148, in train
    train_loader = get_train_loader(args, args.seed - 2**31)
  File "/workspace/pytorch_nvidia/src/ssd/data.py", line 40, in get_train_loader
    train_pipe.build()
  File "/usr/local/lib/python3.8/dist-packages/nvidia/dali/pipeline.py", line 861, in build
    self._pipe.Build(self._generate_build_args())
RuntimeError: nvml error (13): Local version of NVML doesn't implement this function
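Since the first line of the error says that dlopen of libnvidia-ml.so failed, a quick way to narrow this down might be to check whether that library can be loaded at all inside the container. The snippet below is only a minimal sketch of such a check using Python's standard ctypes module. The helper name nvml_loadable and the candidate file names (libnvidia-ml.so.1, libnvidia-ml.so) are my own assumptions, not something taken from DALI, and the check only tells whether the dynamic loader can find the library, not whether DALI can use it.

```python
import ctypes


def nvml_loadable(names=("libnvidia-ml.so.1", "libnvidia-ml.so")):
    """Return the first library name that the dynamic loader can open, else None.

    The candidate names are assumptions for illustration; adjust them if the
    library is installed under a different name on your system.
    """
    for name in names:
        try:
            # CDLL raises OSError when the shared object cannot be found or loaded.
            ctypes.CDLL(name)
            return name
        except OSError:
            continue
    return None


found = nvml_loadable()
if found is None:
    print("NVML library could not be loaded inside this environment")
else:
    print("NVML library loaded as:", found)
```

Running this both on the host and inside the container started with --runtime nvidia should show whether the library is missing everywhere or only inside the container.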
Am I missing something? How can I solve this error?