Error: Local version of NVML doesn't implement this function

I have a Jetson AGX Orin and I want to train an SSD model for object detection inside a Docker container. To do so, I built a pytorch_nvidia Docker image for aarch64 Tegra that is compatible with the Orin's L4T version; the device was flashed with JetPack 5.0.2.
Host version info:

  • CUDA version: 11.4
  • L4T R35.1.0

This is the Dockerfile:
Dockerfile.tegra (1.4 KB)

To train the model, I use the main.py script from the GitHub repository NVIDIA/DeepLearningExamples/PyTorch/Detection/SSD.
I get the following error:

dlopen libnvidia-ml.so failed!. Please install GPU dirver[/opt/dali/dali/util/nvml_wrap.cc:69] nvmlInitChecked failed: 
Traceback (most recent call last):
  File "src/train.py", line 286, in <module>
    train(train_loop_func, logger, args)
  File "src/train.py", line 148, in train
    train_loader = get_train_loader(args, args.seed - 2**31)
  File "/workspace/pytorch_nvidia/src/ssd/data.py", line 40, in get_train_loader
    train_pipe.build()
  File "/usr/local/lib/python3.8/dist-packages/nvidia/dali/pipeline.py", line 861, in build
    self._pipe.Build(self._generate_build_args())
RuntimeError: nvml error (13): Local version of NVML doesn't implement this function

I also tried changing the DALI installation by compiling it from source for CUDA 11.4, but the error persists. Any ideas?
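
The first line of the log reports that DALI could not dlopen libnvidia-ml.so. A minimal sketch, assuming Python 3 is available inside the container, for checking whether the dynamic loader can find that library at all. The helper name `first_loadable_library` is hypothetical and not part of DALI, and the candidate file names are assumptions about how NVML is usually named:

```python
import ctypes


def first_loadable_library(names):
    """Return the first library name that the dynamic loader can open, or None."""
    for name in names:
        try:
            ctypes.CDLL(name)  # dlopen under the hood; raises OSError on failure
            return name
        except OSError:
            continue
    return None


if __name__ == "__main__":
    # Candidate NVML file names (assumed; adjust if your system differs).
    candidates = ("libnvidia-ml.so.1", "libnvidia-ml.so")
    found = first_loadable_library(candidates)
    if found is None:
        print("NVML could not be loaded; DALI's NVML calls would likely fail here.")
    else:
        print("Loaded NVML from:", found)
```

If this reports that NVML cannot be loaded inside the container, the failure is in the environment rather than in the training script itself.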


Hi,

Please try our container for Jetson below:

Thanks.

Take a look at the Dockerfile I attached, you will see that I already use the image you mentioned as the base image. So this does not solve my problem. Any further ideas?


The problem remains unresolved. I have a project stuck because of this issue. Could you give me some ideas on how to fix it? Do you need any other information from me so that you can help me?


Hi all, I’m a co-worker of @maria.mercade. We started from scratch with another NVIDIA Orin to see whether the problem was caused by the initial installation on the first Orin. Same error.

Hi,

Really sorry for the late update. (This ticket was somehow missing from our tracking.)

The error is from DALI.
Have you tried DALI on Jetson before?

Based on the doc below, the DALI package for Jetson needs to be built from source.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/support_matrix.html

Xavier | Not Available | 11.8 | Jetpack 5.0.2 | SM 5.3 and later | Jetpack 5.0.2 | Jetpack 5.0.2 | Python wheel can be build from source
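
When switching between a prebuilt wheel and a source build, it can help to confirm which DALI package the container's Python actually sees. A minimal sketch using only the standard library; the `nvidia-dali` name prefix is an assumption about how the wheel is named, and `installed_packages` is a hypothetical helper:

```python
from importlib import metadata


def installed_packages(prefix):
    """Return sorted (name, version) pairs for installed packages whose name starts with prefix."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith(prefix.lower()):
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    # "nvidia-dali" is an assumed prefix; adjust it to match your wheel's name.
    matches = installed_packages("nvidia-dali")
    if not matches:
        print("No package with that prefix is installed in this environment.")
    for name, version in matches:
        print(name, version)
```

More than one match would suggest that an older prebuilt wheel is still installed next to the source-built one.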

Thanks.
