Error: Local version of NVML doesn't implement this function

maria.mercade · December 22, 2022, 2:55pm

I have a Jetson AGX Orin and want to train an SSD object detection model inside a Docker container. To do so, I built a pytorch_nvidia Docker image for aarch64 Tegra that is compatible with the L4T version on the Orin, which was flashed with JetPack 5.0.2.
Host version info:

CUDA version: 11.4
L4T R35.1.0

This is the Dockerfile:
Dockerfile.tegra (1.4 KB)

To train the model I use the main.py script from the GitHub repository NVIDIA/DeepLearningExamples, under PyTorch/Detection/SSD.
I get the following error:

dlopen libnvidia-ml.so failed!. Please install GPU dirver[/opt/dali/dali/util/nvml_wrap.cc:69] nvmlInitChecked failed: 
Traceback (most recent call last):
  File "src/train.py", line 286, in <module>
    train(train_loop_func, logger, args)
  File "src/train.py", line 148, in train
    train_loader = get_train_loader(args, args.seed - 2**31)
  File "/workspace/pytorch_nvidia/src/ssd/data.py", line 40, in get_train_loader
    train_pipe.build()
  File "/usr/local/lib/python3.8/dist-packages/nvidia/dali/pipeline.py", line 861, in build
    self._pipe.Build(self._generate_build_args())
RuntimeError: nvml error (13): Local version of NVML doesn't implement this function
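
The first line of the log reports that dlopen of libnvidia-ml.so failed. A minimal sketch for checking this independently of DALI, run inside the same container, is shown below. The helper name `find_nvml` and the candidate file names are assumptions for illustration; they are not part of DALI or of the original post, and a successful load would not by itself rule out the "doesn't implement this function" error.

```python
import ctypes


def find_nvml(candidates=("libnvidia-ml.so.1", "libnvidia-ml.so")):
    """Return the first candidate NVML library name that can be loaded, else None.

    The candidate names are an assumption; adjust them to the target system.
    """
    for name in candidates:
        try:
            # ctypes.CDLL uses dlopen on Linux and raises OSError on failure.
            ctypes.CDLL(name)
            return name
        except OSError:
            continue
    return None


if __name__ == "__main__":
    found = find_nvml()
    if found is None:
        print("NVML shared library could not be loaded in this environment")
    else:
        print(f"NVML shared library loaded: {found}")
```

If the helper returns None, the library is not visible to the dynamic loader in the container, which matches the dlopen message at the top of the log.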

I also tried changing the DALI installation to compile it from source for CUDA 11.4, but the error persists. Any ideas?

AastaLLL · December 26, 2022, 2:20am

Hi,

Please try our container for Jetson below:

Thanks.

maria.mercade · January 9, 2023, 8:26am

If you look at the Dockerfile I attached, you will see that I already use the image you mentioned as the base image, so this does not solve my problem. Any further ideas?

maria.mercade · January 17, 2023, 4:05pm

The problem remains unresolved. I have a project stuck because of this issue. Could you give me some ideas on how to fix it? Do you need any other information from me so that you can help me?

albert.arla · January 23, 2023, 9:47am

Hi all, I’m a co-worker of @maria.mercade. We started from scratch with another NVIDIA Orin to see whether the problem was caused by the initial installation on the first Orin. Same error.

AastaLLL · February 8, 2023, 2:51am

Hi,

Really sorry for the late update. (This ticket was somehow missing from our tracking.)

The error is from DALI.
Have you tried DALI on Jetson before?

Based on the doc below, the package for Jetson needs to be built from source.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/support_matrix.html

Xavier | Not Available | 11.8 | Jetpack 5.0.2 | SM 5.3 and later | Jetpack 5.0.2 | Jetpack 5.0.2 | Python wheel can be build from source

Thanks.

system · March 7, 2023, 5:40am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
