Nvidia docker container doesn't work properly on L4T R32.6.1

Hello

I tried running a Docker container on the latest L4T release, R32.6.1. Unfortunately, it doesn’t work for me.

I followed the instructions on NGC and used the l4t-ml container.

  • Here are the commands I used:
$  sudo docker pull nvcr.io/nvidia/l4t-ml:r32.6.1-py3
$  sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-ml:r32.6.1-py3
  • Here is the error log:
$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-ml:r32.6.1-py3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/2823e4ebe73633f68773ec2acf4913243baf0d22263b9cd1b5e32a3d8068b29e/merged/usr/lib/aarch64-linux-gnu/libvulkan.so.1.2.141: file exists\\\\n\\\"\"": unknown.
root@jetson-agx-xavier-devkit:~# 

I removed --runtime nvidia from the last command, and it then works fine.

$ docker run -it --rm --network host nvcr.io/nvidia/l4t-ml:r32.6.1-py3
allow 10 sec for JupyterLab to start @ http://192.168.0.24:8888 (password nvidia)
JupterLab logging location:  /var/log/jupyter.log  (inside the container) 

Unfortunately, I cannot import torch or torchvision.

  • Here is the error log:
root@jetson-agx-xavier-devkit:/# python3 
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
>>> import torchvision 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.10.0a0+300a8a4-py3.6-linux-aarch64.egg/torchvision/__init__.py", line 6, in <module>
    from torchvision import models
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.10.0a0+300a8a4-py3.6-linux-aarch64.egg/torchvision/models/__init__.py", line 1, in <module>
    from .alexnet import *
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.10.0a0+300a8a4-py3.6-linux-aarch64.egg/torchvision/models/alexnet.py", line 1, in <module>
    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
>>> 

I checked, and I believe the CUDA libraries are missing.
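
One way to confirm which libraries are missing is to ask the dynamic loader directly, the same way torch does through ctypes.CDLL in the traceback above. The snippet below is only a minimal sketch, not something taken from the NGC instructions. The helper name find_missing_libs is made up for illustration, and libcurand.so.10 is the only library name taken from the error log; any other names added to the list would be guesses. As far as I know, on JetPack 4.x the CUDA libraries are mounted into the container from the host by the NVIDIA runtime, so a container started without --runtime nvidia would be expected to report them as missing.

```python
import ctypes


def find_missing_libs(names):
    """Return the entries of `names` that the dynamic loader cannot open.

    `find_missing_libs` is a hypothetical helper written for this post;
    it is not part of torch or of the l4t-ml container.
    """
    missing = []
    for name in names:
        try:
            # Same call that torch/__init__.py makes in the traceback.
            ctypes.CDLL(name)
        except OSError:
            # Raised when the shared object cannot be found or loaded.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # libcurand.so.10 comes from the OSError above; extend the list
    # with other library names as needed (those would be assumptions).
    libs_to_check = ["libcurand.so.10"]
    missing = find_missing_libs(libs_to_check)
    if missing:
        print("Cannot load:", ", ".join(missing))
    else:
        print("All listed libraries could be loaded.")
```

Running this inside the container, once with --runtime nvidia and once without, should show whether the loader can see the CUDA libraries in each case.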

Any help would be appreciated.

Best regards,
Ilies

Docker is working fine with L4T R32.6.1.
I tested the l4t-ml container.
We should close this topic.
Ilies