Nvidia-container-cli error

If I run

sudo nvidia-container-cli -k -d /dev/tty info

I get the error below. Where is the ‘devicesetgpcclkvfoffset’ symbol supposed to come from?

-- WARNING, the following logs are for debugging purposes only --

I0509 06:20:58.858216 1009 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0509 06:20:58.858281 1009 nvc.c:346] using root /
I0509 06:20:58.858287 1009 nvc.c:347] using ldcache /etc/ld.so.cache
I0509 06:20:58.858293 1009 nvc.c:348] using unprivileged user 65534:65534
I0509 06:20:58.858311 1009 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0509 06:20:58.891385 1009 dxcore.c:226] Creating a new WDDM Adapter for hAdapter:40000000 luid:1e946dc
I0509 06:20:58.917427 1009 dxcore.c:267] Adding new adapter via dxcore hAdapter:40000000 luid:1e946dc wddm version:3000
I0509 06:20:58.917515 1009 dxcore.c:325] dxcore layer initialized successfully
W0509 06:20:58.918854 1009 nvc.c:397] skipping kernel modules load on WSL
I0509 06:20:58.919404 1010 driver.c:101] starting driver service
E0509 06:20:58.950931 1010 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset
I0509 06:20:58.951399 1009 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
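
To check whether that symbol exists in the driver library at all, you can dump its dynamic symbol table (a sketch; the library path is copied from the error line above, and grep -i is used because the symbol name in the log appears lower-cased):

# List the exported symbols of the WSL NVML library and look for the GPC clock VF offset entry point
nm -D /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1 | grep -i gpcclkvfoffset

# objdump -T works as well if nm is not installed
objdump -T /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1 | grep -i gpcclkvfoffset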

I’m currently running WSL2 Ubuntu 20.04 on Windows 10 Insider Preview Build 21376.1, and I hit the same issue, shown below.

# sudo nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0512 07:05:58.485519 5592 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0512 07:05:58.485581 5592 nvc.c:346] using root /
I0512 07:05:58.485601 5592 nvc.c:347] using ldcache /etc/ld.so.cache
I0512 07:05:58.485604 5592 nvc.c:348] using unprivileged user 65534:65534
I0512 07:05:58.485659 5592 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0512 07:05:58.502183 5592 dxcore.c:226] Creating a new WDDM Adapter for hAdapter:40000000 luid:185673b
I0512 07:05:58.512924 5592 dxcore.c:267] Adding new adapter via dxcore hAdapter:40000000 luid:185673b wddm version:3000
I0512 07:05:58.512956 5592 dxcore.c:325] dxcore layer initialized successfully
W0512 07:05:58.513266 5592 nvc.c:397] skipping kernel modules load on WSL
I0512 07:05:58.513404 5593 driver.c:101] starting driver service
E0512 07:05:58.521931 5593 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset
I0512 07:05:58.522047 5592 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request

I found out that the stable and experimental apt repositories are mixed together, and the WSL2 build (which is only published in the experimental repository) fails to get installed. When a new stable package is released, a matching experimental (WSL2) package should be released as well, but it wasn’t. See the apt-cache madison result below.

# apt-cache madison libnvidia-container1
libnvidia-container1 |    1.4.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.3-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.3.3~rc.2-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.3.3~rc.1-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
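
If you added the repositories with the usual install instructions, you can see exactly which repository lines are enabled by grepping the apt sources (a sketch; the file names under sources.list.d depend on how the repository was added):

# Show every configured apt source that mentions libnvidia-container (stable and/or experimental)
grep -rn "libnvidia-container" /etc/apt/sources.list /etc/apt/sources.list.d/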

The point is that we should install libnvidia-container1=1.3.3~rc.2-1 rather than libnvidia-container1=1.4.0-1. A plain apt-get install nvidia-docker2 installs 1.4.0-1, and that version is likely to fail in a WSL2 environment.
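
You can confirm which version a plain install would pick before changing anything (a sketch; the “Candidate:” line in the output is the version that apt-get install nvidia-docker2 would pull in as a dependency):

# The Candidate line shows the version apt would install by default (1.4.0-1 in this case)
apt-cache policy libnvidia-container1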

I successfully installed the older (WSL2-specific) packages with the apt-get install command below:

apt-get install \
    libnvidia-container1=1.3.3~rc.2-1 \
    libnvidia-container-tools=1.3.3~rc.2-1 \
    nvidia-container-toolkit=1.4.1-1 \
    nvidia-container-runtime=3.4.1-1 \
    nvidia-docker2=2.5.0-1

That command installs the older packages; after sudo service docker stop and sudo service docker start, CUDA will work inside Docker.
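
To keep a later apt upgrade from silently bumping these packages back to 1.4.0-1, you can put them on hold after the downgrade (a sketch; apt-mark hold is a standard apt feature, and the package list simply mirrors the install command above):

# Pin the downgraded packages so "apt-get upgrade" leaves them alone
sudo apt-mark hold libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit nvidia-container-runtime nvidia-docker2

# Undo the hold later with: sudo apt-mark unhold <packages>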

Installing the new Docker Desktop update led to the same “OCI runtime create failed” error as above when starting containers (in my case, via docker-compose). At the same time, the very same containers launch and work fine under native Linux.

I tried everything: I reinstalled the NVIDIA driver on Windows, reinstalled Docker, docker-compose, and the NVIDIA Container Toolkit. Nothing helped.

BUT!

  1. Installed the old versions as suggested above:
apt-get install \
libnvidia-container1=1.3.3~rc.2-1 \
libnvidia-container-tools=1.3.3~rc.2-1 \
nvidia-container-toolkit=1.4.1-1 \
nvidia-container-runtime=3.4.1-1 \
nvidia-docker2=2.5.0-1
  2. Restarted Docker.
  3. Ran the Docker daemon manually inside the WSL2 Ubuntu distro and created a new Docker context, since the old context created by Docker Desktop still does not work (see the sketch after this list).
  4. Rebuilt the CUDA/GPU container in this new context, with the packages downgraded to the old versions as indicated above, and everything worked!
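
A minimal sketch of step 3, assuming the Docker engine is installed inside the WSL2 Ubuntu distro itself (the context name wsl-local and the default daemon socket path are illustrative assumptions):

# Start the Docker daemon inside WSL2 (not the Docker Desktop daemon)
sudo service docker start

# Create a context that points at the local daemon socket and switch to it
docker context create wsl-local --docker "host=unix:///var/run/docker.sock"
docker context use wsl-local

# Verify which context is active
docker context ls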

I don’t know what the real problem is: Docker Desktop or the drivers…

I tried the lower versions suggested by ji5489.
Now I can run ‘sudo nvidia-container-cli -k -d /dev/tty info’ without any problem,
and the ‘sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark’ command also works fine.

By the way, the ‘nvidia-smi’ problem still remains.

Thanks ji5489.
