Nvidia-container-cli error

I’m currently installing WSL2 Ubuntu 20.04 within Windows 10 Insider Preview Build 21376.1, and faced same issue like below.

# sudo nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0512 07:05:58.485519 5592 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0512 07:05:58.485581 5592 nvc.c:346] using root /
I0512 07:05:58.485601 5592 nvc.c:347] using ldcache /etc/ld.so.cache
I0512 07:05:58.485604 5592 nvc.c:348] using unprivileged user 65534:65534
I0512 07:05:58.485659 5592 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0512 07:05:58.502183 5592 dxcore.c:226] Creating a new WDDM Adapter for hAdapter:40000000 luid:185673b
I0512 07:05:58.512924 5592 dxcore.c:267] Adding new adapter via dxcore hAdapter:40000000 luid:185673b wddm version:3000
I0512 07:05:58.512956 5592 dxcore.c:325] dxcore layer initialized successfully
W0512 07:05:58.513266 5592 nvc.c:397] skipping kernel modules load on WSL
I0512 07:05:58.513404 5593 driver.c:101] starting driver service
E0512 07:05:58.521931 5593 driver.c:168] could not start driver service: load library failed: /usr/lib/wsl/drivers/nv_dispi.inf_amd64_43efafcd74b2efc9/libnvidia-ml.so.1: undefined symbol: devicesetgpcclkvfoffset
I0512 07:05:58.522047 5592 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request

and I found out that apt-get experimental repository and stable one is mixed out - failing to install WSL2 one (which is available in experimental repository). as stable package is released, experimental(=WSL2) package should be released, but It wasn’t. see apt-cache madison result below.

# apt-cache madison libnvidia-container1
libnvidia-container1 |    1.4.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.3-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.3.3~rc.2-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.3.3~rc.1-1 | https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.3.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages

The point is, we should install libnvidia-container1=1.3.3~rc.2-1 version rather than libnvidia-container1=1.4.0-1. but normal apt-get install nvidia-docker2 command would install 1.4.0-1 version, and It mostly like to fail in WSL2 environment.

I successfully installed older (WSL2-exclusive) packages via apt-get install command below:

apt-get install \
    libnvidia-container1=1.3.3~rc.2-1 \
    libnvidia-container-tools=1.3.3~rc.2-1 \
    nvidia-container-toolkit=1.4.1-1 \
    nvidia-container-runtime=3.4.1-1 \
    nvidia-docker2=2.5.0-1

Those commands will install older packages, and after sudo service docker stop and sudo service docker start, CUDA will work inside docker.

7 Likes