WSL Modulus Docker run error (libnvidia-ml.so.1: file exists: unknown.)

Hi. I’m trying to use Modulus with Docker on WSL2 Ubuntu 20.04 (Windows 11), and I’ve run into a problem.
When I run Docker with the command below:

docker run --gpus all -v ${PWD}/examples:/examples -it --rm nvcr.io/nvidia/modulus/modulus:22.09 bash

I get an error like this:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/d34848e7089996bdb31f9dd8ce55a3e27c6446eee30259c33ffce6ba4777833a/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

How can I solve this?

I’m using an RTX 3060 with CUDA 12.1.

I don’t think it’s a problem with the GPU or drivers.

Running this command:

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

gives this result:

Fri Jun  9 06:03:12 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         On | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8               18W / 170W|    868MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                      N/A |
|    0   N/A  N/A        26      G   /Xwayland                                      N/A |
|    0   N/A  N/A       595      G   /Xwayland                                      N/A |
+---------------------------------------------------------------------------------------+


Hi @kdg5424

Looks like our NVIDIA Docker image is giving you some trouble on WSL. We don’t test or officially support WSL with the Modulus container, but consider having a look at this related GitHub issue, which has some possible solutions:

Also, the NVIDIA Modulus container is not on CUDA 12.0 yet, though I’m not sure whether that is the issue. You could consider a pip install instead.
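
For example, something along these lines should work (the exact package names are an assumption based on the open-source Modulus releases; check the Modulus install docs for your version):

pip install nvidia-modulus          # core Modulus package (assumed name)
pip install nvidia-modulus.sym      # symbolic/PDE toolkit, if your examples need it (assumed name)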

Interesting. Consider trying the NVIDIA PyTorch base container that we build from, to see if that works. If it does, we know it’s some issue with the Modulus container (although a fix is unknown).

nvcr.io/nvidia/pytorch:22.12-py3

Hi, @ngeneva.
I tried what you suggested, and it seems to work.

kdg@DESKTOP-7ICQ4NK:~$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.12-py3

=============
== PyTorch ==
=============

NVIDIA Release 22.12 (build 49968248)
PyTorch Version 1.14.0a0+410ce96

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for PyTorch. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 …

root@7fff099603ff:/workspace#

And I think it looks like some issue with the Modulus container.
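
As an aside, applying the flags from the SHMEM note in the banner would make the full command look roughly like this (the example mount is just carried over from the original Modulus command):

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v ${PWD}/examples:/examples -it --rm nvcr.io/nvidia/pytorch:22.12-py3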


@ngeneva @kdg5424

This has been a known problem with the Modulus containers for some time. The PyTorch container has always worked without issue.

For Modulus 22.09 you have to remove some of the driver files that are baked into the container, since they collide with the libraries the NVIDIA container runtime mounts from the host (hence the “file exists” error). Here’s a Dockerfile to generate a working 22.09 container from the existing one:

FROM nvcr.io/nvidia/modulus/modulus:22.09
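
# Strip the driver/CUDA compat libraries baked into the image; on WSL the
# NVIDIA container runtime mounts these from the host, and the duplicates
# trigger the "libnvidia-ml.so.1: file exists" mount error.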

RUN rm -rf \
    /usr/lib/x86_64-linux-gnu/libcuda.so* \
    /usr/lib/x86_64-linux-gnu/libnvcuvid.so* \
    /usr/lib/x86_64-linux-gnu/libnvidia-*.so* \
    /usr/local/cuda/compat/lib/*.515.65.01
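
Then build and run it, for example (the image tag is just a placeholder):

docker build -t modulus:22.09-wsl .
docker run --gpus all -v ${PWD}/examples:/examples -it --rm modulus:22.09-wsl bash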


Thanks for the reply.
It works well.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.