WSL Modulus Docker run error (libnvidia-ml.so.1: file exists: unknown.)

Hi. I’m trying to use Modulus with Docker on WSL2 Ubuntu 20.04 (Windows 11), and I’ve run into a problem.
When I run Docker with the command below:

docker run --gpus all -v ${PWD}/examples:/examples -it --rm nvcr.io/nvidia/modulus/modulus:22.09 bash

I get an error like this:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/d34848e7089996bdb31f9dd8ce55a3e27c6446eee30259c33ffce6ba4777833a/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

How can I solve this?

I’m using an RTX 3060 with CUDA 12.1.

I don’t think it’s a problem with the GPU or drivers.

Running this command:

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

gives this result:

Fri Jun  9 06:03:12 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         On | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8               18W / 170W|    868MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                      N/A |
|    0   N/A  N/A        26      G   /Xwayland                                      N/A |
|    0   N/A  N/A       595      G   /Xwayland                                      N/A |
+---------------------------------------------------------------------------------------+


Hi @kdg5424

Looks like our NVIDIA Docker image is giving you some trouble on WSL. We don’t test or officially support WSL with the Modulus container, but consider having a look at this related GitHub issue, which has some possible solutions:

Also, the NVIDIA Modulus container is not on CUDA 12.0 yet, though I’m not sure whether that is the issue. You could consider a pip install instead.
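
For example, something along these lines should work (the exact package names are an assumption based on the open-source Modulus releases; check the Modulus install docs for your version):

pip install nvidia-modulus          # core Modulus package (assumed name)
pip install nvidia-modulus.sym      # symbolic/PDE toolkit, if your examples need it (assumed name)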

Interesting. Consider trying the NVIDIA PyTorch base container that we build from, to see if that works. If it does, we know it’s some issue with the Modulus container (although a fix is unknown).

nvcr.io/nvidia/pytorch:22.12-py3

Hi, @ngeneva.
I tried what you suggested, and it seems to work.

kdg@DESKTOP-7ICQ4NK:~$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.12-py3

=============
== PyTorch ==
=============

NVIDIA Release 22.12 (build 49968248)
PyTorch Version 1.14.0a0+410ce96

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for PyTorch. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 …

root@7fff099603ff:/workspace#

And I think it looks like some issue with the Modulus container.
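
As an aside, applying the flags from the SHMEM note in the banner would make the full command look roughly like this (the example mount is just carried over from the original Modulus command):

docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v ${PWD}/examples:/examples -it --rm nvcr.io/nvidia/pytorch:22.12-py3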


@ngeneva @kdg5424

This has been a known problem with the Modulus containers for some time. The PyTorch container has always worked without issue.

For Modulus 22.09 you have to remove some of the driver files that are baked into the container, since they collide with the libraries the NVIDIA container runtime mounts from the host (hence the “file exists” error). Here’s a Dockerfile to generate a working 22.09 container from the existing one:

FROM nvcr.io/nvidia/modulus/modulus:22.09
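
# Strip the driver/CUDA compat libraries baked into the image; on WSL the
# NVIDIA container runtime mounts these from the host, and the duplicates
# trigger the "libnvidia-ml.so.1: file exists" mount error.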

RUN rm -rf \
    /usr/lib/x86_64-linux-gnu/libcuda.so* \
    /usr/lib/x86_64-linux-gnu/libnvcuvid.so* \
    /usr/lib/x86_64-linux-gnu/libnvidia-*.so* \
    /usr/local/cuda/compat/lib/*.515.65.01
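
Then build and run it, for example (the image tag is just a placeholder):

docker build -t modulus:22.09-wsl .
docker run --gpus all -v ${PWD}/examples:/examples -it --rm modulus:22.09-wsl bash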


Thanks for the reply.
It works well.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.