Hi. I'm trying to use Modulus with Docker on WSL2 Ubuntu 20.04 (Windows 11), and I've run into a problem.
I start the container with the command below:
docker run --gpus all -v ${PWD}/examples:/examples -it --rm nvcr.io/nvidia/modulus/modulus:22.09 bash
Then I get this error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/d34848e7089996bdb31f9dd8ce55a3e27c6446eee30259c33ffce6ba4777833a/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
How can I solve this?
I'm using an RTX 3060 with CUDA version 12.1.
I don't think the problem is with the GPU or the drivers. This command runs fine:
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Fri Jun 9 06:03:12 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060         On | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8               18W / 170W|    868MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    0   N/A  N/A        26      G   /Xwayland                                 N/A      |
|    0   N/A  N/A       595      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+
That is the output it prints.
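For anyone reading along: the mount error in this post names the file that already exists inside the image. It can be pulled out of the message with a small shell helper. This is only a sketch. `conflicting_path` is a made-up name, and it assumes the message keeps the `.../merged/<path>: file exists` shape shown above.

```shell
# Hypothetical helper: read an nvidia-container-cli error on stdin and print
# the in-container path that could not be created because it already exists.
# Assumes the overlay2 ".../merged/<path>: file exists" layout seen in this post.
conflicting_path() {
  sed -n 's#.*file creation failed: .*/merged\(/[^:]*\): file exists.*#\1#p'
}

# Possible usage (not run here): filter the stderr of the failing command.
# docker run --gpus all --rm nvcr.io/nvidia/modulus/modulus:22.09 true 2>&1 | conflicting_path
```

For the error quoted in this post it prints `/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1`, a path inside the Modulus image rather than on the host.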
Hi @kdg5424
Looks like our NVIDIA Docker setup is giving you some trouble on WSL. We don't test or officially support WSL with the Modulus container, but consider having a look at this relevant GitHub issue, which has some possible solutions:
opened 07:25AM - 21 Jun 22 UTC
closed 11:28AM - 30 Jun 22 UTC
### Issue or feature description
When I use Docker to create a container, I get this error:
```
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: timed out: unknown.
```
### Steps to reproduce the issue
* When I executed the following command
`sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi`
* I got the following error
`docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: timed out: unknown.`
* But when I executed the following command, I got the expected output
`sudo docker run hello-world`
### Additional information
* Some nvidia-container information
```
gpu-server@gpu-server:~$ nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0621 07:07:51.735875 4789 nvc.c:376] initializing library context (version=1.10.0, build=395fd41701117121f1fd04ada01e1d7e006a37ae)
I0621 07:07:51.735941 4789 nvc.c:350] using root /
I0621 07:07:51.735947 4789 nvc.c:351] using ldcache /etc/ld.so.cache
I0621 07:07:51.735963 4789 nvc.c:352] using unprivileged user 1000:1000
I0621 07:07:51.735984 4789 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0621 07:07:51.736064 4789 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0621 07:07:51.739205 4791 nvc.c:273] failed to set inheritable capabilities
W0621 07:07:51.739329 4791 nvc.c:274] skipping kernel modules load due to failure
I0621 07:07:51.739807 4793 rpc.c:71] starting driver rpc service
W0621 07:08:16.774958 4789 rpc.c:121] terminating driver rpc service (forced)
I0621 07:08:20.481845 4789 rpc.c:135] driver rpc service terminated with signal 15
nvidia-container-cli: initialization error: driver rpc error: timed out
I0621 07:08:20.481972 4789 nvc.c:434] shutting down library context
```
* Kernel version
`Linux gpu-server 4.15.0-187-generic #198-Ubuntu SMP Tue Jun 14 03:23:51 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux`
* Driver information
```
==============NVSMI LOG==============
Timestamp : Tue Jun 21 07:13:57 2022
Driver Version : 515.48.07
CUDA Version : 11.7
Attached GPUs : 4
GPU 00000000:01:00.0
Product Name : NVIDIA A100-SXM4-40GB
Product Brand : NVIDIA
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561221014674
GPU UUID : GPU-b67da01e-feba-d839-62c5-2773d4e963f0
Minor Number : 0
VBIOS Version : 92.00.19.00.13
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : 692-2G506-0202-002
Module ID : 3
Inforom Version
Image Version : G506.0202.00.02
OEM Object : 2.0
ECC Object : 6.16
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 515.48.07
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x20B010DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x144E10DE
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 40960 MiB
Reserved : 571 MiB
Used : 0 MiB
Free : 40388 MiB
BAR1 Memory Usage
Total : 65536 MiB
Used : 1 MiB
Free : 65535 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 640 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 32 C
GPU Shutdown Temp : 92 C
GPU Slowdown Temp : 89 C
GPU Max Operating Temp : 85 C
GPU Target Temperature : N/A
Memory Current Temp : 34 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 54.92 W
Power Limit : 400.00 W
Default Power Limit : 400.00 W
Enforced Power Limit : 400.00 W
Min Power Limit : 100.00 W
Max Power Limit : 400.00 W
Clocks
Graphics : 1080 MHz
SM : 1080 MHz
Memory : 1215 MHz
Video : 975 MHz
Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Default Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Max Clocks
Graphics : 1410 MHz
SM : 1410 MHz
Memory : 1215 MHz
Video : 1290 MHz
Max Customer Boost Clocks
Graphics : 1410 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 731.250 mV
Processes : None
GPU 00000000:41:00.0
Product Name : NVIDIA A100-SXM4-40GB
Product Brand : NVIDIA
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561221014888
GPU UUID : GPU-6ca82e47-c63a-1bea-38ad-d3af9e1dc26b
Minor Number : 1
VBIOS Version : 92.00.19.00.13
MultiGPU Board : No
Board ID : 0x4100
GPU Part Number : 692-2G506-0202-002
Module ID : 1
Inforom Version
Image Version : G506.0202.00.02
OEM Object : 2.0
ECC Object : 6.16
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 515.48.07
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x41
Device : 0x00
Domain : 0x0000
Device Id : 0x20B010DE
Bus Id : 00000000:41:00.0
Sub System Id : 0x144E10DE
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 40960 MiB
Reserved : 571 MiB
Used : 0 MiB
Free : 40388 MiB
BAR1 Memory Usage
Total : 65536 MiB
Used : 1 MiB
Free : 65535 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 640 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 30 C
GPU Shutdown Temp : 92 C
GPU Slowdown Temp : 89 C
GPU Max Operating Temp : 85 C
GPU Target Temperature : N/A
Memory Current Temp : 40 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 57.45 W
Power Limit : 400.00 W
Default Power Limit : 400.00 W
Enforced Power Limit : 400.00 W
Min Power Limit : 100.00 W
Max Power Limit : 400.00 W
Clocks
Graphics : 915 MHz
SM : 915 MHz
Memory : 1215 MHz
Video : 780 MHz
Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Default Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Max Clocks
Graphics : 1410 MHz
SM : 1410 MHz
Memory : 1215 MHz
Video : 1290 MHz
Max Customer Boost Clocks
Graphics : 1410 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 700.000 mV
Processes : None
GPU 00000000:81:00.0
Product Name : NVIDIA A100-SXM4-40GB
Product Brand : NVIDIA
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561221015040
GPU UUID : GPU-7e4b55d2-75fc-8ab5-e212-09e69e84704b
Minor Number : 2
VBIOS Version : 92.00.19.00.13
MultiGPU Board : No
Board ID : 0x8100
GPU Part Number : 692-2G506-0202-002
Module ID : 2
Inforom Version
Image Version : G506.0202.00.02
OEM Object : 2.0
ECC Object : 6.16
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 515.48.07
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x81
Device : 0x00
Domain : 0x0000
Device Id : 0x20B010DE
Bus Id : 00000000:81:00.0
Sub System Id : 0x144E10DE
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 40960 MiB
Reserved : 571 MiB
Used : 0 MiB
Free : 40388 MiB
BAR1 Memory Usage
Total : 65536 MiB
Used : 1 MiB
Free : 65535 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 640 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 32 C
GPU Shutdown Temp : 92 C
GPU Slowdown Temp : 89 C
GPU Max Operating Temp : 85 C
GPU Target Temperature : N/A
Memory Current Temp : 33 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 54.65 W
Power Limit : 400.00 W
Default Power Limit : 400.00 W
Enforced Power Limit : 400.00 W
Min Power Limit : 100.00 W
Max Power Limit : 400.00 W
Clocks
Graphics : 1080 MHz
SM : 1080 MHz
Memory : 1215 MHz
Video : 975 MHz
Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Default Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Max Clocks
Graphics : 1410 MHz
SM : 1410 MHz
Memory : 1215 MHz
Video : 1290 MHz
Max Customer Boost Clocks
Graphics : 1410 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 712.500 mV
Processes : None
GPU 00000000:C1:00.0
Product Name : NVIDIA A100-SXM4-40GB
Product Brand : NVIDIA
Product Architecture : Ampere
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1561221014695
GPU UUID : GPU-66ba085a-5496-d204-d4da-aa9f112d3fd8
Minor Number : 3
VBIOS Version : 92.00.19.00.13
MultiGPU Board : No
Board ID : 0xc100
GPU Part Number : 692-2G506-0202-002
Module ID : 0
Inforom Version
Image Version : G506.0202.00.02
OEM Object : 2.0
ECC Object : 6.16
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 515.48.07
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0xC1
Device : 0x00
Domain : 0x0000
Device Id : 0x20B010DE
Bus Id : 00000000:C1:00.0
Sub System Id : 0x144E10DE
GPU Link Info
PCIe Generation
Max : 4
Current : 4
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 40960 MiB
Reserved : 571 MiB
Used : 0 MiB
Free : 40388 MiB
BAR1 Memory Usage
Total : 65536 MiB
Used : 1 MiB
Free : 65535 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 640 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 30 C
GPU Shutdown Temp : 92 C
GPU Slowdown Temp : 89 C
GPU Max Operating Temp : 85 C
GPU Target Temperature : N/A
Memory Current Temp : 38 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 59.01 W
Power Limit : 400.00 W
Default Power Limit : 400.00 W
Enforced Power Limit : 400.00 W
Min Power Limit : 100.00 W
Max Power Limit : 400.00 W
Clocks
Graphics : 1080 MHz
SM : 1080 MHz
Memory : 1215 MHz
Video : 975 MHz
Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Default Applications Clocks
Graphics : 1095 MHz
Memory : 1215 MHz
Max Clocks
Graphics : 1410 MHz
SM : 1410 MHz
Memory : 1215 MHz
Video : 1290 MHz
Max Customer Boost Clocks
Graphics : 1410 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 737.500 mV
Processes : None
```
* docker version
```
Client: Docker Engine - Community
Version: 20.10.17
API version: 1.41
Go version: go1.17.11
Git commit: 100c701
Built: Mon Jun 6 23:02:56 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.11
API version: 1.41 (minimum version 1.12)
Go version: go1.16.9
Git commit: 847da18
Built: Thu Nov 18 00:35:16 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.6
GitCommit: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
nvidia:
Version: 1.1.2
GitCommit: v1.1.2-0-ga916309
docker-init:
Version: 0.19.0
GitCommit: de40ad0
```
* NVIDIA container library version
```
cli-version: 1.10.0
lib-version: 1.10.0
build date: 2022-06-13T10:39+00:00
build revision: 395fd41701117121f1fd04ada01e1d7e006a37ae
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
```
* NVIDIA packages version
```
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-====================================-=======================-=======================-=============================================================================
ii libnvidia-container-tools 1.10.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.10.0-1 amd64 NVIDIA container runtime library
un nvidia-container-runtime <none> <none> (no description available)
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.10.0-1 amd64 NVIDIA container runtime hook
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.7.0-1 all nvidia-docker CLI wrapper
```
Also, the NVIDIA Modulus container is not on CUDA 12.0 yet, but I am not sure whether this is the issue. You could consider a pip install.
Interesting. Consider trying the NVIDIA PyTorch base container that we build from to see if that works. If it does, we know it's an issue with the Modulus container (although the fix is unknown).
nvcr.io/nvidia/pytorch:22.12-py3
Hi @ngeneva.
I tried what you suggested, and it seems to work.
kdg@DESKTOP-7ICQ4NK:~$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.12-py3
=============
== PyTorch ==
NVIDIA Release 22.12 (build 49968248)
PyTorch Version 1.14.0a0+410ce96
Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for PyTorch. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 …
root@7fff099603ff:/workspace#
So I think it does look like an issue with the Modulus container.
@ngeneva @kdg5424
This has been a known problem with the Modulus containers for some time. The PyTorch container has always worked without issue.
For Modulus 22.09 you have to remove some of the injected files included in the container. The Dockerfile below generates a working 22.09 container from the existing one:
FROM nvcr.io/nvidia/modulus/modulus:22.09
RUN rm -rf \
/usr/lib/x86_64-linux-gnu/libcuda.so* \
/usr/lib/x86_64-linux-gnu/libnvcuvid.so* \
/usr/lib/x86_64-linux-gnu/libnvidia-*.so* \
/usr/local/cuda/compat/lib/*.515.65.01
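To try the Dockerfile above, it has to be built into a local image and started in place of the original one. The sketch below is one way to do that. The `DOCKER` and `IMAGE_TAG` variables, the tag `modulus-wsl:22.09` and the function names are illustrative choices, not part of Modulus.

```shell
# Sketch only: build and start the patched image from the Dockerfile above.
# DOCKER and IMAGE_TAG are illustrative variables; set DOCKER=echo to print
# the commands instead of running them.
DOCKER="${DOCKER:-docker}"
IMAGE_TAG="${IMAGE_TAG:-modulus-wsl:22.09}"  # arbitrary local tag, not an official image

build_patched() {
  # $1: directory that contains the Dockerfile (defaults to the current one)
  "$DOCKER" build -t "$IMAGE_TAG" "${1:-.}"
}

run_patched() {
  # Same flags as the original command, pointed at the patched image
  "$DOCKER" run --gpus all -v "${PWD}/examples:/examples" -it --rm "$IMAGE_TAG" bash
}
```

After sourcing this in the directory that holds the Dockerfile, `build_patched && run_patched` should drop you into a shell in the patched container, provided the build succeeds.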
Thanks for the reply.
It works well.
Closed by system on June 26, 2023, 12:55am.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.