Error running 22.07 container with examples - Failed to create shim task

It seems like the 22.07 container will not run. This is my current error. Can anyone help me with what this is or how to fix it?

docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all -v ${PWD}/examples:/examples -it modulus:22.07 bash
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/ca5de1d2dbb200798f83372e1d274ce3e2fe6773eb5805ed35859d4e2c02e76f/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

Hi @ckitchell ,

Is there any chance you're running the NVIDIA Container Toolkit on WSL? There is presently a known issue with nvidia-docker on Windows/WSL systems.
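
If you want to confirm what is in play, a couple of quick checks on the host can help when matching against known issues (these are just standard diagnostics, not a fix, and the output will vary with your setup):

# Confirm the NVIDIA container library version and that the nvidia runtime
# is registered with docker.
nvidia-container-cli --version
docker info | grep -i runtime

# Confirm the driver itself can see the GPU outside of any container.
nvidia-smi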

Yes. I have been trying WSL… is there another, easier way… 22.03 and 22.07 seem to be giving the same issues for me. I read online that WSL is not a good choice.

Depending on your needs, you could try a bare-metal installation. Most of the utilities in Modulus should work if PyTorch works on your system, but of course we encourage the docker image for consistency across our users' development environments.
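
For reference, a minimal sketch of what a bare-metal install might look like, assuming you already have a working PyTorch + CUDA environment and have downloaded the Modulus source release (the archive name below is a placeholder):

# Hedged sketch of a bare-metal install; assumes PyTorch + CUDA already work.
# The archive name is a placeholder for whatever the downloads page gives you.
tar -xzf Modulus_source.tar.gz
cd modulus

# Install Modulus and its Python dependencies into the current environment.
python -m pip install .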

Alternatively you could look into a cloud based service.

Hi all, I am on WSL and it’s working well up to 22.03.1.

I tried the solution given by ngeneva. I deleted the files one by one, as each error named a library, e.g.:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/a92497fde29f5e4a16659087de1978a2ff7cf59a53b410f240467c3aead3f609/merged/usr/lib/x86_64-linux-gnu/libnvcuvid.so.1: file exists: unknown.
ERRO[0000] error waiting for container: context canceled

So I deleted /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
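
For anyone else hitting this: rather than deleting files one at a time inside a container that keeps failing to start, a rough way to script the same cleanup is to start the image once without --gpus (so the NVIDIA hook never runs), remove the conflicting libraries, and commit the result as a new image. The file list below is a placeholder; adjust it to whatever your "file exists" errors name:

# Start the container WITHOUT --gpus so the NVIDIA mount hook does not run.
docker run --name modulus-fix -d modulus:22.07 sleep infinity

# Remove the duplicate driver libraries baked into the image; edit this
# list to match the files named in your errors.
docker exec modulus-fix rm -f \
    /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 \
    /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

# Commit the cleaned container as a new image and remove the scratch container.
docker commit modulus-fix modulus:22.07-wsl
docker rm -f modulus-fix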

In the end, after deleting approximately 6 files, there was no more error message, but Modulus reported:

ERROR: No supported GPU(s) detected to run this container

I ran "nvidia-smi" and it did report my GPU:

root@d7a49cf80974:/examples# nvidia-smi
Tue Sep 27 08:59:08 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.75       Driver Version: 517.40       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …     On  | 00000000:01:00.0  On |                  N/A |
| 27%   33C    P8    13W / 275W |    628MiB / 11264MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, running the example still gives an error:

Error executing job with overrides:
Traceback (most recent call last):
  File "helmholtz.py", line 92, in run
    slv.solve()
  File "/modulus/modulus/solver/solver.py", line 159, in solve
    self._train_loop(sigterm_handler)
  File "/modulus/modulus/trainer.py", line 521, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/modulus/modulus/trainer.py", line 694, in _cuda_graph_training_step
    self.warmup_stream = torch.cuda.Stream()
  File "/opt/conda/lib/python3.8/site-packages/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
RuntimeError: CUDA error: no CUDA-capable device is detected
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

So the GPU still doesn't work correctly.

Does anyone have a solution?

Thanks!

Hi @tsltaywb

I would start with just getting PyTorch working and making sure the GPU is visible to PyTorch prior to running Modulus.

>>> import torch
>>> torch.cuda.is_available()       # Should be True
>>> torch.cuda.device_count()       # Should be 1
>>> torch.cuda.current_device()     # Should be 0
>>> torch.cuda.device(0)            # Should return a device object, not error
>>> torch.cuda.get_device_name(0)   # Should print your GPU's name

Once PyTorch alone works with the GPU, Modulus should function.
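
If those checks fail inside the container on WSL, a few shell-level things are also worth verifying (the paths below are the usual WSL2 locations and may differ on your setup):

# Sanity checks for when nvidia-smi works but CUDA reports no device.
echo $NVIDIA_VISIBLE_DEVICES        # typically "all" when run with --gpus=all
echo $CUDA_VISIBLE_DEVICES          # "-1" or a wrong index would hide the GPU
ls -l /dev/dxg                      # WSL2 exposes the GPU through this device node
ls -l /usr/lib/wsl/lib/libcuda.so.1 # the WSL CUDA driver library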

Hi ngeneva,

Well, the problem is that without deleting the libnvidia* and libcuda* files, I can't enter the docker Modulus environment. But if I enter after deleting these files, the GPU doesn't work:
torch.cuda.is_available() returns False

Btw, 22.03.1 is working. I noticed that in the docker Modulus directory there are modulus and external directories. Can I overwrite them with the newer 22.09 ones? Will I get the new features of 22.09 if I do this?

Thanks.

Hi @tsltaywb

In theory, yes, you could do that with the modulus folder, which should allow most PyTorch-related features to function. The external folder is for the two external dependencies of Modulus (pysdf and tinycudann). I would be careful copying these over because they are compiled during the build of the docker image. It could be worth a try if you want pysdf functionality.

I've seen you've figured out a workaround with the 22.08 PyTorch container; you may also want to try some hacking with that method. Alternatively, you could comment out the PySDF items in the Dockerfile in the main repo and build your image with the same folder structure as the one we ship, then try bringing over the pre-compiled PySDF files from the 22.09 container.
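
As a rough sketch of the copy itself, assuming the /modulus layout shown in the traceback above, with placeholder image tags and container names throughout:

# Extract the 22.09 modulus source tree without running the image.
docker create --name m2209 modulus:22.09
docker cp m2209:/modulus/modulus ./modulus-22.09
docker rm m2209

# Copy it over the modulus folder inside your working 22.03.1 container
# (replace <container-id> with your running container's ID or name).
docker cp ./modulus-22.09/. <container-id>:/modulus/modulus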

Thanks for updating the forums with your solutions for others!