Multiple NVIDIA RTX GPUs for CUDA (Arch Linux) with eGPU

andres27 · January 24, 2024, 4:52pm

I’m running Arch Linux on a laptop (ThinkPad P14s Gen 4) with two internal GPUs, plus a new RTX 3090 connected over Thunderbolt 4 through the Cooler Master EG200 GPU enclosure:

❯ lspci -k | grep -A 2 -E "(VGA|3D)"
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
        Subsystem: Lenovo Raptor Lake-P [Iris Xe Graphics]
        Kernel driver in use: i915
--
03:00.0 3D controller: NVIDIA Corporation GA107GLM [RTX A500 Laptop GPU] (rev a1)
        Subsystem: Lenovo GA107GLM [RTX A500 Laptop GPU]
        Kernel driver in use: nvidia
--
22:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GA102 [GeForce RTX 3090]
        Kernel driver in use: nvidia

The Thunderbolt connection to the RTX 3090 is authorized, as you can see here:

❯ sudo boltctl info c4010000-0070-740e-0362-00168691c921
[sudo] password for aemonge: 
 ● Cooler Master Technology,Inc MasterCase EG200
   ├─ type:          peripheral
   ├─ name:          MasterCase EG200
   ├─ vendor:        Cooler Master Technology,Inc
   ├─ uuid:          c4010000-0070-740e-0362-00168691c921
   ├─ dbus path:     /org/freedesktop/bolt/devices/c4010000_0070_740e_0362_00168691c921
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     69078780-60ab-fe2a-ffff-ffffffffffff
   │  ├─ parent:     69078780-60ab-fe2a-ffff-ffffffffffff
   │  ├─ syspath:    /sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  boot
   ├─ authorized:    Wed 24 Jan 2024 06:49:10 AM UTC
   ├─ connected:     Wed 24 Jan 2024 06:49:10 AM UTC
   └─ stored:        Tue 23 Jan 2024 03:50:50 PM UTC
      ├─ policy:     iommu
      └─ key:        no

I really don’t care about graphics, and I don’t need the RTX 3090 to be loaded by Xorg or the graphical interface. I just want to use it for compute-only workloads, and I have thoroughly followed this Arch wiki page: External GPU - ArchWiki

But given that context, nvidia-smi can’t seem to find the GPU:

❯ nvidia-smi -L
GPU 0: NVIDIA RTX A500 Laptop GPU (UUID: GPU-762410c2-1c0d-ef4a-89ac-91afd926381b)

Nor can a simple Python script, cuda-devics.py:

❯ cat cuda-devics.py
import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available.")
    # Get the number of CUDA devices
    num_devices = torch.cuda.device_count()
    print(f"Number of CUDA devices: {num_devices}")
    # Get the name of each CUDA device
    for i in range(num_devices):
        print(f"Device {i} name: {torch.cuda.get_device_name(i)}")
else:
    print("CUDA is not available.")
❯ python cuda-devics.py
CUDA is available.
Number of CUDA devices: 1
Device 0 name: NVIDIA RTX A500 Laptop GPU

❯ CUDA_VISIBLE_DEVICES="0,1,2" python cuda-devics.py

CUDA is available.
Number of CUDA devices: 1
Device 0 name: NVIDIA RTX A500 Laptop GPU
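For context, CUDA_VISIBLE_DEVICES can only narrow down the set of GPUs that the driver already exposes to CUDA; it cannot make a GPU appear that nvidia-smi -L does not list. The sketch below models the documented index rule (an out-of-range index hides itself and everything after it) in plain Python. The function name effective_devices is made up for illustration, and UUID-style entries are deliberately not handled.

```python
from typing import List, Optional


def effective_devices(visible: Optional[str], n_enumerated: int) -> List[int]:
    """Model which CUDA device indices a process would end up seeing.

    visible      -- value of CUDA_VISIBLE_DEVICES, or None if it is unset
    n_enumerated -- number of GPUs the driver exposes to CUDA
    Only integer indices are modelled; UUID entries are out of scope.
    """
    if visible is None:
        # Variable unset: every enumerated GPU is visible.
        return list(range(n_enumerated))
    result = []
    for token in visible.split(","):
        token = token.strip()
        if not token.isdecimal() or int(token) >= n_enumerated:
            # An invalid index hides itself and everything after it.
            break
        result.append(int(token))
    return result


print(effective_devices("0,1,2", 1))  # [0]
print(effective_devices("0,1,2", 2))  # [0, 1]
print(effective_devices("0", 2))      # [0]
```

With a single enumerated GPU, "0,1,2" collapses to just device 0, which matches the output above: the variable is not the reason the RTX 3090 is missing at this point.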

I have also tried these three repositories to disable the internal GPUs (the A500 and the Iris Xe), but the result is a black screen: GitHub - ewagner12/all-ways-egpu: Configure eGPU as primary under Linux Wayland desktops, GitHub - karli-sjoberg/gswitch, and GitHub - hertg/egpu-switcher: 🖥🐧 Setup script for eGPUs in Linux (X.Org).

generix · January 25, 2024, 9:28am

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

andres27 · January 25, 2024, 1:00pm

Hi @generix, this is the result. Thanks in advance.

nvidia-bug-report.log.gz (463.3 KB)

generix · January 25, 2024, 1:47pm

RmInitAdapter failed! (0x26:0x56:1482)
Same as this:
https://forums.developer.nvidia.com/t/dont-enter-graphical-interface-after-installing-driver-on-ubuntu20-04/280097/2
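The error line quoted above comes from the NVIDIA kernel module's log output. A minimal sketch for spotting it in saved log text (for example, kernel log output written to a file) could look like the following. The function name find_rminit_failures is hypothetical, and the NVRM prefix in the sample line is an assumption about the surrounding log format; only the RmInitAdapter failed! (0x26:0x56:1482) part is taken from this thread.

```python
import re
from typing import List, Tuple

# Matches e.g. "RmInitAdapter failed! (0x26:0x56:1482)".
RMINIT_RE = re.compile(
    r"RmInitAdapter failed!\s*\((0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(\d+)\)"
)


def find_rminit_failures(log_text: str) -> List[Tuple[str, str, int]]:
    """Return (first code, second code, number) for each failure line found."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in RMINIT_RE.finditer(log_text)
    ]


# Sample text; the "NVRM: ..." prefix is assumed, not copied from the report.
sample = "NVRM: GPU 0000:22:00.0: RmInitAdapter failed! (0x26:0x56:1482)\n"
print(find_rminit_failures(sample))  # [('0x26', '0x56', 1482)]
```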

andres27 · January 25, 2024, 1:52pm

The BIOS and firmware are up to date:

❯ sudo fwupdmgr get-updates
[sudo] password for aemonge: 
Devices with no available firmware updates: 
 • UEFI Device Firmware
 • Fingerprint Sensor
 • Integrated Camera
 • ThinkPad Universal ThunderBolt 4 Dock
 • USB3.0 Hub
 • USB4 Retimer
 • VMM6212
Devices with the latest available firmware version:
 • KXG8AZNV1T02 LA KIOXIA
 • ThinkPad Thunderbolt 4 Dock
No updates available

What do you mean by the -open driver version? Do you mean extra/nvidia-open 545.29.06-12?

generix · January 25, 2024, 1:53pm

exactly

andres27 · January 25, 2024, 2:01pm

Thanks!

That’s one step forward. Now nvidia-smi -L does recognize the GPU!

❯ nvidia-smi -L
GPU 0: NVIDIA RTX A500 Laptop GPU (UUID: GPU-762410c2-1c0d-ef4a-89ac-91afd926381b)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-9560a6c8-9dd9-59e3-70d7-05b9cb6bc495)

My Python CUDA script doesn’t, though:

❯ python cuda-devics.py
CUDA is available.
Number of CUDA devices: 1
Device 0 name: NVIDIA GeForce RTX 3090

It only sees the stronger one.

But as far as NVIDIA support goes, I’m really happy!

Thanks a lot, @generix!

generix · January 25, 2024, 3:01pm

Is CUDA_VISIBLE_DEVICES set to exclude one GPU?

andres27 · January 25, 2024, 3:05pm

Oops!

I had in fact set export CUDA_VISIBLE_DEVICES=0 in my ~/.profile. I’ve removed it, and all is golden :)
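A stale export like this is easy to forget. As a rough sketch, the following Python snippet scans a few shell startup files for lines that mention CUDA_VISIBLE_DEVICES. The list of candidate files and the function name find_exports are assumptions for illustration, so adjust them to your own shell setup.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Candidate files are a guess; extend the list for your own shell setup.
CANDIDATES = [".profile", ".bash_profile", ".bashrc", ".zshrc", ".zprofile", ".zshenv"]


def find_exports(
    paths: Iterable[Path], name: str = "CUDA_VISIBLE_DEVICES"
) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for each non-comment line mentioning name."""
    hits = []
    for path in paths:
        if not path.is_file():
            continue  # Skip files that do not exist.
        lines = path.read_text(errors="replace").splitlines()
        for lineno, line in enumerate(lines, start=1):
            stripped = line.strip()
            if name in stripped and not stripped.startswith("#"):
                hits.append((path, lineno, stripped))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_exports(Path.home() / f for f in CANDIDATES):
        print(f"{path}:{lineno}: {line}")
```

This only catches files in the home directory; system-wide files such as those under /etc are not covered by the sketch.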

Fixed:

❯ echo $CUDA_VISIBLE_DEVICES
0,1,2

❯ python cuda-devics.py
CUDA is available.
Number of CUDA devices: 2
Device 0 name: NVIDIA GeForce RTX 3090
Device 1 name: NVIDIA RTX A500 Laptop GPU
  id  load    free memory    used memory    total memory    temperature
----  ------  -------------  -------------  --------------  -------------
   0  0.0%    371.0MB        3325.0MB       4096.0MB        56.0C
   1  0.0%    4583.0MB       19464.0MB      24576.0MB       40.0C
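
Note that the two listings above number the GPUs differently: PyTorch reports the RTX 3090 as device 0, while in the table id 0 is the 4096 MB card (the A500). By default CUDA orders devices fastest first, whereas nvidia-smi orders them by PCI bus ID. One hedged way to make the CUDA numbering follow the PCI order is to set CUDA_DEVICE_ORDER=PCI_BUS_ID before CUDA is initialized. The helper name use_pci_bus_order is invented here, and the torch import is kept optional so the snippet also runs where PyTorch is not installed.

```python
import os


def use_pci_bus_order() -> None:
    """Ask CUDA to enumerate GPUs in PCI bus order, like nvidia-smi does.

    Must run before the first CUDA call in the process; the simplest way
    is to call it before importing torch.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"


use_pci_bus_order()

try:
    import torch  # Optional: only needed for the listing below.
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"Device {i} name: {torch.cuda.get_device_name(i)}")
else:
    print("PyTorch with CUDA is not available; only the environment was set.")
```

The same effect can be had by exporting the variable in the shell before starting Python, which avoids depending on import order.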
