Ubuntu 18.04 with hybrid GPU (Quadro M2200) and eGPU (GTX 980ti)

Hello!

My configuration is as follows: a ThinkPad P51 running Ubuntu 18.04 with hybrid graphics (Quadro M2200) connected to two external monitors, plus an external GPU (GTX 980 Ti) in an AKiTiO Thunder 2 box. I disabled hybrid graphics in the BIOS and set it to discrete-only, because I’ve heard that hybrid graphics on Linux with an eGPU causes a lot of problems. That doesn’t mean I want to stick with this option; if we manage to get the hybrid setting and the eGPU working together, great.

At first I wanted the eGPU to drive the two external monitors and the Quadro to drive the laptop’s built-in display, and also to use the eGPU for additional CUDA applications, but unfortunately this was too hard to accomplish on Linux (none of the displays would show anything after boot), so I connected all the monitors directly to the laptop and left the eGPU connected over Thunderbolt.
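(For reference, my understanding is that such a split would need an xorg.conf roughly like the sketch below, with each card addressed by its PCI BusID plus matching Screen and ServerLayout sections; I only include it to illustrate the layout I was after, since no variant of it ever gave me a picture, and the exact contents may well be wrong.)

# Quadro M2200, drives the built-in panel
Section "Device"
    Identifier "Quadro"
    Driver     "nvidia"
    BusID      "PCI:1:0:0"
EndSection

# GTX 980 Ti in the AKiTiO box, intended to drive the two external monitors
Section "Device"
    Identifier "eGPU"
    Driver     "nvidia"
    BusID      "PCI:10:0:0"
EndSection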

I installed the 390.48 driver, CUDA 9.1 (the Ubuntu 17.10 release, and only the cuda-toolkit-9-1, cuda-libraries-dev-9-1 and cuda-libraries-9-1 packages, so not the driver that CUDA normally bundles) and cuDNN 7.1.
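For reference, the toolkit install was along these lines (assuming the NVIDIA CUDA apt repository for 17.10 is already set up):

sudo apt-get install cuda-toolkit-9-1 cuda-libraries-9-1 cuda-libraries-dev-9-1

rather than the full cuda meta-package, which would have pulled in its own driver on top of 390.48.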

The CUDA samples run OK by default, most probably because they pick the Quadro GPU. However, when I try to run them on the eGPU with a command like this:

CUDA_VISIBLE_DEVICES=0 ./volumeFiltering

I get the following error:

CUDA error at volume.cpp:24 code=46(cudaErrorDevicesUnavailable) "cudaMalloc3DArray(&vol->content, &vol->channelDesc, dataSize, allowStore ? cudaArraySurfaceLoadStore : 0)"
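
(In case it helps with the diagnosis: a small snippet like the one below, my own check rather than one of the samples, can confirm which physical GPU a given CUDA_VISIBLE_DEVICES value actually selects. It just prints the name and PCI bus of the device the runtime picked, which can be matched against the Bus-Id column in the nvidia-smi output further down.)

// check_device.cu -- prints which GPU the CUDA runtime selected.
// Build with: nvcc check_device.cu -o check_device
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int dev = 0;
    cudaError_t err = cudaGetDevice(&dev);
    if (err != cudaSuccess) {
        printf("cudaGetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // On this machine the Quadro M2200 is on PCI bus 0x01 and the GTX 980 Ti on 0x0a.
    printf("device %d: %s (PCI bus 0x%02x)\n", dev, prop.name, prop.pciBusID);
    return 0;
}

Running it as CUDA_VISIBLE_DEVICES=0 ./check_device shows which card that index maps to.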

What is the problem, and how can I get CUDA applications running on the eGPU?

Further questions:

  • Is it possible to use the eGPU for CUDA applications and for the two external monitors?
  • Also, is it possible to use the hybrid setting from BIOS and have a working environment?

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M2200        Off  | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P0    N/A /  N/A |   1628MiB /  4035MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 980 Ti  Off  | 00000000:0A:00.0 Off |                  N/A |
|  0%   58C    P8    21W / 275W |      1MiB /  6083MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

deviceQuery:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX 980 Ti"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 6084 MBytes (6379470848 bytes)
(22) Multiprocessors, (128) CUDA Cores/MP: 2816 CUDA Cores
GPU Max Clock rate: 1291 MHz (1.29 GHz)
Memory Clock rate: 3505 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 3145728 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 10 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Quadro M2200"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 4035 MBytes (4231331840 bytes)
( 8) Multiprocessors, (128) CUDA Cores/MP: 1024 CUDA Cores
GPU Max Clock rate: 1036 MHz (1.04 GHz)
Memory Clock rate: 2754 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 980 Ti (GPU0) -> Quadro M2200 (GPU1) : No
> Peer access from Quadro M2200 (GPU1) -> GeForce GTX 980 Ti (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 2
Result = PASS

This particular problem arises because the OpenGL context is on one GPU (your laptop’s Quadro) and the CUDA context is on another, while the application uses CUDA/OpenGL interop and expects both contexts to be on the same GPU.
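
Roughly speaking, the graphics samples follow the interop pattern sketched below (simplified, not the actual volumeFiltering source): GL objects are created in the X server’s OpenGL context, which lives on the display GPU (your Quadro), and are then registered and mapped with the CUDA runtime. If CUDA_VISIBLE_DEVICES pins the CUDA context to the 980 Ti instead, the runtime cannot share those Quadro-side GL objects, and you get failures like the one above.

// Simplified sketch of the CUDA/OpenGL interop pattern used by the graphics
// samples (illustrative only, not the actual volumeFiltering code).
#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <helper_cuda.h>   // checkCudaErrors(), from the samples' common/inc

// Assumes an OpenGL context is already current; X creates it on the
// display GPU, i.e. the Quadro in this setup.
void *mapSharedBuffer(size_t size, GLuint *pboOut, cudaGraphicsResource_t *resOut)
{
    GLuint pbo = 0;
    glGenBuffers(1, &pbo);                        // GL buffer lives on the display GPU
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, size, nullptr, GL_DYNAMIC_DRAW);

    // When the CUDA context sits on a different GPU (the 980 Ti), the runtime
    // has no way to share this GL buffer, and interop registration or the
    // first CUDA allocations in that context fail.
    cudaGraphicsResource_t res = nullptr;
    checkCudaErrors(cudaGraphicsGLRegisterBuffer(&res, pbo,
                                                 cudaGraphicsRegisterFlagsNone));
    checkCudaErrors(cudaGraphicsMapResources(1, &res));

    void *dptr = nullptr;
    size_t numBytes = 0;
    checkCudaErrors(cudaGraphicsResourceGetMappedPointer(&dptr, &numBytes, res));

    *pboOut = pbo;
    *resOut = res;
    return dptr;   // device pointer that CUDA kernels can write into
}

The purely compute samples (vectorAdd, bandwidthTest, and so on) never touch GL, which is why they should run on either GPU.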

If you search around on these forums you’ll find other write-ups of the same issue and some solution methods. My suggestion would simply be to run these kinds of apps on your laptop Quadro and leave it at that; you already seem to know how to do that.

If everything else is working, most of the CUDA samples should run without error, e.g.

CUDA_VISIBLE_DEVICES="0" /path/to/vectorAdd
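
One thing to keep in mind: as far as I know the runtime enumerates devices fastest-first by default, which is why deviceQuery reports the 980 Ti as device 0 while nvidia-smi lists it as GPU 1. If you want the CUDA_VISIBLE_DEVICES numbering to follow the PCI bus order that nvidia-smi shows, set CUDA_DEVICE_ORDER as well:

CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 /path/to/vectorAdd   # 0 = Quadro M2200 (bus 01)
CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 /path/to/vectorAdd   # 1 = GTX 980 Ti (bus 0A)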