Hello!
My configuration is as follows: a ThinkPad P51 running Ubuntu 18.04, with hybrid graphics (Quadro M2200) driving two external monitors, plus an external GPU (GTX 980 Ti) in an AKiTiO Thunder 2 Box. I set graphics to discrete-only in the BIOS because I’ve heard that hybrid mode combined with an eGPU causes a lot of problems on Linux. I’m not committed to this setting: if hybrid mode and the eGPU can be made to work together, great.
At first I wanted the eGPU to drive the two external monitors and the Quadro to drive the laptop’s built-in display, while also using the eGPU for CUDA applications. Unfortunately I couldn’t get this working on Linux (none of the displays showed anything after boot), so I connected all monitors directly to the laptop and left the eGPU attached over Thunderbolt.
I installed the 390.48 driver and CUDA 9.1 (the Ubuntu 17.10 packages, and only cuda-toolkit-9-1, cuda-libraries-dev-9-1 and cuda-libraries-9-1, so the driver that CUDA is bundled with was not installed), plus cuDNN 7.1.
The CUDA samples run OK by default, most probably because they pick the Quadro GPU. However, when I try to use the eGPU with a command like this:
CUDA_VISIBLE_DEVICES=0 ./volumeFiltering
I get the following error:
CUDA error at volume.cpp:24 code=46(cudaErrorDevicesUnavailable) "cudaMalloc3DArray(&vol->content, &vol->channelDesc, dataSize, allowStore ? cudaArraySurfaceLoadStore : 0)"
What is the problem and how can I enable the eGPU?
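Since nvidia-smi and the CUDA runtime may number the devices differently, one check I can think of is to force PCI bus ordering with CUDA_DEVICE_ORDER=PCI_BUS_ID and run the sample once per index, to confirm which physical GPU fails. A minimal sketch in Python; the helper names env_for_device and run_on_device are hypothetical, and I am assuming that PCI bus order matches the order nvidia-smi prints:

```python
import os
import subprocess

def env_for_device(index, base_env=None):
    """Return a copy of the environment that exposes only one GPU.

    CUDA_DEVICE_ORDER=PCI_BUS_ID asks the CUDA runtime to enumerate
    devices in PCI bus order instead of its default fastest-first order.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env

def run_on_device(command, index):
    """Run `command` (a list of arguments) on one GPU and return its exit code."""
    return subprocess.run(command, env=env_for_device(index)).returncode

# Example (assumes the current directory contains the sample binary):
#   for index in (0, 1):
#       print(index, run_on_device(["./volumeFiltering"], index))
```

If the error follows the GTX 980 Ti under both orderings, the index is probably not the issue and the problem is with the eGPU itself.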
Further questions:
- Is it possible to use the eGPU both for CUDA applications and for the two external monitors?
- Also, is it possible to enable the hybrid setting in the BIOS and still have a working environment?
nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M2200        Off  | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P0    N/A /  N/A |   1628MiB /  4035MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 980 Ti  Off  | 00000000:0A:00.0 Off |                  N/A |
|  0%   58C    P8    21W / 275W |      1MiB /  6083MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
deviceQuery:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "GeForce GTX 980 Ti"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 6084 MBytes (6379470848 bytes)
(22) Multiprocessors, (128) CUDA Cores/MP: 2816 CUDA Cores
GPU Max Clock rate: 1291 MHz (1.29 GHz)
Memory Clock rate: 3505 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 3145728 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 10 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "Quadro M2200"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 4035 MBytes (4231331840 bytes)
( 8) Multiprocessors, (128) CUDA Cores/MP: 1024 CUDA Cores
GPU Max Clock rate: 1036 MHz (1.04 GHz)
Memory Clock rate: 2754 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 980 Ti (GPU0) -> Quadro M2200 (GPU1) : No
> Peer access from Quadro M2200 (GPU1) -> GeForce GTX 980 Ti (GPU0) : No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 2
Result = PASS
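Looking at the two listings side by side, the indices do not line up: nvidia-smi shows the Quadro as GPU 0 (bus 01) and the GTX 980 Ti as GPU 1 (bus 0A), while deviceQuery reports the GTX 980 Ti as Device 0 (bus ID 10, which is 0x0A) and the Quadro as Device 1. If I read this correctly, CUDA_VISIBLE_DEVICES=0 in the command above should indeed select the eGPU. A small Python sketch of the cross-check, with the values copied by hand from the output above (the variable and function names are only illustrative):

```python
# nvidia-smi: index -> (name, PCI bus in hex, taken from the Bus-Id column)
NVIDIA_SMI = {
    0: ("Quadro M2200", "01"),
    1: ("GeForce GTX 980 Ti", "0A"),
}

# deviceQuery: CUDA runtime index -> (name, PCI bus ID as printed, in decimal)
DEVICE_QUERY = {
    0: ("GeForce GTX 980 Ti", 10),
    1: ("Quadro M2200", 1),
}

def smi_to_cuda_index(smi, cuda):
    """Map each nvidia-smi index to the CUDA runtime index on the same PCI bus."""
    bus_to_cuda = {bus: idx for idx, (_, bus) in cuda.items()}
    return {idx: bus_to_cuda[int(bus_hex, 16)] for idx, (_, bus_hex) in smi.items()}

mapping = smi_to_cuda_index(NVIDIA_SMI, DEVICE_QUERY)
for smi_idx, cuda_idx in sorted(mapping.items()):
    name = NVIDIA_SMI[smi_idx][0]
    print(f"nvidia-smi GPU {smi_idx} ({name}) -> CUDA device {cuda_idx}")
```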