System:
Alienware 15R3 laptop with an internal GTX1070 and another GTX1070 in the Alienware Graphics Amplifier
Operating System: Unbuntu 16.04.3
uname -a
Linux joerg-Alienware-15-R3 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
Both cards are recognized on the bus:
lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1be1 (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
Driver 390.48 installed via runfile. The driver gets loaded (I have X11 running well on the internal card)
dmesg | grep NVRM
[ 3.923123] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.48 Thu Mar 22 00:42:57 PDT 2018 (using threaded interrupts)
[ 7.202809] NVRM: nv_acpi_wmmx_method: WMMX data invalid.
Any nouveau drivers are disabled:
lsmod | grep nouveau
nvidia-smi shows both cards:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| N/A 47C P0 34W / N/A | 303MiB / 8114MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1070 Off | 00000000:02:00.0 Off | N/A |
| 0% 27C P8 4W / 151W | 2MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1266 G /usr/lib/xorg/Xorg 141MiB |
| 0 2534 G compiz 130MiB |
| 0 3227 G ...-token=F2DC3ED385E29B00ED37BCC950C830B5 29MiB |
| 0 3345 G /usr/bin/nvidia-settings 0MiB |
+-----------------------------------------------------------------------------
nvidia-settings also shows both cards and provides all relevant information on those cards (power mode, frequencies, temps, etc.; can provide screenshot if needed)
CUDA 9.0.176 installed via runfile.
When running the deviceQuery sample, it fails at initialization:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL
And the error code (3) seems to indicate an initialization error (“The API call failed because the CUDA driver and runtime could not be initialized.”)
Now for the weird stuff: When I set only one GPU to be visible, CUDA will initialize the other and work properly:
export CUDA_VISIBLE_DEVICES=0
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1070"
CUDA Driver Version / Runtime Version 9.1 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8114 MBytes (8508604416 bytes)
(16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1695 MHz (1.70 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
and the other one:
export CUDA_VISIBLE_DEVICES=1
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1070"
CUDA Driver Version / Runtime Version 9.1 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8120 MBytes (8513978368 bytes)
(15) Multiprocessors, (128) CUDA Cores/MP: 1920 CUDA Cores
GPU Max Clock rate: 1785 MHz (1.78 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
I can run Tensorflow just fine by masking one or the other:
2018-04-17 22:34:03.281079: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-17 22:34:03.281876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:02:00.0
totalMemory: 7.93GiB freeMemory: 7.83GiB
2018-04-17 22:34:03.282280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:02:00.0, compute capability: 6.1)
Setting both to be visible is a no go:
export CUDA_VISIBLE_DEVICES="0, 1"
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 3
-> initialization error
I’d really like to use both :-)
Any thoughts on this? I’m pretty certain I followed all the steps in the cuda install guide (and each card works individually!). Happy to try any suggestions or provide further info!
Many thanks in advance,
Joerg
p.s. Things I’ve tried:
- Rebooted (too often to count)
- Uninstalled/Reinstalled NVIDIA drivers and CUDA from scratch (also multiple times; what I’ve reported above is from a clean install of all NVIDIA software components)
Things I can’t (quickly) try:
- I don’t have another operating system available on this machine.
- I don’t have another NVIDIA card available to see if this might be hardware issue with the specific card in the Alienware Graphics Amplifier.