I would have had this up sooner, but I ran into this gem:
https://devtalk.nvidia.com/default/topic/878117/-solved-titan-x-for-cuda-7-5-login-loop-error-ubuntu-14-04-/
Hardware :
- Mobo : Gigabyte Z87X-UD5H (x16 PCI-E 3.0)
- CPU : Intel Core i7-4770 @ 3.40 GHz (max 16 PCI Express lanes)
- RAM : Corsair Vengeance 32GB (4x8GB) DDR3 1600 MHz (PC3 12800)
- GPU : MSI GeForce GTX 1070 GAMING X 8G, 8GB (display and CUDA card)
Software :
- OS : Ubuntu 16.04 (Fresh install)
- Driver : UNIX x86_64 Kernel Module 367.35
- CUDA :
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26
Max observed card clocks during test :
NVIDIA X Server Settings
Real-time graphics clock : 1950 MHz
Memory transfer rate : 8012 MHz
> nvidia-smi
Mon Aug 8 02:17:03 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 0000:01:00.0      On |                  N/A |
|  0%   50C    P8    11W / 230W |    330MiB /  8112MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     10644    G   /usr/lib/xorg/Xorg                             184MiB |
|    0     11072    G   compiz                                         143MiB |
+-----------------------------------------------------------------------------+
> Command : ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1070"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8112 MBytes (8506179584 bytes)
(15) Multiprocessors, (128) CUDA Cores/MP: 1920 CUDA Cores
GPU Max Clock rate: 1772 MHz (1.77 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1070
Result = PASS
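As a rough reference point for the bandwidth tests, the theoretical peak memory bandwidth can be estimated from the deviceQuery figures above. This is only a sketch, not part of the CUDA samples: the function name is made up for illustration, and the factor of 2 is an assumption that the reported 4004 MHz memory clock is doubled to get the effective transfer rate (which would be consistent with the 8012 MHz transfer rate shown in the X server settings).

```python
# Values copied from the deviceQuery output above.
MEMORY_CLOCK_MHZ = 4004   # "Memory Clock rate"
BUS_WIDTH_BITS = 256      # "Memory Bus Width"

# Assumption: two transfers per reported memory clock cycle.
TRANSFERS_PER_CLOCK = 2


def peak_bandwidth_gb_s(clock_mhz, bus_width_bits, transfers_per_clock):
    """Estimate peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer
    return bytes_per_second / 1e9


peak = peak_bandwidth_gb_s(MEMORY_CLOCK_MHZ, BUS_WIDTH_BITS, TRANSFERS_PER_CLOCK)
print(f"Estimated theoretical peak: {peak:.1f} GB/s")
```

Measured device-to-device figures are expected to land below this estimate.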
Question :
Why do the older Titan X / 980 Ti show "Run time limit on kernels: No" while the 1070 here shows "Run time limit on kernels: Yes"? Is this down to the driver, the software, or the hardware?
If I intend to run a kernel for a week on the GPU, will this be a problem on the 1070?
Essentially, what is going on here, and what is the consequence of a run-time limit?
Can it be bypassed via a command or setting?
On to the bandwidth tests…
./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 1070
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 11766.6
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12463.2
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 191674.2
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
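For context on the pinned host/device numbers, here is a rough comparison against the PCIe 3.0 x16 link listed in the hardware section. It is a sketch under stated assumptions: 8 GT/s per lane with 128b/130b encoding, and bandwidthTest's "MB" treated as 10^6 bytes, which may not match how the sample defines its units. Packet and protocol overhead is ignored, so real transfers cannot reach the computed ceiling.

```python
# PCIe 3.0 link parameters (per lane, per direction).
GIGATRANSFERS_PER_S = 8.0        # 8 GT/s per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
LANES = 16                       # x16 slot from the hardware list


def pcie3_peak_gb_s(lanes):
    """Raw per-direction ceiling in GB/s, ignoring protocol overhead."""
    return GIGATRANSFERS_PER_S * ENCODING_EFFICIENCY * lanes / 8


# Values copied from the bandwidthTest output above (MB/s).
# Assumption: 1 MB = 10^6 bytes.
measured_mb_s = {
    "host_to_device": 11766.6,
    "device_to_host": 12463.2,
}

peak_gb_s = pcie3_peak_gb_s(LANES)
efficiency = {
    name: (mb_s / 1000) / peak_gb_s for name, mb_s in measured_mb_s.items()
}

print(f"PCIe 3.0 x{LANES} ceiling: {peak_gb_s:.2f} GB/s")
for name, fraction in efficiency.items():
    print(f"{name}: {fraction:.1%} of the ceiling")
```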
./bandwidthTest --dtod --mode=range --start=1073741824 --end=1073741824 --increment=1
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 1070
Range Mode
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
1073741824 190531.1
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
./probe_bw 0 256 5120 256 32
GeForce GTX 1070 : 15 SM : 8112 MB
Probing from: 256 - 5120 MB ...
alloc MB, probe MB, msecs, GB/s
256, 14336, 70.79, 197.76
512, 14336, 74.86, 187.01
768, 14336, 74.61, 187.64
1024, 14336, 73.59, 190.25
1280, 14336, 73.99, 189.21
1536, 14336, 74.22, 188.62
1792, 14336, 73.70, 189.96
2048, 14336, 73.97, 189.27
2304, 14336, 73.77, 189.77
2560, 14336, 72.71, 192.55
2816, 14336, 72.73, 192.49
3072, 14336, 73.06, 191.63
3328, 14336, 73.66, 190.05
3584, 14336, 73.24, 191.16
3840, 14336, 73.42, 190.68
4096, 14336, 73.03, 191.71
4352, 14336, 73.03, 191.71
4608, 14336, 74.43, 188.11
4864, 14336, 77.84, 179.87
5120, 14336, 79.90, 175.22
./probe_bw 0 256 5120 256 128
GeForce GTX 1070 : 15 SM : 8112 MB
Probing from: 256 - 5120 MB ...
alloc MB, probe MB, msecs, GB/s
256, 57344, 280.18, 199.87
512, 57344, 286.17, 195.69
768, 57344, 286.13, 195.71
1024, 57344, 287.81, 194.57
1280, 57344, 286.85, 195.22
1536, 57344, 287.30, 194.92
1792, 57344, 287.07, 195.07
2048, 57344, 287.30, 194.92
2304, 57344, 286.65, 195.36
2560, 57344, 287.80, 194.58
2816, 57344, 286.75, 195.29
3072, 57344, 287.25, 194.95
3328, 57344, 286.86, 195.22
3584, 57344, 287.44, 194.82
3840, 57344, 286.53, 195.45
4096, 57344, 286.05, 195.77
4352, 57344, 285.10, 196.42
4608, 57344, 294.86, 189.92
4864, 57344, 305.74, 183.16
5120, 57344, 317.00, 176.66
./probe_bw 0 8 256 8 32
GeForce GTX 1070 : 15 SM : 8112 MB
Probing from: 8 - 256 MB ...
alloc MB, probe MB, msecs, GB/s
8, 14336, 45.40, 308.36
16, 14336, 57.90, 241.78
24, 14336, 62.71, 223.25
32, 14336, 65.94, 212.32
40, 14336, 65.90, 212.44
48, 14336, 67.15, 208.49
56, 14336, 68.38, 204.75
64, 14336, 68.45, 204.54
72, 14336, 69.17, 202.40
80, 14336, 69.92, 200.21
88, 14336, 69.22, 202.26
96, 14336, 69.62, 201.10
104, 14336, 70.44, 198.75
112, 14336, 70.73, 197.93
120, 14336, 69.85, 200.43
128, 14336, 70.50, 198.57
136, 14336, 71.57, 195.62
144, 14336, 72.13, 194.08
152, 14336, 70.73, 197.94
160, 14336, 71.00, 197.18
168, 14336, 71.70, 195.26
176, 14336, 71.02, 197.13
184, 14336, 71.29, 196.37
192, 14336, 71.90, 194.71
200, 14336, 72.11, 194.14
208, 14336, 70.62, 198.25
216, 14336, 71.89, 194.75
224, 14336, 72.18, 193.97
232, 14336, 71.93, 194.64
240, 14336, 71.39, 196.12
248, 14336, 71.66, 195.37
256, 14336, 72.67, 192.65
./probe_bw 0 8 256 8 128
GeForce GTX 1070 : 15 SM : 8112 MB
Probing from: 8 - 256 MB ...
alloc MB, probe MB, msecs, GB/s
8, 57344, 178.35, 313.99
16, 57344, 227.36, 246.31
24, 57344, 245.75, 227.87
32, 57344, 253.75, 220.69
40, 57344, 260.07, 215.33
48, 57344, 262.42, 213.40
56, 57344, 267.58, 209.28
64, 57344, 268.05, 208.92
72, 57344, 269.32, 207.93
80, 57344, 272.32, 205.64
88, 57344, 272.80, 205.28
96, 57344, 274.32, 204.14
104, 57344, 274.19, 204.24
112, 57344, 276.57, 202.48
120, 57344, 275.87, 203.00
128, 57344, 277.53, 201.78
136, 57344, 277.51, 201.80
144, 57344, 279.49, 200.37
152, 57344, 277.43, 201.86
160, 57344, 278.94, 200.76
168, 57344, 280.26, 199.81
176, 57344, 280.23, 199.83
184, 57344, 279.81, 200.13
192, 57344, 279.75, 200.18
200, 57344, 280.22, 199.84
208, 57344, 280.03, 199.98
216, 57344, 281.29, 199.08
224, 57344, 282.30, 198.37
232, 57344, 282.02, 198.57
240, 57344, 281.46, 198.97
248, 57344, 281.34, 199.05
256, 57344, 281.95, 198.62
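The GB/s column in the probe_bw tables above appears to be the probed size divided by the elapsed time, with binary units (1 GB = 1024 MB). That reading is an inference from the printed numbers, not from the probe_bw source. The sketch below recomputes a few rows copied from the tables; the helper name and the tolerance are illustrative choices, and small differences are expected because the msecs column is rounded to two decimals.

```python
# (probe MB, msecs, reported GB/s) rows copied from the tables above.
rows = [
    (14336, 70.79, 197.76),
    (14336, 79.90, 175.22),
    (57344, 280.18, 199.87),
    (14336, 45.40, 308.36),
    (57344, 178.35, 313.99),
]


def recomputed_gb_s(probe_mb, msecs):
    """Probed gigabytes (1 GB = 1024 MB) per second of elapsed time."""
    return (probe_mb / 1024) / (msecs / 1000)


# Largest gap between the recomputed and the reported throughput.
max_diff = max(abs(recomputed_gb_s(mb, ms) - gbs) for mb, ms, gbs in rows)

for mb, ms, gbs in rows:
    print(f"{mb:>6} MB in {ms:>7.2f} ms: "
          f"recomputed {recomputed_gb_s(mb, ms):.2f}, reported {gbs:.2f}")
```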
For kicks, I changed the PowerMizer setting to maximum performance at the end and re-ran the tests.
The only notable change was in the bandwidthTest host-to-device value (11766.6 MB/s to 12399.4 MB/s):
./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 1070
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12399.4
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12463.5
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 190703.2
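To put the two bandwidthTest runs side by side, the relative change in each direction can be computed from the printed values. A small sketch; the dictionary keys and function name are just labels chosen here.

```python
# bandwidthTest results in MB/s, copied from the two runs in this post.
default_run = {
    "host_to_device": 11766.6,
    "device_to_host": 12463.2,
    "device_to_device": 191674.2,
}
max_performance_run = {
    "host_to_device": 12399.4,
    "device_to_host": 12463.5,
    "device_to_device": 190703.2,
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


change = {
    name: percent_change(default_run[name], max_performance_run[name])
    for name in default_run
}

for name, pct in change.items():
    print(f"{name}: {pct:+.2f}%")
```

Only the host-to-device figure moves by more than about one percent, which matches the observation above.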

This puppy is ready for dev ^_^