GTX 1050 Ti does not have Compute Preemption?

I’ve just been testing out our new GTX 1050 Ti, and I’ve found that, according to cudaDeviceGetAttribute(), this device does not support Compute Preemption. I was under the impression that all Pascal GPUs had this feature. Is it not enabled on the 1050 Ti, or is this a bug in the CUDA runtime?

Is this on Windows or Linux?

Windows 10 64-bit.

Hmm, too bad that deviceQuery does not show this property. If it did, it would be easy to check support from the various deviceQuery results posted online, like the following for a GTX 1080:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “GeForce GTX 1080”
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8112 MBytes (8506179584 bytes)
(20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1835 MHz (1.84 GHz)
Memory Clock rate: 5005 MHz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

I have a Pascal Titan X running on Ubuntu 14.04 Linux with CUDA 8, and it supports this property:

$ cat t36.cpp
// Query whether a device supports compute preemption via
// cudaDevAttrComputePreemptionSupported (new in CUDA 8).
#include <stdio.h>
#include <cuda_runtime_api.h>
#include <assert.h>
#include <stdlib.h>

int main(int argc, char *argv[]){

  int support, device = 0;
  if (argc > 1) device = atoi(argv[1]);  // optional: device index as first argument
  cudaDeviceProp my_prop;
  cudaError_t err = cudaDeviceGetAttribute(&support, cudaDevAttrComputePreemptionSupported, device);
  assert(err == cudaSuccess);
  err = cudaGetDeviceProperties(&my_prop, device);  // fetch the device name for the report
  assert(err == cudaSuccess);
  if (support) printf("%s device %d supports compute preemption\n", my_prop.name, device);
  else printf("%s device %d does not support compute preemption\n", my_prop.name, device);
  return 0;
}
$ g++ -I/usr/local/cuda/include  t36.cpp -o t36 -L/usr/local/cuda/lib64 -lcudart
$ ./t36
TITAN X (Pascal) device 0 supports compute preemption
$

That’s great information, thanks! Are you running your Titan X in TCC mode, or under the regular display driver?

I thought the WDDM/TCC mode distinction was a Windows thing only?

Ah, you might be right. I’ve only toyed around with CUDA on Linux (most of the time I’ve worked on Windows), so I’m less familiar with the details.
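For what it’s worth, you can check which driver model a device is using from the tccDriver field of cudaDeviceProp, in the same style as the t36.cpp example above. This is just a sketch: on Linux (where the WDDM/TCC distinction doesn’t exist) and on any Windows device running the regular WDDM display driver, the field should simply report 0.

```cuda
// Sketch: report the driver model (TCC vs. WDDM) for each CUDA device
// via cudaDeviceProp::tccDriver. On Linux this field is always 0.
#include <stdio.h>
#include <cuda_runtime_api.h>

int main(void){
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0){
    printf("no CUDA devices found\n");
    return 1;
  }
  for (int d = 0; d < count; d++){
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
    printf("device %d (%s): %s\n", d, prop.name,
           prop.tccDriver ? "TCC driver" : "WDDM driver (or non-Windows)");
  }
  return 0;
}
```

Build the same way as t36.cpp: g++ -I/usr/local/cuda/include file.cpp -L/usr/local/cuda/lib64 -lcudart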

Aha, just found this in the Pascal Tuning Guide (which seems to have appeared in the past few weeks):

“Compute Preemption is a new feature specific to GP100.”
https://docs.nvidia.com/cuda/pascal-tuning-guide/index.html#preemption

That explains why it isn’t available on the 1050 Ti.

The GTX 1080 whitepaper says that Pascal has preemption of both the graphics and compute pipelines at the instruction level. However, the CUDA 8 toolchain has not exposed this yet as far as I know, even for GP100. I had not noticed that line in the tuning guide, which contradicts the Pascal whitepaper.

Additionally, Pascal’s driver-level automatic support for silent preemption of compute tasks (which would eliminate the longstanding kernel time-limit watchdog killer) has not been implemented (yet?) either.
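In the meantime, you can at least query whether the watchdog applies to a given device via cudaDevAttrKernelExecTimeout, again in the style of t36.cpp. A sketch (a nonzero result means kernels on that device are subject to the run-time limit, as in the deviceQuery output above):

```cuda
// Sketch: check whether the kernel run-time watchdog is active on a device,
// via cudaDevAttrKernelExecTimeout (1 = timeout enabled, 0 = no limit).
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime_api.h>

int main(int argc, char *argv[]){
  int device = 0, timeout = 0;
  if (argc > 1) device = atoi(argv[1]);  // optional: device index as first argument
  cudaError_t err = cudaDeviceGetAttribute(&timeout, cudaDevAttrKernelExecTimeout, device);
  if (err != cudaSuccess){
    printf("query failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  printf("device %d: watchdog %s\n", device,
         timeout ? "enabled (kernels have a run time limit)" : "disabled");
  return 0;
}
```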