Does Nvidia Titan x have native FP16 and int8 support?

An AnandTech article that came out in July says the NVIDIA Titan X will have INT8 support. Does anyone know anything about this?

https://blogs.nvidia.com/blog/2016/07/21/titan-x/
Here are its numbers:

  • 11 TFLOPS FP32
  • 44 TOPS INT8 (new deep learning inferencing instruction)
  • 12B transistors
  • 3,584 CUDA cores at 1.53GHz (versus 3,072 cores at 1.08GHz in previous TITAN X)
  • Up to 60% faster performance than previous TITAN X
  • High performance engineering for maximum overclocking
  • 12 GB of GDDR5X memory (480 GB/s)

I saw that announcement. Did anyone get a chance to test the performance of 8-bit MAD on the actual physical card?

I don’t see any built-in data types or math functions for INT8 in the CUDA programming guide that ships with the CUDA Toolkit 8.0 RC. Will they appear in the full release of CUDA Toolkit 8.0? If 8-bit support is present in the NVIDIA Titan X, how do we access it and test its performance?

Since C/C++ have had 8-bit integer data types for a very long time, and CUDA supports short-vector 8-bit integer types such as ‘uchar4’, I am not sure what kind of new data type would be needed.
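To make that concrete, here is a minimal sketch (plain device code, nothing Pascal-specific assumed) of the 8-bit handling CUDA already offers through ‘uchar4’ — the kernel name and shape are just for illustration:

```cuda
// What CUDA has offered all along: short-vector 8-bit types, unpacked
// manually and accumulated in 32-bit. The compiler emits ordinary
// 32-bit multiply-adds here; no new instruction is involved.
__global__ void dot_uchar4(const uchar4 *a, const uchar4 *b, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    uchar4 x = a[i], y = b[i];
    int acc = x.x * y.x + x.y * y.y + x.z * y.z + x.w * y.w;
    atomicAdd(out, acc);  // reduce the per-element dot products
}
```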

If your question is, “Does CUDA 8.0 provide new device function intrinsics to access the new DP2A and DP4A instructions”, I don’t know the answer to that. But I would expect these instructions to be accessible from inline PTX at a minimum. Have you checked the latest PTX specification?
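If the 8.0 headers turn out not to expose an intrinsic, the inline-PTX route would look roughly like this. The mnemonic here assumes the PTX ISA spells the opcode ‘dp4a.s32.s32’ on compute capability 6.1 — verify that against the PTX specification shipped with CUDA 8.0 before relying on it:

```cuda
// Hypothetical wrapper around the 4-way byte dot-product-accumulate
// instruction. The opcode spelling "dp4a.s32.s32" is an assumption;
// check the sm_61 section of the PTX ISA document.
__device__ __forceinline__ int dp4a_s32(int a, int b, int c)
{
    int d;
    asm("dp4a.s32.s32 %0, %1, %2, %3;" : "=r"(d) : "r"(a), "r"(b), "r"(c));
    return d;  // d = c + sum of the four signed-byte products of a and b
}
```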

Int8 support, meaning 4 parallel byte multiply-accumulates, is supported by all Kepler, Maxwell, and Pascal NVIDIA cards (sm 3.0 and later). It’s performed in CUDA PTX by the vmad instruction.

fp16x2, meaning 2 parallel 16-bit IEEE floating-point fused multiply-accumulates, is supported by the P100 and also, surprisingly, by the Tegra X1, the Maxwell-based ARM SoC.

As Norbert says, DP2A and DP4A are new byte and word dot-product-and-accumulate instructions on compute capability 6.1 devices (GP106, GP104, and GP102), but not on the P100.

I was expecting a device intrinsic for 8-bit, more like __hadd and __hfma for half floats. I haven’t worked with vmad before; it’s a scalar 32-bit MAD operation. Of course, we can pack four 8-bit values and do it, but how different is that from the DP4A instruction for 8-bit MAD? Has anyone tested DP4A on a GTX 1080 or the new Titan X and measured the throughput? Is it 4x?

It’s a scalar instruction, so it performs only 1 MAD. The SIMD video instructions are described in http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#simd-video-instructions; they were implemented in hardware only on Kepler, at 1/4 throughput, so the overall rate was still 1 MAD/cycle.
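If the final toolkit does expose an intrinsic (the sm_61 headers reportedly name it __dp4a — treat that as unverified until you check your install), one way to start answering the throughput question is a correctness check against emulated semantics, then timing both paths. A hedged sketch:

```cuda
// Compares the assumed __dp4a intrinsic (compute capability 6.1) against
// a spelled-out emulation of its defined semantics: treat each int as
// four signed bytes, multiply pairwise, add all four products into the
// 32-bit accumulator.
__device__ int emulate_dp4a(int a, int b, int c)
{
    for (int k = 0; k < 4; ++k)
        c += (int)(signed char)(a >> 8 * k) * (int)(signed char)(b >> 8 * k);
    return c;
}

__global__ void check_dp4a(int a, int b, int c, int *out)
{
#if __CUDA_ARCH__ >= 610
    out[0] = __dp4a(a, b, c);        // assumed intrinsic; verify it exists
#else
    out[0] = emulate_dp4a(a, b, c);  // fallback on pre-6.1 parts
#endif
    out[1] = emulate_dp4a(a, b, c);  // reference result for comparison
}
```

If the two outputs agree, timing a long dependent chain of the intrinsic against four unpacked scalar MADs would show directly whether the throughput is really 4x.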