Uhh yes, why not name the Titans after the initial of the architecture?
Titan K, Titan M, Titan P, Titan V…
That would also bring the naming closer to the Quadro cards (K6000, M6000) and avoid the messy confusion we're going to get now.
More important question though: How many FP16 units per SM are in GP102?
15.3 B transistors on the P100 (due to FP16v2? The HBM and NVLink interfaces also have to be taken into account here)
11.0 B transistors on the GP102 (about 52% more than GP104; probably just more cache, the memory interface and additional SMs)
7.2 B transistors on the GP104
4.4 B transistors on the GP106
Transistor count suggests the GP102 is approximately as big as GP104 + GP106 (the math roughly adds up: 7.2 B + 4.4 B ≈ 11.6 B vs. 11.0 B).
I guess this would suggest also one FP16 unit per SM on the Titan X?
Was nVidia not able to put 24 GB of GDDR5X on that card for that price tag?
I think GP102 has the same 6.1 architecture and 3840 ALUs, but yields are still low, so they disabled two SMs. As the process matures, I think this card will be renamed to a Ti and the real Titan will arrive, with all 3840 ALUs, 24 GB and a $1000 price.
@SvenMeyer
Probably the same; it looks like GP102 is sm_61.
Here, nVidia only talks about FP32 and INT8 performance, suggesting this is just an upscaled GP104:
Since they mention deep learning, I'm sure they would have mentioned fast FP16 if it existed.
For comparison, the Titan X has 44 TOPS INT8 and the 1080 has 33 TOPS INT8.
Hi,
I was looking at the posts on FP16 on the GTX 1080 and Titan.
I am wondering what the cost is of casting the FP16 data to FP32 in order to use the fast FP32 compute capability.
Our idea is to use FP16 to reduce memory requirements, but to use FP32 for computation.
Our system is limited mainly by memory transfers.
Is it a reasonable approach?
It's absolutely supported and does indeed relieve a memory-throughput bottleneck. A new header file in CUDA 7.5, cuda_fp16.h, has routines for packing and unpacking FP16 values from a word. There's also a packed-FP16 SGEMM in cuBLAS 7.5+ called SgemmEx, which uses FP32 math on the packed FP16 format. That's different from HGEMM, which does packed fp16x2 math natively on P100 and X1.
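To make the approach concrete, here is a minimal sketch (my own illustrative example, not from NVIDIA's docs) of storing data as FP16 while doing the arithmetic in FP32 with the cuda_fp16.h conversion intrinsics; the kernel name and arrays are made up for the example:

```cuda
#include <cuda_fp16.h>

// Illustrative kernel: x and y are stored as FP16 to halve memory traffic,
// but a*x + y is computed in FP32.
__global__ void saxpy_fp16_storage(int n, float a, const __half *x, __half *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Unpack FP16 -> FP32, do the math in FP32, pack back to FP16 for storage.
        float xf = __half2float(x[i]);
        float yf = __half2float(y[i]);
        y[i] = __float2half(a * xf + yf);
    }
}
```

The conversions compile to single cvt instructions, so their cost is small compared with the memory traffic you save by moving half as many bytes per element.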
Note that there is a performance bug in the fp16 → fp32 conversion in the CUDA 8 RC on consumer Pascal hardware. The bug has been fixed, but the fix is waiting on the next 8.0 release.
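For the GEMM path, here is a hedged sketch of the SgemmEx call mentioned above. The enum names follow the current cudaDataType convention (CUDA_R_16F); older cuBLAS releases spelled them differently, so treat the exact identifiers and the wrapper function as illustrative assumptions:

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// A (m x k), B (k x n), C (m x n): column-major FP16 matrices in device memory.
// SgemmEx keeps the data in FP16 but does the math in FP32 internally.
void gemm_fp16_storage(cublasHandle_t handle, int m, int n, int k,
                       const __half *d_A, const __half *d_B, __half *d_C)
{
    const float alpha = 1.0f, beta = 0.0f;  // scaling factors are FP32 for SgemmEx
    cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                  m, n, k,
                  &alpha,
                  d_A, CUDA_R_16F, m,
                  d_B, CUDA_R_16F, k,
                  &beta,
                  d_C, CUDA_R_16F, m);
}
```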
So I bought a GTX 1080 and I'm disappointed about FP16; had I known that beforehand, I would have bought a Tesla X (Pascal version) instead.
Is there a chance that Nvidia will “fix” the FP16 issue for the GTX 1080 so that we get double the performance over FP32, or should I just exchange this card for a Tesla X while I can?
My apologies if I missed that information in earlier posts.
Consumer cards and most professional cards drop 90% of the FP64/FP16 hardware to boost FP32 performance. You can do nothing at the software level to fix that. The only way is to buy P100-based cards, and AFAIR the only way to do that right now is to buy the $100K monster.
Hello guys,
I seek your help; if I may ask, I have a question.
How did you get these beasts running under your control? I tried to tame them with the driver, but I can't get them to run, only on X.Org X.
Would you help me get my GTX 1070 Windforce running on any kind of Linux?