Uhh yes, why not name the Titans after the initial of the architecture?
Titan K, Titan M, Titan P, Titan V…
That would also bring the naming closer to the Quadro cards (K6000, M6000) and avoid the messy confusion we're going to get now.
More important question though: How many FP16 units per SM are in GP102?
15.3 B transistors on the P100 (due to FP16v2? The HBM and NVLink interfaces also have to be taken into account here)
11.0 B transistors on the GP102 (about 52% more than GP104; probably just more cache, the memory interface and additional SMs)
7.2 B transistors on the GP104
4.4 B transistors on the GP106
Transistor count suggests the GP102 is approximately as big as GP104 + GP106 (the math roughly adds up: 7.2 B + 4.4 B ≈ 11.6 B vs. 11.0 B).
I guess this would suggest also one FP16 unit per SM on the Titan X?
Was nVidia not able to put 24 GB of GDDR5X on that card for that price tag?
I think GP102 has the same 6.1 architecture and 3840 ALUs, but yields are still low, so they disabled two SMs. As the process matures, I think this card will be renamed to a Ti and the real Titan will arrive, with all 3840 ALUs, 24 GB and a $1000 price.
@SvenMeyer
Probably the same; it looks like GP102 is sm_61.
Here, nVidia only talks about FP32 and INT8 performance, suggesting this is just an upscaled GP104:
Since they mention deep learning, I'm sure they would have mentioned fast FP16 if it existed.
For comparison, the Titan X has 44 TOPS INT8 and the 1080 has 33 TOPS INT8.
Hi,
I was looking at the posts on FP16 on the GTX 1080 and Titan.
I am wondering what the cost is of casting the FP16 data to FP32 in order to use the fast FP32 compute capability.
Our idea is to use FP16 to reduce memory requirements, but to use FP32 for computation.
Our system is limited mainly by memory transfers.
Is it a reasonable approach?
It's absolutely supported and does indeed relieve a memory-throughput bottleneck. A new header file in CUDA 7.5, cuda_fp16.h, has routines for packing and unpacking FP16 values from a word. There's also a packed-FP16 SGEMM in cuBLAS 7.5+ called SgemmEx, which uses FP32 math on the packed FP16 format. That's different from HGEMM, which does packed fp16x2 math natively on P100 and X1.
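To make the approach concrete, here is a minimal sketch (my own illustrative example, not from NVIDIA's docs) of storing data as FP16 while doing the arithmetic in FP32 with the cuda_fp16.h conversion intrinsics; the kernel name and arrays are made up for the example:

```cuda
#include <cuda_fp16.h>

// Illustrative kernel: x and y are stored as FP16 to halve memory traffic,
// but a*x + y is computed in FP32.
__global__ void saxpy_fp16_storage(int n, float a, const __half *x, __half *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Unpack FP16 -> FP32, do the math in FP32, pack back to FP16 for storage.
        float xf = __half2float(x[i]);
        float yf = __half2float(y[i]);
        y[i] = __float2half(a * xf + yf);
    }
}
```

The conversions compile to single cvt instructions, so their cost is small compared with the memory traffic you save by moving half as many bytes per element.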
Note that there is a performance bug in the fp16 → fp32 conversion in the CUDA 8 RC on consumer Pascal hardware. The bug has been fixed, but the fix is waiting on the next 8.0 release.
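For the GEMM path, here is a hedged sketch of the SgemmEx call mentioned above. The enum names follow the current cudaDataType convention (CUDA_R_16F); older cuBLAS releases spelled them differently, so treat the exact identifiers and the wrapper function as illustrative assumptions:

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>

// A (m x k), B (k x n), C (m x n): column-major FP16 matrices in device memory.
// SgemmEx keeps the data in FP16 but does the math in FP32 internally.
void gemm_fp16_storage(cublasHandle_t handle, int m, int n, int k,
                       const __half *d_A, const __half *d_B, __half *d_C)
{
    const float alpha = 1.0f, beta = 0.0f;  // scaling factors are FP32 for SgemmEx
    cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                  m, n, k,
                  &alpha,
                  d_A, CUDA_R_16F, m,
                  d_B, CUDA_R_16F, k,
                  &beta,
                  d_C, CUDA_R_16F, m);
}
```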
So I bought a GTX 1080 and I'm disappointed about FP16; had I known that beforehand, I would have bought a Tesla X (Pascal version) instead.
Is there a chance that Nvidia will “fix” the FP16 issue for the GTX 1080 so that we get double the performance over FP32, or should I just exchange this card for a Tesla X while I can?
My apologies if I missed that information in earlier posts.
Consumer cards and most professional cards drop 90% of the FP64/FP16 hardware to boost FP32 performance. You can do nothing at the software level to fix that. The only way is to buy P100-based cards, and AFAIR the only way to do that right now is to buy the $100K monster.
Hello guys,
I seek your help; if I may ask, I have a question.
How did you get these beasts running under your control? I tried to tame them with the driver, but I can't get them to run, only on X.Org X.
Would you help me get my GTX 1070 Windforce running on any kind of Linux?