Nvidia Pascal TITAN Xp, TITAN X, GeForce GTX 1080 Ti, GTX 1080, GTX 1070, GTX 1060, GTX 1050 & GT 1030









It’s finally here, Pascal GeForce.

The demo was running at 2114 MHz, air cooled!

Amazing and totally ridiculous.

If NVIDIA gimps or disables FP16 on this, brace yourselves for NIRP.

“brace yourselves for NIRP” does seem like something Janet Yellen might say.

NIRP will force me to buy a GTX 1080 instead of watching my savings account lose value.

Well played, Janet Yellen!

Can Nvidia release a white paper or post the output of ./DeviceQuery on the 1080/1070 when they get a chance?
I’m dying to get more details on what is going on inside that card, as I would like to consider it for CUDA development.
It’s a no-go until I understand exactly how it differs architecturally w.r.t. the CUDA execution pipeline.

I’m surprised at how scant the details on the card’s specs have been.
Does it at least have dual copy engines like the previous 980?
If not, how can there be a claim of asynchronous compute?

What about runtime limits on kernel execution… etc etc.
What new features are being added to the card?
Are you guys removing any features that were already present on the 980/980ti/titanX ?

Let’s get some details for the tech-minded portion of your consumer base.

All that info is under NDA until May 17th, reviews will be up then.

As for dual copy engines, even the lowest-end GM206 GPUs have them.

So Pascal should have it, after all it’s Compute Capability 6.x, higher than Maxwell’s 5.2.

It was just a teaser so we don’t forget about Huang’s existence :D

Excellent. Thank you for your reply.
This has calmed me a bit ^_^.

I look forward to getting more juicy details when this NDA is finally lifted.
I’m hoping that Nvidia has people doing more CUDA-centric reviews and detailing beyond the gaming-centric review channels.

Gaming benchmarks released for the GTX 1080 vs the competition;


I have found that for CUDA (32-bit), the compute performance differences between cards generally match the gaming differences, so this appears to be very good news. Can’t wait to get my hands on a couple of these guys.

LOL, they are already talking about a 2.5 GHz version;


The smaller shared memory capacity per SM is a bit of a bummer, but overall it’s a large performance increase.

Not sure what they mean by ‘Async Compute’ in a CUDA context. Is this just a DirectX 12 thing or will it affect how we launch asynchronous kernels?

Can’t find any info on the GTX cards’ 64-bit (FP64) compute performance, other than the corresponding info for the P100.

I’m sure you’ve already considered that if SMPs wind up having 64 cores then shared memory and registers per core will be greatly increased.

GTX 1080: 2560 cores / 40 SMPs / 2560 KB smem / 2,621,440 regs → each core gets 1KB smem and 1024 registers
GTX 980 Ti: 2816 cores / 22 SMMs / 2112 KB smem / 1,441,792 regs

If SMPs remain 128 cores then those numbers won’t look as good.
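To make the comparison concrete, here is a rough sketch of the per-core arithmetic. The per-SM figures (64 KB of shared memory and 65,536 registers per SM for the GTX 1080 cases) are assumptions taken from the speculation above, not confirmed GP104 specs, and `per_core` is just a made-up helper name.

```python
# Speculative per-core resource arithmetic. The GTX 1080 rows are
# hypothetical configurations, not confirmed hardware specs.
REGS_PER_SM = 65536  # 32-bit registers per SM, consistent with the totals quoted above


def per_core(cores, cores_per_sm, smem_kb_per_sm, regs_per_sm=REGS_PER_SM):
    """Return (SM count, shared memory KB per core, registers per core)."""
    sm_count = cores // cores_per_sm
    smem_kb = sm_count * smem_kb_per_sm / cores
    regs = sm_count * regs_per_sm // cores
    return sm_count, smem_kb, regs


configs = {
    "GTX 1080, 64-core SMs (hypothetical)": (2560, 64, 64),
    "GTX 1080, 128-core SMs (hypothetical)": (2560, 128, 64),
    "GTX 980 Ti, 128-core SMMs": (2816, 128, 96),
}

for name, args in configs.items():
    sms, smem, regs = per_core(*args)
    print(f"{name}: {sms} SMs, {smem:.2f} KB smem/core, {regs} regs/core")
```

Under these assumptions the 128-core case halves both per-core numbers relative to the 64-core case, which is why it "won't look as good".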

One early report indicates GP104 SMPs have 128 cores.

We will know soon!

Wow, a reliable 2.5 GHz liquid cooled card is a compelling product especially for compute. It’s like getting an extra 1100 cores!
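A quick back-of-the-envelope check of that "extra cores" figure, assuming throughput scales linearly with clock and using the 1733 MHz reference boost clock as the baseline (the baseline is my assumption, it isn't stated above):

```python
# Clock-scaling estimate: how many cores at stock clock would match
# an overclocked card? Assumes perfectly linear scaling with clock.
def equivalent_cores(cores, stock_mhz, overclock_mhz):
    return cores * overclock_mhz / stock_mhz


CORES = 2560
STOCK_BOOST_MHZ = 1733   # assumption: reference GTX 1080 boost clock
OVERCLOCK_MHZ = 2500     # the rumored 2.5 GHz version

extra = equivalent_cores(CORES, STOCK_BOOST_MHZ, OVERCLOCK_MHZ) - CORES
print(f"Equivalent extra cores at stock clock: {extra:.0f}")
```

Memory bandwidth doesn't scale with core clock, so bandwidth-bound kernels would see less than this.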

Edit: another 128-core claim is here.

sm_61 = sm_50 + fp16x2?

It will be interesting to see how code optimized for Maxwell runs on Pascal. In one of my projects I push on the max shared memory limit per SM (a large shared memory scratch pad per thread block) and that code will probably have to be adjusted to reduce the shared memory requirements.

GP104 is Compute Capability 6.1, but that deviceQuery is from CUDA Toolkit 6.5, so it might not properly support Pascal. We might need to wait for CUDA Toolkit 8.0’s deviceQuery.

Apparently GP100 got 2 MB page size support (NVIDIA Pascal GPU Architecture Preview: Inside The NVIDIA GP100 GPU - Page 2 | HotHardware). This should finally address the TLB thrashing issues with random memory accesses that I’ve been struggling with on Maxwell. I hope this trickles down to GP104 and gets OS/driver support with the release. Can’t wait ;)

Is the run-time limit on kernels found on the 1080 a product of the hardware or of the Windows drivers (WDDM)?
The 980 Ti and the Titan X don’t have run-time limits on kernels (Linux):

However, the 980 has run-time limits on kernels.

Why is this the case? When is CUDA 8 supposed to be released?

GP104 FP64 is 1/32, same as GM200 & GM204.

Enough to code and debug FP64 code but won’t set any performance records.
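For a sense of scale, here is a rough peak-throughput estimate. It assumes 2 FLOPs per core per cycle (one FMA) and the 1733 MHz reference boost clock; both the clock and the `peak_gflops` helper are my assumptions for illustration.

```python
# Theoretical peak throughput sketch: 2 FLOPs (one FMA) per core per cycle,
# scaled by the FP64:FP32 rate ratio.
def peak_gflops(cores, clock_mhz, rate_ratio=1.0):
    return 2 * cores * clock_mhz / 1000 * rate_ratio


CORES = 2560
BOOST_MHZ = 1733  # assumption: reference GTX 1080 boost clock

fp32 = peak_gflops(CORES, BOOST_MHZ)
fp64 = peak_gflops(CORES, BOOST_MHZ, rate_ratio=1 / 32)
print(f"FP32 peak: {fp32:.0f} GFLOPS")
print(f"FP64 peak: {fp64:.0f} GFLOPS")
```

That works out to roughly 8.9 TFLOPS FP32 but only about 277 GFLOPS FP64 at the 1/32 rate.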

GP104 SMs have 96 KB of shared memory :)

(via HardOCP.com)

Net result: except for GP100, Pascal devices have the same 5.2 architecture inside the SM. There are changes outside, such as virtual memory, 2 MB pages and fast thread switching.

It’s possible that GP100 is essentially Volta, or at least halfway to Volta. Overall, it seems that NVIDIA took the well-known tick-tock approach: Maxwell was a tock (new architecture on the old process), and Pascal is a tick (largely the same architecture on a new process). Only GP100 is an exception with its own agenda, serving as a test bed for future architecture updates.

The important detail is that only GP100 needs to work with high-speed HBM memory, and that may be the main reason for its halved SM (which still runs 2048 threads simultaneously, so there are 2x more threads across the entire GPU). If that idea is correct, we will see 64-ALU SMs in all Volta products.
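The "2x more threads" point can be sketched like this. The 3584-core count is the P100 figure, used here only as an illustrative total; the 128-core row is a hypothetical layout for comparison.

```python
# Resident-thread comparison for a fixed core count, assuming each SM
# holds 2048 resident threads regardless of how many cores it has.
def resident_threads(cores, cores_per_sm, threads_per_sm=2048):
    return cores // cores_per_sm * threads_per_sm


CORES = 3584  # illustrative total (P100 core count)

for cores_per_sm in (64, 128):
    total = resident_threads(CORES, cores_per_sm)
    print(f"{cores_per_sm}-core SMs: {total} resident threads, "
          f"{total // CORES} per core")
```

More resident threads per core means more warps available to hide memory latency, which fits the idea that the halved SM is there to keep HBM fed.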

The best source as usual: GP104 : 7.2 milliards de transistors en 16 nm - Nvidia GeForce GTX 1080, le premier GPU 16nm en test ! - HardWare.fr