CUDA 9 FP16

“INCREASE APPLICATION THROUGHPUT WITH FP16 AND INT8 SUPPORT”

Can someone please outline the level of support for FP16 in CUDA 9 on Pascal GeForces and Titans?

You’re misreading that page: those comments refer to CUDA 8. Read the page from top to bottom and notice where the CUDA 9 section starts, where the CUDA 8 section starts, and where that quote falls relative to the start of the CUDA 8 section.

FP16 and INT8 support both appeared in CUDA 8, and for applications that can take advantage of them, they can increase throughput. This is not news.

Not a single Pascal GeForce card supports FP16 (Titan X (Pascal) and Titan Xp are still GeForce).

Out of all Pascal GeForce cards, only Titan X (Pascal), Titan Xp and 1080 Ti support INT8 inference.

Not at all, or at very low throughput? I thought it was the latter.

The latter. Native FP16 runs at 1/64 the FP32 rate on consumer Pascal cards, so there might as well be none: it’s much slower than “simulated” FP16 (FP16 storage but FP32 compute).
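To illustrate what “FP16 storage but FP32 compute” means, here is a minimal host-side sketch (plain Python, not CUDA code) using the standard library’s IEEE-754 half-precision pack format as a stand-in for FP16 storage; the `fp16_round` helper is hypothetical, named here only for illustration:

```python
import struct

def fp16_round(x: float) -> float:
    """Round x to the nearest IEEE-754 half-precision value.

    Packing with format 'e' quantizes to FP16 (the *storage* step);
    unpacking hands back a regular Python float, so any arithmetic
    afterwards runs at full precision (the *compute* step), which is
    the "simulated FP16" scheme: FP16 in memory, FP32 in the ALUs.
    """
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Store a vector at FP16 precision, then accumulate at full precision.
data = [fp16_round(0.1 * i) for i in range(8)]
total = sum(data)  # the accumulation never drops back to FP16
```

Values exactly representable in FP16 (like 1.0 or 2.0) survive the round trip unchanged; something like 0.1 does not, which is the precision cost you pay for the halved memory footprint and bandwidth.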

Not sure what you mean by this. Every Pascal GeForce card supports the dp4a instruction at an effective 4x FP32 math throughput. The only Pascal chips that don’t support it are GP100 and Parker, but those have never been deployed in GeForce boards.
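For readers unfamiliar with dp4a (exposed in CUDA as the `__dp4a` intrinsic): it computes a four-way dot product of packed signed 8-bit integers, accumulated into a 32-bit integer. A host-side Python model of those semantics, assuming signed int8 lanes in little-endian byte order:

```python
import struct

def dp4a(a: int, b: int, c: int) -> int:
    """Model of dp4a: treat a and b as four packed signed int8
    lanes each, multiply lane-wise, and accumulate the four
    products into the 32-bit integer c."""
    a_lanes = struct.unpack('<4b', struct.pack('<I', a & 0xFFFFFFFF))
    b_lanes = struct.unpack('<4b', struct.pack('<I', b & 0xFFFFFFFF))
    for ai, bi in zip(a_lanes, b_lanes):
        c += ai * bi
    return c
```

For example, `dp4a(0x01020304, 0x01010101, 0)` sums the four bytes 4, 3, 2, 1 against all-ones lanes. One instruction doing four multiplies plus an add is where the effective 4x-over-FP32 throughput for INT8 inference comes from.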