Titan V FP16 Performance

Can someone from NVIDIA provide a solid spec for the Titan V’s FP16 performance?

I’ve seen 15 TFLOPS FP32 and 110 TFLOPS using the Tensor Cores, but no spec in the marketing materials for FP16.

FP16 throughput not using the Tensor Cores should be double the FP32 rate on this V100-based product. This is a characteristic of the V100 device, and it is shared by all other GPUs with full-rate FP16 throughput (i.e. sm_53, sm_60, sm_62, sm_70). This general principle is documented here:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions
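To actually see the 2x rate in practice, the key point is that full-rate FP16 is exposed through packed half2 arithmetic, where each instruction operates on two FP16 values at once. A minimal sketch (assuming CUDA 9+ and an sm_60-or-newer target; the kernel and its names are illustrative, not from the thread):

```cuda
#include <cuda_fp16.h>

// Illustrative sketch: an FP16 AXPY using packed half2 math.
// Each __hfma2 performs two FP16 fused multiply-adds in one
// instruction, which is where the 2x-over-FP32 throughput comes from.
__global__ void fp16_axpy(const __half2 *x, __half2 *y, __half2 a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // y[i] = a * x[i] + y[i], applied to both FP16 lanes at once
        y[i] = __hfma2(a, x[i], y[i]);
    }
}
```

Scalar __half operations also work on these parts, but only the half2 path reaches the full advertised rate, so throughput-sensitive code should keep FP16 data packed.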

Thanks for the response.

I suspected as much, but wanted confirmation for the Titan V, since previous GeForce/Titan incarnations did not have full-rate FP16 support like their datacenter counterparts (e.g. Titan Xp vs. P100).

Can you confirm that the Titan V has native FP16?

It’s confirmed here:
http://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#volta

There’s a DP4A hardware instruction on the V100 chip.

DP4A has nothing to do with FP16 computation; you are thinking of INT8.
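For reference, a sketch of what DP4A actually does (an INT8 dot product, available on sm_61 and newer; the kernel here is illustrative, not from the thread):

```cuda
// Illustrative sketch of the INT8 path that DP4A serves.
// __dp4a treats each 32-bit operand as four packed 8-bit integers,
// computes the 4-element dot product, and adds a 32-bit accumulator,
// so it never touches FP16 data at all.
__global__ void int8_dot(const int *a, const int *b, int *acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // acc[i] += a[i].b0*b[i].b0 + ... + a[i].b3*b[i].b3
        acc[i] = __dp4a(a[i], b[i], acc[i]);
    }
}
```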

The Titan V is a compute capability 7.0 device.


The FP16 throughput (not using TensorCore) for compute capability 7.0 is given in the table I already linked:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions

Oh, you’re right. Sorry about the misinformation; I’ve been focusing too much on integer work recently.