Today at the Solidworks conference, NVidia released a desktop PCIE GP100 GPU, the Quadro GP100. This is a mighty interesting card, since GP100 itself is unique in offering ECC, 1/2 rate FP64, fp16x2, and the extreme bandwidth of HBM2 memory. Perhaps even more notable, this is the first card to have an NVLink bridge spanning multiple cards rather than an SLI bridge. SLI is irrelevant for CUDA, but NVLink is not, since it provides significantly faster and lower latency memory communication between two GPUs than PCIE. With Unified Addressing in CUDA 8 we may be able to treat multiple GP100 GPUs, in some ways, as having pooled memory if the NVLink support is fast and transparent enough (not sure how this will work in practice, but that’s what makes it interesting!).
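For anyone curious what GPU-to-GPU memory access looks like from the CUDA side, here is a minimal sketch using the standard runtime peer-access calls. It is an illustration, not a statement about how the Quadro GP100 will behave: I am assuming a machine with at least two CUDA devices, and note that peer access can also be reported over plain PCIE, so a "yes" here does not by itself prove that NVLink is in use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Count the CUDA devices visible to this process.
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        printf("Need at least two CUDA devices, found %d\n", count);
        return 0;
    }

    // For every ordered pair (a, b), ask whether a can map b's memory.
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;

            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, a, b);
            printf("GPU %d -> GPU %d peer access: %s\n",
                   a, b, canAccess ? "yes" : "no");

            if (canAccess) {
                // Peer access is enabled from the accessing device's side.
                cudaSetDevice(a);
                cudaError_t err = cudaDeviceEnablePeerAccess(b, 0);
                if (err != cudaSuccess &&
                    err != cudaErrorPeerAccessAlreadyEnabled) {
                    printf("  enable failed: %s\n", cudaGetErrorString(err));
                }
            }
        }
    }
    return 0;
}
```

Once peer access is enabled, a kernel running on one GPU can dereference pointers allocated on the other, and `cudaMemcpyPeer` can copy directly between the two. Whether that is fast enough to feel like pooled memory is exactly the open question.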
The GP100 card has about 10% less FP32 throughput than the existing GP102 cards like the Pascal TitanX or Quadro P6000, and lacks the DP4A 8-bit machine learning evaluation instruction.
This is the first NVidia PCIE GPU since K80 (over two years ago!) that has high throughput FP64.
I’m sure this card will be in the $5000 range, but consider that the only way to get a GP100 before this was with a $120K DGX-1 system or an IBM Power-8 server.