Significant difference in relative memory bandwidth efficiency: Quadro P5000 vs. Titan X Pascal

I just ran a few tests on Windows, on a workstation that has both a Quadro P5000 and a Titan X Pascal installed.
Both use GDDR5X memory. The Quadro P5000 is based on the GP104 chip, and the Titan X Pascal on the GP102.

The strange thing is that the ‘efficiency’ of the memory system (the bandwidth achieved by the SDK sample program divided by the theoretical peak bandwidth) differs a lot between the two cards:

Quadro P5000: 248 GB/s / 288 GB/s = 86 %
Titan X Pascal: 335 GB/s / 480 GB/s = 70 %
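For reference, here is a small sketch of how these percentages come about. The theoretical peak is the bus width in bytes times the effective per-pin data rate. The bus widths and data rates used below (256-bit at 9 Gbps for the P5000, 384-bit at 10 Gbps for the Titan X Pascal) are the published specs as I understand them and are not stated in the post itself; the function names are just illustrative.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak in GB/s: bytes per transfer across the bus times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def efficiency_percent(measured_gbs: float, peak_gbs: float) -> float:
    """Achieved bandwidth as a percentage of the theoretical peak."""
    return 100.0 * measured_gbs / peak_gbs


# Assumed published specs (bus width in bits, GDDR5X Gbps per pin),
# plus the bandwidth measured with the SDK sample in the post.
cards = {
    "Quadro P5000": (256, 9.0, 248.0),
    "Titan X Pascal": (384, 10.0, 335.0),
}

for name, (bus_bits, rate, measured) in cards.items():
    peak = peak_bandwidth_gbs(bus_bits, rate)
    print(f"{name}: {measured:.0f} / {peak:.0f} = {efficiency_percent(measured, peak):.0f} %")
```

Plugging in the ~365 GB/s best case for the Pascal Titan X mentioned further down gives 365 / 480 ≈ 76 %, which matches the figure quoted for that card.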

Which makes me wonder: why does the Titan X Pascal have such a significantly lower ‘relative’ memory efficiency?

System: Windows 10 x64, 375.86 driver. I didn’t use the nvidia-smi command line tool for tuning.

I complained about this a while back, and that is just the way it is. Using the TCC driver for the Titan X will give you about a 10% boost in performance. Also set the memory clock to the maximum supported value via nvidia-smi (NVSMI).
The best I was ever able to get out of the Pascal Titan X was ~365 GB/s with coalesced float4 loads.

The Maxwell Titan X can reach about 91% of the theoretical maximum global memory bandwidth, while the Pascal Titan X only reaches about 76%. I even filed a bug with NVIDIA about this issue, but there has been no fix to date.

What CudaaduC said. There is a lengthy thread about the memory efficiency of GDDR5X-based GPUs somewhere on this forum.

Some GPUs require higher-than-default core clocks to achieve full memory throughput. I am not sure whether this applies here, but if the GPU supports setting application clocks, I would suggest dialing in the highest possible core clock. I would also suggest dialing in the highest supported power limit, so that the core clock “never” has to be lowered because the power limit has been reached.