We have been running tests with a magnetohydrodynamics (MHD) code that uses numerical stabilisation to keep the solution stable in shocked regions of the grid.
The code achieves good performance, and that performance scales well with the size of the grid.
However, when we compared the performance of the code on the K20 and the M2070, the improvement was much less than the factor of 2 that has been claimed for many applications.
We would like to know whether the following explanation is plausible.
The K20 has 2496 CUDA cores (the 2688 figure belongs to the K20X) compared with 448 on the M2070. The K20 has a peak memory bandwidth of 208 GB/s, compared with 144 GB/s for the M2070.
Is it plausible that, with its much larger number of cores, the Kepler K20 places a much greater demand on memory bandwidth, and that this demand is not compensated for by the K20's higher bandwidth? Our problem is heavy on bandwidth because we use a large number of fields, plus temporary fields for the numerical stabilisation.
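As a rough sanity check on this reasoning, here is a minimal Python sketch of the arithmetic. It assumes the kernels are limited purely by memory bandwidth, so the best possible speedup is just the ratio of the two peak bandwidths quoted above. The grid size and field count are hypothetical placeholders, not our actual problem, and real kernels will not reach peak bandwidth on either card.

```python
# Peak memory bandwidths in GB/s, as quoted in this post.
BW_K20 = 208.0
BW_M2070 = 144.0


def bandwidth_bound_speedup(bw_new, bw_old):
    """Upper bound on speedup for a kernel limited only by memory bandwidth."""
    return bw_new / bw_old


def transfer_time_lower_bound(bytes_moved, bw_gb_per_s):
    """Minimum time in seconds to move bytes_moved at the given peak bandwidth."""
    return bytes_moved / (bw_gb_per_s * 1e9)


speedup = bandwidth_bound_speedup(BW_K20, BW_M2070)
print(f"bandwidth-bound speedup limit: {speedup:.2f}x")

# Hypothetical problem size (assumption, for illustration only):
# a 256^3 grid, 20 double-precision fields, each read once and written once.
cells = 256 ** 3
fields = 20
bytes_per_step = cells * fields * 8 * 2

for name, bw in (("K20", BW_K20), ("M2070", BW_M2070)):
    t = transfer_time_lower_bound(bytes_per_step, bw)
    print(f"{name}: at least {t * 1e3:.1f} ms per sweep over all fields")
```

If this assumption holds, the limit comes out well below 2, which would be consistent with what we observe, whatever the core count.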
What other reasons might there be for our observed performance?
Many thanks