How to disable/enable ECC on C2050?

Ok, let’s say R is the ratio of floating point operations to memory loads in a matrix-matrix multiplication routine. I suspect that for most algorithms, R is roughly equal to the dimension of the square tiles that divide up the input matrices. Given that the shared memory in Fermi has increased from 16 to 48 kB, are we not at a point where R is large enough that memory bandwidth no longer limits GEMM performance? Or am I missing something here?
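To put rough numbers on that question, here is a minimal sketch of the counting argument. It assumes a simple tiled scheme in which one T x T tile of A and one T x T tile of B (doubles, 8 bytes each) sit in shared memory at once, and it counts a multiply-add as 2 flops. The function names are hypothetical, and the result is only a capacity bound: it ignores registers, occupancy, bank conflicts and everything else that decides whether a real kernel can sustain that ratio.

```python
import math

BYTES_PER_DOUBLE = 8  # assumption: double precision (DGEMM)


def flops_per_load(tile_dim):
    """Flops per element loaded for one T x T tile step of a tiled GEMM.

    Loading one tile of A and one tile of B costs 2*T^2 loads, and
    multiplying them costs 2*T^3 flops (T^3 multiply-adds), so R = T.
    """
    loads = 2 * tile_dim ** 2
    flops = 2 * tile_dim ** 3
    return flops / loads


def max_tile_dim(shared_bytes):
    """Largest T such that two T x T double tiles fit in shared_bytes."""
    return math.isqrt(shared_bytes // (2 * BYTES_PER_DOUBLE))


for kb in (16, 48):
    t = max_tile_dim(kb * 1024)
    print(f"{kb} kB shared memory: T = {t}, R = {flops_per_load(t):.0f}")
```

Under these assumptions the tile dimension goes from 32 at 16 kB to 55 at 48 kB, so R grows with the square root of the shared memory size, not linearly.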

A lot of the more optimized algorithms are actually avoiding shared memory and instead trying to keep everything in registers.

If we assume 500 GFLOP/s double precision peak and 150 GB/s memory bandwidth, that bandwidth delivers about 18.75 billion doubles per second, so being compute bound would imply at least 27 double precision flops per double loaded from memory. I doubt it will be possible to achieve that level of arithmetic intensity in DGEMM.
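The arithmetic behind that figure, as a short sketch. The peak and bandwidth values are the round numbers assumed in this post, not measured ones, and the variable names are just for illustration.

```python
# Assumed round numbers from the post, not measurements.
peak_flops = 500e9        # double precision flops per second
bandwidth_bytes = 150e9   # bytes per second from global memory
bytes_per_double = 8

# How many doubles the memory system can deliver per second.
doubles_per_second = bandwidth_bytes / bytes_per_double

# Flops that must be done per double loaded to keep the ALUs busy.
break_even_intensity = peak_flops / doubles_per_second

print(f"doubles per second: {doubles_per_second:.3e}")
print(f"break-even flops per double loaded: {break_even_intensity:.2f}")
```

Anything below that ratio is memory bound under this simple model; anything above it is compute bound.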