999 (MHz memory clock) * 2 (DDR multiplier) * 448 (bit width of the memory interface) / 8 (bits per byte) = 111888 MB/s (about 111.9 GB/s)
My simple but speedy reduction code runs at 106.4 GB/s on a GTX 295: 106.4 / 111.9 = 95.1% of the peak bandwidth
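The arithmetic above can be sketched as a small script. This is only an illustration of the calculation in this post, not part of the reduction code itself. The function names `peak_bandwidth_mb_s` and `efficiency_pct` are made up for this example, and it assumes decimal units (1 GB/s = 1000 MB/s), which is what the 111.9 figure implies.

```python
def peak_bandwidth_mb_s(mem_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in MB/s.

    mem_clock_mhz       -- memory clock in MHz (999 in the post)
    bus_width_bits      -- memory interface width in bits (448 in the post)
    transfers_per_clock -- 2 for DDR memory (hypothetical parameter name)
    """
    # MHz * transfers per clock * bits per transfer / 8 bits per byte = MB/s
    return mem_clock_mhz * transfers_per_clock * bus_width_bits / 8


def efficiency_pct(measured_gb_s, peak_gb_s):
    """Measured bandwidth as a percentage of the theoretical peak."""
    return 100.0 * measured_gb_s / peak_gb_s


# Figures quoted in the post (per GPU on a GTX 295).
peak_mb_s = peak_bandwidth_mb_s(999, 448)
peak_gb_s = peak_mb_s / 1000  # assumption: decimal GB, as in the post

print(f"peak: {peak_mb_s:.0f} MB/s ({peak_gb_s:.1f} GB/s)")
print(f"efficiency: {efficiency_pct(106.4, peak_gb_s):.1f}% of peak")
```

Swapping in another card's memory clock and bus width gives its theoretical peak the same way; the measured figure still has to come from timing the kernel.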