Looking at the code, it really isn’t very surprising that the GPU is slower. The memory access patterns are about as suboptimal as you could possibly make them for a compute capability 1.1 device. There is a section in the CUDA programming guide that discusses memory coalescing and how to achieve peak memory throughput on the GPU. You might want to review it.
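To make the coalescing point concrete, here is a minimal sketch (not taken from the code in this thread, which isn't shown here) that models the compute capability 1.0/1.1 rule for 32-bit loads. A half-warp of 16 threads gets one 64-byte transaction only if thread k reads the k-th word of a 64-byte-aligned segment. Otherwise the hardware issues a separate transaction per thread. The helpers `transactions_cc11`, `along_row` and `down_column` are hypothetical, the matrix is assumed to be row-major `float` data starting at address 0, and all 16 threads are assumed active.

```python
# Simplified model of the compute capability 1.0/1.1 coalescing rule for
# 32-bit words. Assumption: all 16 threads of the half-warp are active.
HALF_WARP = 16
WORD_BYTES = 4
SEGMENT_BYTES = HALF_WARP * WORD_BYTES  # 64-byte aligned segment


def transactions_cc11(byte_addrs):
    """Hypothetical helper: memory transactions for one half-warp's 32-bit loads.

    One 64-byte transaction if thread k reads the k-th word of an aligned
    64-byte segment; otherwise one transaction per thread (16).
    """
    assert len(byte_addrs) == HALF_WARP
    base = byte_addrs[0]
    if base % SEGMENT_BYTES != 0:
        return HALF_WARP  # segment not 64-byte aligned
    if any(addr != base + k * WORD_BYTES for k, addr in enumerate(byte_addrs)):
        return HALF_WARP  # threads not reading consecutive words in order
    return 1


def along_row(width, row, col0):
    """Thread k reads element (row, col0 + k) of a row-major float matrix."""
    return [(row * width + col0 + k) * WORD_BYTES for k in range(HALF_WARP)]


def down_column(width, row0, col):
    """Thread k reads element (row0 + k, col): a stride of `width` floats."""
    return [((row0 + k) * width + col) * WORD_BYTES for k in range(HALF_WARP)]


if __name__ == "__main__":
    width = 1024  # hypothetical matrix width (a multiple of 16)
    print("along a row:   ", transactions_cc11(along_row(width, 3, 32)))    # 1
    print("down a column: ", transactions_cc11(down_column(width, 0, 32)))  # 16
    print("misaligned row:", transactions_cc11(along_row(width, 3, 33)))    # 16
```

Mapping consecutive threads to consecutive elements of a row (with an aligned starting column) costs one transaction per half-warp, while striding down a column costs sixteen. On a 1.1 device that difference alone can make a kernel slower than the CPU.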