Hi guys, I wrote some CUDA code for 2D convolution;
the code is very simple and is attached.
I tested it on a Tesla card. It gets no misses compared with the CPU result, but it’s much slower than the CPU code:
setting device 0 with name Tesla C1060
GPU Runtime: 0.009131s
CPU Runtime: 0.001287s
Number of misses: 0
But if I run the same code on a Fermi card, it’s two times faster.
Can anybody tell me why?
2DConvolution.cu (4.08 KB)