GTX460 is slower than 9800GT in Convolution Operation

Has anyone tested the convolutionSeparable example from the CUDA SDK on a GPU with the new Fermi architecture?

I’m confused by these results:

Image size: 3072x3072, kernel size: 17x17

9800 GT:
1929.0962 MPixels/sec, Time = 0.00489 s, Size = 9437184 Pixels

GTX 460:
1769.5650 MPixels/sec, Time = 0.00533 s, Size = 9437184 Pixels
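As a quick sanity check on the figures above, here is a minimal sketch of how the throughput relates to the pixel count and the elapsed time. The helper name `mpixelsPerSec` is made up for illustration and is not taken from the SDK sample. Because the printed times are rounded to five decimals, the recomputed values only match the reported ones to within about 0.1%.

```cpp
// Throughput in MPixels/sec from a pixel count and an elapsed time in seconds.
// Hypothetical helper for checking the numbers in the post, not SDK code.
double mpixelsPerSec(long long pixels, double seconds) {
    return static_cast<double>(pixels) / seconds / 1.0e6;
}

// Ratio of two throughputs, e.g. GTX 460 relative to 9800 GT.
double relativeSpeed(double mpixA, double mpixB) {
    return mpixA / mpixB;
}
```

With 3072 x 3072 = 9437184 pixels, `mpixelsPerSec(9437184, 0.00489)` comes out near 1930 and `mpixelsPerSec(9437184, 0.00533)` near 1771, so the GTX 460 runs at roughly 92% of the 9800 GT's speed in this test.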

And here are the results for the convolution example that uses textures:

9800 GT:
convolutionRowsGPU: 2.430496 ms.
convolutionColumnsGPU: 2.465828 ms

GTX 460:
convolutionRowsGPU: 4.105941 ms.
convolutionColumnsGPU: 4.104741 ms

I don’t know why the GTX 460 is slower than the 9800 GT, or why the texture version takes about twice as long on it. Why?

P.S. My driver version is 258.96, the CUDA toolkit version is 3.1, and the CUDA SDK version is 3.10.608.1117.

It would be interesting to see whether code compiled with CUDA SDK 2.3 still shows the same differences.

I have a GTX 460 too, so I might try.

From the source code, it looks to me like the problem is shared memory bank conflicts.

Fermi-based GPUs have 32 shared memory banks, while earlier GPUs have 16. The program was clearly written with 16 banks in mind. I tested it with the Visual Profiler, and it does show a serious number of bank conflicts.

Fortunately, this is easy to correct because Fermi supports larger blocks. Just modify these two defines in the code:

#define   ROWS_BLOCKDIM_X 16   // change this to 32

and

#define   COLUMNS_BLOCKDIM_X 16  // change this to 32

This is for the shared memory version. With that change I get around 3400 MPixels/sec on my factory-overclocked GTX 460 (a mild overclock, only 715 MHz).
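To illustrate why a 16-wide block row can conflict on 32 banks, here is a rough host-side model (plain C++, no GPU needed). It is only a sketch under stated assumptions: thread (x, y) reads the 32-bit word at `y * rowStride + x`, threads are numbered x-first, and the row stride of 160 words used in the example calls is an illustrative value, not one confirmed from the SDK source. The function name `conflictDegree` is made up.

```cpp
#include <algorithm>
#include <vector>

// Worst-case number of threads in one shared-memory request that land in the
// same bank (1 means conflict-free).
//   blockDimX : threads per block row
//   rowStride : shared-memory row length in 32-bit words (assumed layout)
//   banks     : number of banks (16 before Fermi, 32 on Fermi)
//   group     : threads serviced together (half-warp = 16, full warp = 32)
int conflictDegree(int blockDimX, int rowStride, int banks, int group) {
    std::vector<int> hits(banks, 0);
    for (int t = 0; t < group; ++t) {
        int x = t % blockDimX;         // position within the block row
        int y = t / blockDimX;         // which block row this thread is in
        int word = y * rowStride + x;  // assumed access pattern
        ++hits[word % banks];          // consecutive words map to consecutive banks
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under this model, `conflictDegree(16, 160, 16, 16)` is 1 (a half-warp stays inside one row), while `conflictDegree(16, 160, 32, 32)` is 2: the second 16 threads of the warp fall in the next row, and because the stride is a multiple of 32 words they map to the same banks 0..15 as the first 16. With a 32-wide row, `conflictDegree(32, 320, 32, 32)` is back to 1, since the whole warp reads 32 consecutive words.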

I don’t know why the texture version is slower, though, since the GeForce GTX 460 should have a slightly higher texture fill rate than the 9800 GT.

I will try it later. What was your result?

But I don’t know what causes these differences.

Yes, I tried BLOCKDIM_X = 16 and I get about 3400 MPixels/sec too. Thanks for pointing out the shared memory bank conflicts.

I’m also still confused by the results for the texture-based convolution. I see no reason why the GTX 460 should be about twice as slow as the 9800 GT.