GTX460 is slower than 9800GT in Convolution Operation

egg · August 3, 2010, 1:21pm

Has anyone tested the convolutionSeparable example from the CUDA SDK on a GPU with the new Fermi architecture?

I'm confused by these numbers:

Image size: 3072x3072, kernel size: 17x17

9800 GT:
1929.0962 MPixels/sec, Time = 0.00489 s, Size = 9437184 Pixels

GTX 460:
1769.5650 MPixels/sec, Time = 0.00533 s, Size = 9437184 Pixels

And for the convolution example using textures:

9800 GT:
convolutionRowsGPU: 2.430496 ms.
convolutionColumnsGPU: 2.465828 ms

GTX 460:
convolutionRowsGPU: 4.105941 ms.
convolutionColumnsGPU: 4.104741 ms

I don't know why the GTX 460 is slower than the 9800 GT, or why the texture version is about twice as slow on it. Why?
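As a quick sanity check on the figures above: the reported throughput is just the pixel count divided by the elapsed time, and the gap between the two cards can be expressed as a ratio. The helper names below are made up for illustration and are not part of the SDK sample. The small mismatch with the printed MPixels/sec values presumably comes from the printed time being rounded.

```cpp
#include <cassert>

// Throughput in MPixels/sec from a pixel count and an elapsed time in seconds.
double mpixelsPerSec(double pixels, double seconds) {
    assert(seconds > 0.0);
    return pixels / seconds / 1.0e6;
}

// How many times longer slowMs is than fastMs (1.0 means equal).
double slowdown(double slowMs, double fastMs) {
    assert(fastMs > 0.0);
    return slowMs / fastMs;
}
```

With the numbers posted here, `mpixelsPerSec(3072.0 * 3072.0, 0.00489)` comes out near 1930, and `slowdown(4.105941, 2.430496)` comes out near 1.69. So the texture version is roughly 1.7x slower on the GTX 460 rather than a full 2x, while the shared memory version is about 9% slower.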

P.S. My driver version is 258.96, the CUDA toolkit version is 3.1, and the CUDA SDK version is 3.10.608.1117.

cbuchner1 · August 3, 2010, 1:35pm

It would be interesting to see whether code compiled with CUDA SDK 2.3 still shows the same differences.

I have a GTX 460 too, so I might try.

pcchen · August 3, 2010, 2:13pm

It looks to me from the source code that the problem is shared memory bank conflicts.

Fermi-based GPUs have 32 shared memory banks, while earlier GPUs have 16. The program was clearly written with 16 banks in mind. I tested it with Visual Profiler and it does indeed show a serious number of bank conflicts.
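To make the 16 vs 32 bank argument concrete, here is a rough host-side model, not the SDK code. It assumes successive 32-bit words map to successive banks, that pre-Fermi hardware services shared memory requests per half-warp (16 threads) and Fermi per full warp (32 threads), and that each thread reads `tile[threadIdx.y][threadIdx.x]` from a 2D float tile. The function name and the row pitch of 160 words used in the usage note are hypothetical, picked only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Worst-case number of distinct 32-bit words that land in the same bank when
// `group` consecutive threads of a block each read tile[y][x], where
// x = t % blockDimX and y = t / blockDimX. A result of 1 means conflict-free,
// 2 means a 2-way conflict, and so on. Threads reading the same word are
// counted once (simplified broadcast model).
int conflictDegree(int banks, int group, int blockDimX, int pitchWords) {
    assert(banks > 0 && group > 0 && blockDimX > 0 && pitchWords > 0);
    std::vector<std::set<int>> wordsPerBank(banks);
    for (int t = 0; t < group; ++t) {
        int x = t % blockDimX;          // threadIdx.x
        int y = t / blockDimX;          // threadIdx.y
        int word = y * pitchWords + x;  // word offset inside the tile
        wordsPerBank[word % banks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : wordsPerBank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

Under these assumptions, `conflictDegree(16, 16, 16, 160)` models a half-warp on the older GPU and gives 1, while `conflictDegree(32, 32, 16, 160)` models a full warp on Fermi with the same 16-wide block and gives 2, because the two rows covered by one warp fall on the same banks. Widening the block so a warp stays in one row, `conflictDegree(32, 32, 32, 160)`, gives 1 again.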

Fortunately, it’s easy to correct this because Fermi supports larger blocks. Just modify the two defines in the code:

#define   ROWS_BLOCKDIM_X 16   // change this to 32

and

#define   COLUMNS_BLOCKDIM_X 16  // change this to 32

This is for the shared memory version. With that change I get around 3400 Mpix/s on my factory-overclocked 460 (not by much, only 715 MHz).
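If you want to keep both settings around, one option is to make the two defines overridable at build time. This is only a sketch: it assumes you can pass extra `-D` flags when building the sample, and the `warpStaysInOneRow` helper is hypothetical, added here just to check the property the fix relies on.

```cpp
#include <cassert>

// Default to the Fermi-friendly width; build with -DROWS_BLOCKDIM_X=16
// (and -DCOLUMNS_BLOCKDIM_X=16) to get the original values back.
#ifndef ROWS_BLOCKDIM_X
#define ROWS_BLOCKDIM_X 32      // 16 in the original sample
#endif
#ifndef COLUMNS_BLOCKDIM_X
#define COLUMNS_BLOCKDIM_X 32   // 16 in the original sample
#endif

// A warp is 32 threads on both GPU generations discussed in this thread.
const int kWarpSize = 32;

// Hypothetical helper: true when the block width is a multiple of the warp
// size, so the threads of one warp all share the same threadIdx.y.
bool warpStaysInOneRow(int blockDimX) {
    assert(blockDimX > 0);
    return blockDimX % kWarpSize == 0;
}
```

With the defaults above, `warpStaysInOneRow(ROWS_BLOCKDIM_X)` is true, while the original width of 16 makes each warp straddle two rows of the block.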

I don't know why the texture version is slower, though, since the GeForce GTX 460 should have a slightly higher texture fill rate than the 9800 GT.

egg · August 4, 2010, 5:41am

I will try it later. What results did you get?

But I still don't know what causes this difference.

egg · August 4, 2010, 6:10am

Yes, I changed the BLOCKDIM_X defines as you suggested and I also get about 3400 MPixels/s. Thanks for pointing out the shared memory bank conflicts.

I am still confused by the numbers for the texture-based convolution, though. There is no obvious reason for the GTX 460 to be about twice as slow as the 9800 GT.
