Performance evaluation

Hi everybody,

I am currently running a program with a kernel that performs worst-case operations, to evaluate the maximum frame rate we can get with our graphics card, a GeForce 8600.

In our final product we will be using an FX 3600, and I was wondering if there is a way to predict, approximately, what the performance will be with that card.

According to the CUDA Profiler, this specific kernel executes approximately 124 instructions per microsecond. Knowing how many instructions the kernel needs to run, I can predict the time it will take to run on the GPU, in microseconds, on my GeForce 8600, and so far it seems quite accurate.

Knowing that the FX 3600 has 12 multiprocessors and the GeForce 8600 has only 4, can I say that the number of instructions per microsecond should be around 3 times that of the GeForce, so about 372 instructions/µs, or is that too simplistic a way to look at the card's performance under CUDA? :unsure:
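To be concrete, here is roughly the back-of-the-envelope calculation I have in mind (the instruction count is just an example value, and the linear scaling by multiprocessor count is exactly the assumption I'm asking about):

```cpp
// Rough estimate of kernel run time from the CUDA Profiler's instruction
// throughput, plus a naive scaling to the FX 3600 by multiprocessor count.
// The FX 3600 figure is an assumption, not a measurement.
#include <cstdio>

int main() {
    const double instructions      = 1.0e6;   // instruction count of the kernel (example value)
    const double throughput_8600   = 124.0;   // instructions per microsecond, measured on the GeForce 8600
    const int    multiprocs_8600   = 4;
    const int    multiprocs_fx3600 = 12;

    // Naive assumption: throughput scales linearly with the number of multiprocessors.
    const double throughput_fx3600 = throughput_8600 * multiprocs_fx3600 / multiprocs_8600; // ~372

    printf("Predicted time on GeForce 8600: %.1f us\n", instructions / throughput_8600);
    printf("Predicted time on FX 3600 (naive scaling): %.1f us\n", instructions / throughput_fx3600);
    return 0;
}
```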

If I’m wrong, what elements would give me a better idea of the performance of my kernel on an FX 3600?

Thank you in advance !

Jerome

There are two other factors to consider when predicting performance (a rough sketch combining them follows the list):

  • Stream processor (aka “shader”) clock rate: I can’t seem to find the shader clock for the FX 3600, but you should also compare that to the 8600. Be sure to look at the shader clock and not the core clock.

  • Memory bandwidth: If your kernel is memory bound (many kernels are), then the performance will scale like the memory bandwidth and not like the number of stream processors.
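As a sketch of how I'd combine those two factors, something like the following. The bandwidth numbers are the ones from this thread; the stream processor counts and shader clocks are placeholders you'd have to fill in from the actual specs:

```cpp
// Back-of-the-envelope scaling between two CUDA cards.
// SP counts and shader clocks below are placeholders; the bandwidths are
// the values quoted in this thread. Treat the result as a rough ceiling.
#include <cstdio>

struct GpuSpec {
    int    stream_processors;   // total stream processors (SPs)
    double shader_clock_ghz;    // shader (SP) clock, not the core clock
    double mem_bandwidth_gbs;   // global memory bandwidth in GB/s
};

int main() {
    GpuSpec geforce_8600 = { 32, 1.19, 32.0 };   // placeholder SP count / clock
    GpuSpec fx_3600      = { 96, 0.0,  51.2 };   // shader clock unknown here

    // Compute-bound kernels scale roughly with SP count x shader clock.
    double compute_ratio = 0.0;
    if (fx_3600.shader_clock_ghz > 0.0)
        compute_ratio = (fx_3600.stream_processors * fx_3600.shader_clock_ghz) /
                        (geforce_8600.stream_processors * geforce_8600.shader_clock_ghz);

    // Memory-bound kernels scale roughly with memory bandwidth.
    double memory_ratio = fx_3600.mem_bandwidth_gbs / geforce_8600.mem_bandwidth_gbs;

    printf("Estimated compute-bound speedup: %.2fx (0 means shader clock unknown)\n", compute_ratio);
    printf("Estimated memory-bound speedup:  %.2fx\n", memory_ratio);
    return 0;
}
```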

Thanks,

I’ll look into this.

Jerome

I found the following for the FX 3600:

http://www.nvidia.com/object/quadrofx_3600m.html

I also can’t find any information about the stream processors’ frequency.

The memory bandwidth is 51.2 GB/s, compared to 32.0 GB/s for the GeForce 8600.

Just want to make sure: by a memory-bound kernel, do you mean one that uses shared memory to do its operations on the device?

So, assuming my kernel is memory bound, the performance gain would be about 51.2 GB/s / 32 GB/s ≈ 1.6×, and the number of multiprocessors (stream processors) would have no impact on my performance?

Thank you once again !

Jerome

Memory bound means you are bound by the global memory bandwidth (the number you listed). If you do fewer than hundreds of floating-point operations per global memory read, then your kernel is most likely memory bound.

Taking the two memory bandwidths and dividing them to get an effective speedup should give you a very rough estimate of the performance if your kernels are memory bound. For a concrete example: my app is memory bound and performs 18% faster on an 8800 GTX than on an 8800 GTS (G92). The 8800 GTX’s memory is 35% faster than the 8800 GTS (G92)’s: 86.4 GB/s vs. 64 GB/s.
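To spell that out in numbers (the FLOPs-per-read count below is purely illustrative; you'd take yours from your own kernel), a minimal sketch:

```cpp
// Quick check: is a kernel likely memory bound, and what speedup ceiling
// does the bandwidth ratio suggest? Kernel-specific counts are illustrative.
#include <cstdio>

int main() {
    // Rule of thumb from above: fewer than ~100 floating-point operations
    // per global memory read usually means the kernel is memory bound.
    const double flops_per_global_read = 10.0;   // illustrative value for some kernel
    const bool   likely_memory_bound   = flops_per_global_read < 100.0;

    // Bandwidth-ratio ceiling, using the 8800 GTX vs. 8800 GTS (G92) example.
    const double bw_gtx  = 86.4;            // GB/s
    const double bw_gts  = 64.0;            // GB/s
    const double ceiling = bw_gtx / bw_gts; // ~1.35x

    printf("Likely memory bound: %s\n", likely_memory_bound ? "yes" : "no");
    printf("Bandwidth-ratio speedup ceiling: %.2fx (observed in practice: ~1.18x)\n", ceiling);
    return 0;
}
```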

It seems I won’t really be able to find out the performance gain until I test it.

My current test kernel is not memory bound, but depending on the technology decisions we make and the applications we end up running on the GPU, my CUDA functions will probably be memory bound.

Thanks for all the info, it’s getting clearer and I’m learning more every day about GPGPU in the CUDA environment.

Jerome