FLOPs calculation by profiler / theoretical maximum


  1. I wonder if there is a way to let the profiler count FLOPs and, if not, whether there is any other method to calculate FLOPS besides analyzing the algorithm?
  2. How can I find out the theoretical maximum FLOPS for my board (GeForce 9600)?

The first I do not know.

Second: the peak GFLOP/s figure reported by Nvidia is:

SP clock rate * number of SPs * floating-point operations per clock cycle for each SP

In your case the shader clock is 1.625 GHz and the card has 64 SPs. Each SP can do a MAD instruction, which counts as 2 flops, and the SFU associated with each SP can do 1 multiply if it is not busy. That gives a peak of:

1.625 * 64 * 3 = 312 GFLOP/s

A more reasonable estimate leaves out the SFU and counts just 1 instruction per clock cycle, but that is on the low side. These figures do not cover double-precision floating-point operations.
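The arithmetic above can be written out as a small script. This is only a sketch in Python: the function name `peak_gflops` and its parameters are made up for illustration, the numbers are the ones quoted in this thread, and reading "1 instruction per clock cycle" as one MAD (2 flops) is my assumption.

```python
def peak_gflops(shader_clock_ghz, num_sp, flops_per_cycle):
    """Theoretical peak in GFLOP/s: clock (GHz) * SP count * flops per SP per cycle."""
    return shader_clock_ghz * num_sp * flops_per_cycle


# Values quoted in this thread for the GeForce 9600 GT.
SHADER_CLOCK_GHZ = 1.625
NUM_SP = 64

# MAD (2 flops) plus the SFU multiply (1 flop) on every cycle.
with_sfu = peak_gflops(SHADER_CLOCK_GHZ, NUM_SP, 3)

# Assumption: one MAD per cycle and no SFU work, i.e. 2 flops per cycle.
mad_only = peak_gflops(SHADER_CLOCK_GHZ, NUM_SP, 2)

print(with_sfu, mad_only)  # prints: 312.0 208.0
```

Both numbers are upper bounds; a real kernel will land below them.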

Thanks a lot!
I thought the stream processors and the multiprocessors were the same thing. Now I see that I have 12 multiprocessors but 64 stream processors. Can you tell me the difference?

Each multiprocessor contains multiple stream processors.

Also, I think you have either the number of multiprocessors or the number of stream processors wrong for your card. All CUDA devices made thus far have 8 stream processors per multiprocessor.

The GeForce 9600 GT has 64 stream processors, grouped into 8 multiprocessors. The GeForce 9600 GSO has 96 stream processors, grouped into 12 multiprocessors.
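As a quick sanity check on those counts, here is a small Python sketch. The helper name `stream_processors` is hypothetical, and the constant of 8 stream processors per multiprocessor is taken from the post above, so it only holds for the cards discussed in this thread.

```python
# Assumption from this thread: 8 stream processors per multiprocessor.
SP_PER_MP = 8


def stream_processors(num_multiprocessors):
    """Total stream processors for a given multiprocessor count."""
    return num_multiprocessors * SP_PER_MP


# Multiprocessor counts as stated in this thread.
for name, mps in [("GeForce 9600 GT", 8), ("GeForce 9600 GSO", 12)]:
    print(name, mps, "MPs ->", stream_processors(mps), "SPs")
```

So a card reporting 12 multiprocessors would have 96 stream processors, not 64.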

I was wondering this too, but wasn’t positive so I thought I’d wait to see if anyone else posted.

You are completely right: the GT has only 8 MPs (the GSO has 12). Thanks for your comments.