GPU Performance: how many GFLOPS???

Mac, I think there is something wrong with my environment. Can you please run the following test on your PC?

Simply replace the files in the “simpleMultiGPU” SDK example and execute it.

I am really confused. Where am I going wrong?

One more file.
The .cpp extension has to be changed to .cu.

I understand and agree with you, but it depends on the application. If I finally get 800% (8x) better performance, I am ready to spend the time to understand Stream. Again, it depends on which field you are working in.

For example, developing an ASIC takes years, but in the end it wins anyway. :-)

I think there is a misunderstanding of the concept.

IMHO, a GPU is like an FPGA with one difference. Whereas in an FPGA everything has to be done in VHDL, in a GPU the simple parts are done in CUDA or Stream (the small logic cells, i.e. shaders, can be programmed in C), and the complicated calculations in a high-level language. For me the GPU is like a coprocessor: execution should be done on the GPU, management on the CPU. For that it should be possible to launch kernels very fast. To save memory bandwidth, intermediate results between calls to different functions could be stored in shared memory, and so on… But anyway, I need my GFLOPS! :-)
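To make that concrete, here is a minimal sketch (my illustration, not code from this thread; the kernel name and the two "stages" are made up) of keeping an intermediate result in fast on-chip shared memory between two processing stages, with the host doing only the managing:

[code]
__global__ void twoStageKernel(const float *in, float *out, int n)
{
    __shared__ float stage[256];               // one slot per thread
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Stage 1: write an intermediate value into shared memory.
    stage[threadIdx.x] = (i < n) ? in[i] * in[i] : 0.0f;
    __syncthreads();

    // Stage 2: consume a neighbour's intermediate value without a
    // round trip through global memory.
    if (i < n) {
        float left = (threadIdx.x > 0) ? stage[threadIdx.x - 1]
                                       : stage[threadIdx.x];
        out[i] = 0.5f * (left + stage[threadIdx.x]);
    }
}

// Host side does the managing: launch with 256 threads per block so
// the shared array matches the block size.
// twoStageKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
[/code]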

I promise, you will see. :-)

Just give me some time. First, I will implement a 1D FFT and a FIR filter. I like ATI Stream more because I need more than just APIs. :-)
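For reference, a naive 1D FIR filter kernel can be sketched in a few lines (my illustration, not Dmitry's planned implementation; all identifiers are made up):

[code]
__global__ void fir1D(const float *x, const float *taps, float *y,
                      int n, int numTaps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Each thread computes one output sample as the dot product of
    // the filter taps with a causal window of the input signal.
    float acc = 0.0f;
    for (int t = 0; t < numTaps; ++t) {
        int j = i - t;              // only past samples (causal filter)
        if (j >= 0)
            acc += taps[t] * x[j];
    }
    y[i] = acc;
}
[/code]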

Dimitri, the way you launch your kernels (going through 1-10 blocks and 1-256 threads) is really bad. First off, 10 blocks won’t even saturate your GPU; you’ll want hundreds if not thousands. Secondly, one should launch a multiple of 32 threads per block. Otherwise you’re losing performance because the block size is not aligned to the warp size (the SIMD vector length).
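In other words, something along these lines (a sketch with made-up names, assuming a hypothetical myKernel and a large problem size n):

[code]
const int n = 1 << 22;                 // example problem size
const int threadsPerBlock = 256;       // 8 warps of 32 threads each
const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
// blocks = 16384 here, more than enough to saturate the GPU
myKernel<<<blocks, threadsPerBlock>>>(d_data, n);
[/code]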

Lastly, unrolling the loop 1024 times instead of, say, 200 times will likely hurt performance because the generated code will get too big to fit into the GPU’s instruction cache.
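A bounded unroll factor gives the best of both: low loop overhead without blowing up the code size. A sketch of mine, with identifiers of my own choosing:

[code]
__global__ void madLoop(float *out, int iterations)
{
    float a = 1.0f, b = 2.0f;
    #pragma unroll 16                       // bounded unroll: low loop
    for (int i = 0; i < iterations; ++i)    // overhead, code still fits
        a = a * b + b;                      // the i-cache; one mad per iteration
    // Write the result so the compiler cannot optimise the loop away.
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}
[/code]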

Tomorrow I will write a program that runs the kernels the way they’re supposed to be run and writes the FLOPS to the console. I’ll give you the whole project, and a binary just in case. Are you using Visual Studio 2008?

Super!

Thank you very much!

Yes, I am using MSVS 2005 and 2008.

If I am able to get the performance, that will be very good. :rolleyes:

Thanks,

Dmitry

Here’s the project.
[attachment=10314:flopsBenchmark.rar]

Requires:

GTX 260 results (theoretical peak/obtained, GFLOPS):
mad: 545/524, 96% efficiency
mad+mul: 817/576, 70% efficiency

My 8800 GTS results (theoretical peak/obtained, GFLOPS):
mad: 414/403, 97% efficiency
mad+mul: 622/458, 74% efficiency
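For anyone wanting to reproduce the "theoretical peak" column: on these G8x/GT200-class parts it works out as cores × shader clock × 2 flops per mad, plus one more flop per cycle for the dual-issued mul. A small sanity-check sketch of mine, assuming 8 scalar cores per multiprocessor as on these GPUs:

[code]
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    double ghz   = prop.clockRate / 1.0e6;         // clockRate is in kHz
    double cores = prop.multiProcessorCount * 8.0; // 8 SPs per SM here
    printf("mad peak:     %.0f GFLOPS\n", cores * ghz * 2.0); // 2 flops/mad
    printf("mad+mul peak: %.0f GFLOPS\n", cores * ghz * 3.0); // + dual-issued mul
    return 0;
}
[/code]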

Hi Big Mac!

Finally I have run the tests, and I really got my GFLOPS (476/445, and 458/715).

Thank you very much!

I will do some more tests soon, and will try to find the reason why it was not possible to get the FLOPS before.

Again, thanks a lot!

Dmitry