FYI - A good Multi-OpenCL benchmark app...

Talonman · December 24, 2009, 7:53pm

FYI - A good Multi-GPU OpenCL benchmark app…

DirectCompute & OpenCL Benchmark. (By Pat.)

http://www.ngohq.com/graphic-cards/16920-d…-benchmark.html

Version v0.45 is special: it has outstanding multi-GPU workload balancing.

Under load, version v0.44 looked like this:

First half of my 295 reporting 61% utilization…

Second half of my 295 reporting 58% utilization…

280 checking in at a whopping 92% utilization… (Go, dedicated PhysX processor!)

Very light CPU utilization, showing only 2%.

Best score generated on v0.44:

New improved version 0.45, with better workload balancing…

97%, 98%, and 98% GPU utilization… Sweet!

CPU utilization is now up from 2% to 34%.

New high score running v0.45, with all system settings exactly the same as in the v0.44 test. The score is up from C1786.0:

This is a good OpenCL test for showing off multi-GPU rigs.

Talonman · December 25, 2009, 6:19am

We have some OpenCL scores being posted.

http://www.evga.com/forums/tm.aspx?high=&a…p;mpage=1#89761

So that means:

An 8800 GTS and a single 4850 produce around C453.4

A single 260 produces around C707.3

A single XFX HD 5770 1GB produces around C1042.9

A single 295 produces around C1431 using both of its GPUs…

A single 4890 produces around C2350

A single 295 and single 280 produce around C2575

Luv’s single 5870 produces around C4405
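To put the posted numbers side by side, here is a small sketch that expresses each score as a multiple of the single 295 result. The values are the ones quoted above with the "C" prefix dropped; the labels follow the poster's wording, and treating all scores as being on one common scale is an assumption taken from Pat's remark that scores are comparable.

```python
# Posted OpenCL scores from the EVGA thread, with the "C" prefix dropped.
# Labels follow the poster's wording; values are taken as-is ("around" figures).
SCORES = {
    "8800 GTS + 4850": 453.4,
    "single 260": 707.3,
    "single XFX HD 5770 1GB": 1042.9,
    "single 295 (both GPUs)": 1431.0,
    "single 4890": 2350.0,
    "295 + 280": 2575.0,
    "single 5870": 4405.0,
}


def relative_to(baseline: str, scores: dict[str, float]) -> dict[str, float]:
    """Return each score divided by the baseline configuration's score."""
    base = scores[baseline]
    return {name: value / base for name, value in scores.items()}


if __name__ == "__main__":
    ratios = relative_to("single 295 (both GPUs)", SCORES)
    # Print from slowest to fastest, as multiples of the single 295.
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{name:24s} {ratio:5.2f}x")
```

On these numbers the 5870 lands at roughly 3.08x the single 295 and about 1.71x the 295 + 280 pair. The ratios can only be as meaningful as the scores themselves, which is exactly what gets questioned later in the post.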

I noticed Pat had posted on page 2…

"Setting different profiles for CPU and OpenCL does not mean anything, so you got almost the same results (it’s hard to get the same results for CPU because of background tasks).

The profile combobox is only enabled in DirectCompute tests and forces the DirectX shader compiler to build the GPU code for a specific shader model.

The score you get is simply the number of mega kernel loops (10^6) per second that your CPU can process (using 12 threads). Higher number = better CPU performance.

The scores for different APIs are comparable, so getting C1000 and M10 means your graphics card can handle 100x more calculations per second than your CPU. That’s mainly because the GPU can process thousands of threads at the same time without thread switching, while the CPU usually can process 2, 4 or 8 threads."
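Pat's description boils down to simple arithmetic: a score is the loop count divided by 10^6 and by the elapsed time, and since every API reports in that same unit, dividing two scores gives a throughput ratio. This is a minimal sketch; the helper names and the idea of a raw loop counter plus a wall-clock time are hypothetical, for illustration only, and not the benchmark's actual code.

```python
def score(kernel_loops: int, seconds: float) -> float:
    """Millions of kernel loops completed per second (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return kernel_loops / 1e6 / seconds


def ratio(score_a: float, score_b: float) -> float:
    """How many times more kernel loops per second A completes than B."""
    return score_a / score_b


# 2 billion loops in 2 seconds -> a score of 1000.
print(score(2_000_000_000, 2.0))  # 1000.0
# Pat's example: C1000 against M10.
print(ratio(1000.0, 10.0))  # 100.0
```

The ratio is throughput on this one benchmark kernel; whether it carries over to other OpenCL workloads is a separate question.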

Question: If scores for both CPUs and GPUs are generated by counting mega kernel loops (10^6) per second…

I know Nvidia shaders do more work in one clock cycle than ATI’s.

I wonder whether just counting kernel loops equates to real-world performance when comparing ATI to Nvidia in OpenCL apps.

I still have a hard time accepting that a single 5870 would actually deliver more performance than a 295 and a 280 working together, all at high utilization.

I think the app gives accurate performance info when comparing Nvidia to Nvidia, or ATI to ATI, but I am still not sure about comparing Nvidia to ATI.

The ‘counting kernel loops’ thing has me wondering now… :)

Opinions are welcome…
