I've recently dived into OpenCL and, after building and running the SDK examples, I've noticed that there seems to be no performance benefit from using OpenCL on my machine.
In the example programs that let you compare OpenCL on the GPU against the local CPU, the two perform pretty much identically, and the nbodies demo is clunky and slow.
Other folks in the office are running on Linux and Mac (I’m on Windows) and theirs run fine - nbodies is smooth and fast.
I have the ostensibly appropriate drivers installed (190.89) and am running SDK 2.3a.
Any suggestions as to what could be wrong with my setup?
I'm running a Quadro FX 580 in a quad-core Dell system.
Thanks!
What example programs are you referring to? Samples from an SDK or something else?
I'm not sure what you are comparing, but keep in mind that the Quadro FX 580 has only 32 processing cores and is not the most powerful GPU out there. That is far from the 240 cores the current-generation Tesla cards have and the 512 cores promised for the next generation.
Are you running the examples on the same hardware?
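To put the core counts quoted above in proportion, here is a quick back-of-the-envelope calculation. It uses only the three figures from the post; the dictionary labels are just descriptive names. Core count alone does not determine performance (clock speed and memory bandwidth matter too), so treat the ratios as a rough upper bound on the gap, not a benchmark.

```python
# Core counts as quoted in the post above; labels are descriptive only.
cores = {
    "Quadro FX 580": 32,
    "current-generation Tesla": 240,
    "next-generation (promised)": 512,
}

baseline = cores["Quadro FX 580"]

# Ratio of each card's core count to the Quadro FX 580's.
ratios = {name: count / baseline for name, count in cores.items()}

for name, ratio in ratios.items():
    print(f"{name}: {cores[name]} cores, {ratio:.1f}x the Quadro FX 580")
```

Even a 32-core part would normally be expected to beat a plain CPU loop on something like nbodies, so identical GPU and CPU timings point at the setup and not at the card.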
I’m running both the samples that ship with the SDK and a concoction of our own devising which, on Mac/Linux, runs at least 5 times faster using OpenCL than on the local CPU, even on lower-spec GPUs than my own.
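For anyone wanting to reproduce a "times faster" figure like the one above, a minimal timing sketch follows. It is not the code from this thread: `cpu_path` and `gpu_path` are hypothetical stand-ins for the two code paths, and `best_time` and `speedup` are helper names made up for illustration. The untimed warm-up call is there because a first OpenCL run usually includes one-off costs such as kernel compilation.

```python
import time

def best_time(fn, *args, warmup=1, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    for _ in range(warmup):
        fn(*args)  # untimed: absorbs one-off costs such as kernel compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

def speedup(reference_fn, candidate_fn, *args):
    """How many times faster candidate_fn is than reference_fn (>1 = faster)."""
    return best_time(reference_fn, *args) / best_time(candidate_fn, *args)

# Hypothetical stand-ins for the two paths being compared. A real GPU path
# must block until the device has finished (OpenCL enqueues are asynchronous),
# otherwise only the enqueue call gets timed.
def cpu_path(n):
    return sum(i * i for i in range(n))

def gpu_path(n):
    # Closed form of the same sum, standing in for the faster path.
    return (n - 1) * n * (2 * n - 1) // 6

print(f"speedup: {speedup(cpu_path, gpu_path, 20000):.1f}x")
```

Taking the minimum over several repeats is a design choice that reduces noise from other processes; an average would work too if the machine is otherwise idle.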
Do you use AMD’s runtime for CPU OpenCL? Do you switch the .dll? NVIDIA’s drivers don’t support CPU execution at all.
Suspect there is a misunderstanding here.
In a number of the examples provided, and in our own local example OpenCL code (which runs fine on Mac/Linux), there is the ability to switch between OpenCL running on the GPU and native C++ code on the CPU.
The problem is simply that OpenCL on the GPU ain’t doing a damn thing.
Interestingly, when using the CUDA example code (provided with the GPU Computing SDK), performance is awesome.
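One way to check what the OpenCL runtime actually exposes on a machine like this is to list every platform and device it reports. The sketch below assumes the third-party PyOpenCL binding is installed (nobody in this thread mentioned using it); `list_opencl_devices` is a made-up helper name. If the GPU does not show up with type GPU, the samples may not be running on it at all.

```python
def list_opencl_devices():
    """Return (platform, device, type, compute units) tuples, or [] if none."""
    try:
        import pyopencl as cl  # assumption: PyOpenCL is installed
    except ImportError:
        return []
    found = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                found.append((
                    platform.name,
                    device.name,
                    cl.device_type.to_string(device.type),
                    device.max_compute_units,
                ))
    except cl.Error:
        # Raised, for example, when no OpenCL driver is registered.
        return found
    return found

for entry in list_opencl_devices():
    print(entry)
```

Note that the compute-unit figure counts multiprocessors on NVIDIA hardware, so it will be smaller than the per-card "cores" number quoted earlier in the thread.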
Have you tried newer drivers? Try downloading the newest ones (I believe 196.21 at the time of writing) and SDK 3.0 Beta (2.3 won’t work with the new drivers). There have been some important changes to the OpenCL runtime introduced after 195.