I’m a newbie in GPU computing and want to start my research on measuring the performance difference between a single GPU and multiple GPUs.
Because I’m still having trouble creating my own programs, I’m looking for programs that can run on either a single GPU or (optionally) multiple GPUs. If there is any example I can test, I’d be glad to have it.
I’ve tried the example programs in the CUDA SDK, but it seems none of them can run on both a single GPU and multiple GPUs.
If you have a program that can run on a single GPU or optionally on multiple GPUs and need a beta tester, I will do it for you.
It does give a charming ‘workload share report’, but I feel it needs further optimization to do us Nvidia boys justice.
I too am doing my level best to help Pat (the developer) in any way. ;)
We need a ‘BETA Testers Standing Ready for Nvidia Developers’ section.
I would be a resource for them too.
My favorite multi-GPU program is Mandelbulb. Because it uses the OptiX libraries, it does an outstanding job of using all of the GPUs installed in the system.
(That’s both GPUs operating, in SLI mode and dedicated PhysX mode.)
The CPU workload will also make use of multiple cores if available.
The ATI GPUs can’t get near that running the same SIMPLE_64SIZE .bat file…
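(For anyone who, like the original poster, wants one test program that scales from one GPU to several: OptiX apparently spreads the work across devices on its own, but in plain CUDA the same idea looks roughly like the sketch below. The kernel, sizes, and names are placeholders of mine, not anything from Mandelbulb, and error checking is omitted for brevity.)

```cpp
// Sketch only: splits one array across every CUDA device in the system.
// Assumes n divides evenly by the device count; "scale" is a placeholder kernel.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);            // 1 on a single-GPU box, 2+ otherwise
    const int n = 1 << 20;
    const int perDev = n / deviceCount;

    std::vector<float> host(n, 1.0f);
    std::vector<float*> devPtr(deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaSetDevice(d);                        // later calls in this iteration target device d
        cudaMalloc(&devPtr[d], perDev * sizeof(float));
        cudaMemcpy(devPtr[d], host.data() + d * perDev,
                   perDev * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(perDev + 255) / 256, 256>>>(devPtr[d], perDev, 2.0f);
    }
    for (int d = 0; d < deviceCount; ++d) {      // collect the results from each GPU
        cudaSetDevice(d);
        cudaMemcpy(host.data() + d * perDev, devPtr[d],
                   perDev * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(devPtr[d]);
    }
    printf("ran on %d GPU(s), host[0] = %f\n", deviceCount, host[0]);
    return 0;
}
```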
Best conclusion so far: The simple scene consists of a lot of “nothing to do” rays, I guess (off to infinity and beyond). This should mean the duration of the kernel is short. The variation with CPU clock seems to suggest that CPU-side stuff is some kind of bottleneck. Finally the lower ATI performance for this scene seems to suggest that kernel launch overhead is higher on ATI.
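(One rough way to put a number on the launch-overhead part of that guess, on the CUDA side at least, is to time a large batch of empty kernel launches from the host. This is just a sketch I would try, not a measurement taken from the app itself, and the ATI side would need its own equivalent.)

```cpp
// Rough launch-overhead probe: time many launches of an empty kernel from the
// host. If per-launch cost dominates the frame time, the GPU is mostly waiting
// on the CPU, which would match the sensitivity to CPU clock noted above.
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void emptyKernel() {}

int main() {
    const int launches = 10000;
    emptyKernel<<<1, 1>>>();                 // warm-up (context creation, etc.)
    cudaDeviceSynchronize();

    auto t0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < launches; ++i)
        emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();                 // wait until every launch has drained
    auto t1 = std::chrono::high_resolution_clock::now();

    double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    printf("average cost per launch: %.2f us\n", us / launches);
    return 0;
}
```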
We are also toying with the idea that the app might be using alternating memory buffers that are updated with blocking enabled, causing a spin-lock wait.
The problem I now have with that theory is that when I generated my 114,154.1K samples/sec running the SIMPLE_64SIZE scene, my CPU utilization didn’t drop.
I would have expected it to, yet I still had stunning performance. The screen does indeed get rendered quickly!
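(Whether CPU utilization drops while the GPU works depends a lot on how the host thread waits. Whether this app uses spin or blocking synchronization is pure guesswork on my part, but the toy program below shows the switch that controls it in CUDA: with the spin flag the waiting core stays pegged near 100%, with the blocking flag it sleeps until the GPU signals.)

```cpp
// How the host waits on the GPU decides whether CPU utilization drops.
// cudaDeviceScheduleSpin keeps the waiting thread burning a core;
// cudaDeviceScheduleBlockingSync lets it sleep until the GPU finishes.
// Which one a given app uses is an assumption to test, not a known fact.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busyKernel(float *out) {
    float v = 0.0f;
    for (int i = 0; i < (1 << 22); ++i)      // keep the GPU busy for a while
        v += sinf((float)i);
    *out = v;
}

int main() {
    // Must be called before the context is created; swap the flag and watch
    // CPU utilization in Task Manager while the kernel runs.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    // cudaSetDeviceFlags(cudaDeviceScheduleSpin);

    float *out;
    cudaMalloc(&out, sizeof(float));
    busyKernel<<<1, 1>>>(out);
    cudaDeviceSynchronize();                 // the wait whose behavior the flag controls
    cudaFree(out);
    printf("done\n");
    return 0;
}
```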
Bottom line…
For both single- and dual-GPU performance, my impression is that a lot of fine tuning is required for GPU-accelerated apps to be the best they can be.
However, when you get them dialed in, you can get crazy performance. I also think it is largely going to be app-dependent, both on the app’s level of parallelism and on how well the programmer was able to fine-tune it.
(I should warn you that I am not a CUDA programmer.)
DYL-280 and BASIC are the only languages that I have messed with.