Speedup examples?: "Old" GPGPU -> CUDA


Does anyone have an example of an algorithm that has been implemented in Brook or a similar “oldschool” GPGPU language, as well as in CUDA?

I’m writing a grant application, and it’d be great to have a concrete reference to an actual speedup gained by implementing in CUDA.

Many thanks.

Look at this paper for Brook/CUDA comparisons:

Abstract: “The porting of two- and three-dimensional Euler solvers from a conventional CPU implementation to the novel target platform of the Graphics Processing Unit (GPU) is described. The motivation for such an effort is the impressive performance that GPUs offer: typically 10 times more floating point operations per second than a modern CPU, with over 100 processing cores and all at a very modest financial cost. Both codes were found to generate the same results on the GPU as the FORTRAN versions did on the CPU. The 2D solver ran up to 29 times quicker on the GPU than on the CPU; the 3D solver 16 times faster.” (Tobias Brandvik and Graham Pullan, Acceleration of a 3D Euler Solver Using Commodity Graphics Hardware. 46th AIAA Aerospace Sciences Meeting and Exhibit. January, 2008.)



These are pretty useful:

High-performance direct gravitational N-body simulations on graphics processing units

High Performance Direct Gravitational N-body Simulations on Graphics Processing Units – II: An implementation in CUDA

These compare x86 vs. Cg vs. CUDA vs. GRAPE 6a, a dedicated hardware board for gravitational calculations.