Perlin noise, as most of you are aware, is used to generate 3D procedural textures. It is used so frequently in computer graphics, and is expensive enough to compute, that it seems like an obvious candidate for CUDA acceleration.
I converted two flavors of Perlin noise to CUDA, which was fairly simple to do. Unfortunately, when I compared the speed to the native CPU version, I was not getting nearly as large a speedup as I expected. I was hoping for roughly a 60x speed increase on my 8800 GTX compared to a single 2 GHz AMD Opteron core. I quickly determined that the problem was that CUDA is very slow when accessing constant arrays.
I was able to optimize the improved Perlin noise algorithm substantially; however, I started noticing another problem. Depending on the scale of the noise, the speed would change. When zoomed into the noise (so that little detail was visible) the speed was good. When zooming out, the processing time increased dramatically (up to 2-4 times slower, as I remember it). This slowdown is caused by different threads addressing different elements of the constant array: constant memory is broadcast at full speed only when the threads of a warp read the same address, and divergent reads get serialized. When zoomed in, most of the threads are accessing the same array elements. As you zoom out, more and more threads need to look up different elements, and the processing time increases roughly linearly until every thread is accessing a different element (see the sketch below). The speed decrease became even more noticeable when I generated many octaves of fractal noise.
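For context, here is a minimal sketch of the kind of constant-memory lookup involved. This is just the standard permutation-table indexing from Ken Perlin's reference implementation, not my actual port; the table name and wrapper function are only illustrative.

// Classic Perlin permutation table kept in constant memory
// (filled from the host with cudaMemcpyToSymbol).
__constant__ int perm[256];

__device__ int permHash(int x, int y, int z)
{
    // Constant memory is broadcast only when every thread in the warp reads
    // the same address. Zoomed out, neighbouring threads fall into different
    // noise cells, these indices diverge, and the reads are serialized.
    return perm[(perm[(perm[x & 255] + y) & 255] + z) & 255];
}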
The solution to my slowdown problem was to get rid of the arrays completely. To do this, I replaced most of the Perlin noise algorithm with a cryptographic hashing function. These functions take a single 32-bit integer value as input and produce a pseudorandom hashed value as a result. Here is a web page with some hash functions that work nicely:
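For illustration, one widely circulated hash of that style is Thomas Wang's 32-bit mixing function; I'm not claiming it is the exact one I used, just an example of the form:

__device__ unsigned int hashInt(unsigned int a)
{
    // A short sequence of shifts, adds, xors and one multiply:
    // no memory accesses at all, so nothing for threads to diverge on.
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}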
The hashing version of the Perlin noise algorithm gives me pretty good performance and does not suffer any slowdown regardless of the scale of the noise being generated.
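To show how the hash can stand in for the table lookups, here is a sketch of hash-driven gradient selection at a lattice point. It reuses hashInt() from above; the coordinate-mixing constants (1619, 31337, 6971, borrowed from libnoise) and the function name are illustrative assumptions rather than my exact code.

__device__ float hashGrad(int ix, int iy, int iz, float x, float y, float z)
{
    // Combine the lattice coordinates into one 32-bit value and hash it.
    unsigned int h = hashInt((unsigned int)ix * 1619u
                           + (unsigned int)iy * 31337u
                           + (unsigned int)iz * 6971u);
    // Use the low bits to pick a gradient, as in improved Perlin noise.
    h &= 15;
    float u = h < 8 ? x : y;
    float v = h < 4 ? y : (h == 12 || h == 14 ? x : z);
    return ((h & 1) ? -u : u) + ((h & 2) ? -v : v);
}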
I had an interesting idea about the cryptographic hashing functions. If NVIDIA could implement one of them as an intrinsic function in hardware on a future GPU, it could be possible to replace many texture maps with procedurally generated 3D textures at very little processing cost. In fact, it may also be possible for them to implement the entire Perlin noise algorithm in 1, 2, 3 and 4 dimensions as built-in functions. This could dramatically decrease the memory requirements of many games, since they would not need to allocate texture memory for many types of texture maps. In addition, a procedural noise map based on 2D Perlin noise would only repeat every 65536 pixels in X and Y, which would get rid of the obviously repeating texture maps in games. A fairly simple modification to the Perlin function would also produce bump or displacement normals, which can be used to simulate wrinkled surfaces (one possible approach is sketched below).
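As a sketch of one common way to get such a normal (taking central differences of the noise field, which may or may not be the modification meant above), the following turns the noise into a perturbation normal. noise3 stands for the hash-based 3D noise function described earlier and is only forward-declared here.

__device__ float noise3(float x, float y, float z);   // assumed: the hash-based 3D noise

__device__ float3 noiseNormal(float x, float y, float z, float eps)
{
    // Central differences approximate the gradient of the noise field;
    // normalizing it gives a normal usable for bump or displacement mapping.
    float nx = noise3(x + eps, y, z) - noise3(x - eps, y, z);
    float ny = noise3(x, y + eps, z) - noise3(x, y - eps, z);
    float nz = noise3(x, y, z + eps) - noise3(x, y, z - eps);
    float invLen = rsqrtf(nx * nx + ny * ny + nz * nz);
    return make_float3(nx * invLen, ny * invLen, nz * invLen);
}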
I would love to claim credit for the idea of using a cryptographic hash for Perlin noise, but a quick Google search showed that this has been discussed previously on other sites: