Perlin Noise and Arrays Avoiding arrays when generating 3D noise

Perlin noise, as most of you are aware, is used to generate 3D procedural textures. It is used so frequently in computer graphics and can be fairly time consuming so it seems like an obvious application for CUDA acceleration.

I converted two flavors of Perlin noise to CUDA. This was fairly simple to do. Unfortunately when I compared the speed to the native version, I was not getting nearly as large of a speed increase as I would expect. I was hoping for about 60x speed increase on my 8800 GTX compared to a single AMD Opteron 2GHZ core. I quickly determined that the problem was that CUDA is very slow when accessing constant arrays.

I was able to optimize the improved Perlin noise algorithm substantially however I started noticing another problem. Depending on the scale of the noise, the speed would change. When zoomed into the noise (so that little detail was visible) the speed was good. When zooming out, the processing time increased dramatically (up to 2-4 times slower as I remember it). This slow down is caused by different threads addressing different elements in the constant array. When zoomed in, most of the threads are accessing the same array elements. As you zoom out, more and more threads need to look up different array elements and the processing time increases linearly until all the threads are accessing different array elements. This speed decrease became even more noticable when I used generated many octaves of fractal noise.

The solution to my slowdown problem was to get rid of the arrays completely. To do this, I replaced most of the Perlin noise algorithm with a cryptographic hashing function. These functions take a single 32 bit integer value as an input and produce a pseudorandom hashed value as a result. Here is a web page with some hash functions that work nicely:

The hashing version of the Perlin noise algorithm gives me pretty good performance and does not suffer any slow down regardless of the scale of the noise being generated.

I had an interesting idea about the cryptographic hasing functions. If nVidia could implement one of them as an intrinsic function with hardware on a future GPU, it could be possible to replace many texture maps with proceduraly generated 3D textures with very little processing cost. In fact it may also be possible for them to implement the entire Perlin noise algorithms in 1, 2, 3 and 4 dimensions as built in functions. This could dramatically decrease the memory requirements for many games since they would not need to allocate texture memory for many types of texture maps. In addition, a procedural noise map based on 2D Perlin noise would only repeat every 65536 pixels in X and Y. This would get rid of the obvious repeating maps in games. A fairly simple modification to the Perlin function would also produce bump or displacement normals which can be used to simulate wrinkled surfaces.

I would love to claim credit for the idea of using a cryptographic hash for Perlin noise but a quick Google serach showed that this has been discussed previously on other sites:…topic_id=459087

-Mark Granger

The benefit of constant memory is it being cached. However, accesses to different locations in constant memory within a warp do get serialized. That’s why you were seeing a slow down when threads were accessing different array elements. Also, if you’re missing the constant cache, the access times become really slow - I’d say they’re in the neighborhood of uncoalesced gmem accesses.


Have you tried storing the hash tables in textures instead? That’s how our fastest shader-based implementations of Perlin noise work. You can actually use a 2D texture for the first 2 dimensions of the permutation table (see my article in GPU Gems 2).

Other people have suggested putting some kind of hash function in hardware, but how would we decide on the function? Some people want their GPU noise to match existing CPU implementations.

I considered storing the hash tables in textures. I may give this a try in fact. I found the cryptographic hash to be too tempting to resist implementing. It has the ability to generate a really large set of non-repeating 2D or 3D noise values. I personally do not like the look of the “improved” Perlin noise. It has too many straight segments and has obvious temporal artifacts if you animate the Z coordinate for 3D noise mapped onto a 2D surface. I prefer the look of cubic interpolation of 0…1 noise values. If you use tables to create that type of noise, you either end up with very large tables (with lots of cache hits) or a very small repetition stride.

In a perfect world, we would have the choice of several 1, 2, 3 and 4 dimensional noise functions. We would also have open source CPU implementations of each function so we can precisely match it if we don’t have access to a GPU. Of course the latter comes with a cost: A CPU implementation of a GPU noise function would be slower than a well optimized CPU only noise function. For now I would be very happy if we just had a hardware implementation of one or more cryptographic hash functions. This would allow us to write our own noise functions which should be faster than if they were implemented with textures. The cryptographic hash function is just a series of shifts, adds and xors. I am not a hardware guy but I suspect that it should be less complex to implement in hardware as an instruction than some of the existing floating point math instructions.

-Mark Granger