Benchmark CUDA code: Sieve of Eratosthenes, anyone?

Does anyone know of some benchmark CUDA code? I know that there are some example problems in the CUDA 2.3 distribution, but I was thinking of the Sieve of Eratosthenes. That would be a great benchmark to use!


If you’re looking for high sieving speed, go for the Sieve of Atkin. There are also alternative methods that generate the primes up to a given limit using very fast primality tests (no sieving required).
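To illustrate the "primality test instead of sieve" idea, here is a minimal sketch using Miller-Rabin, which is one well-known fast test; the post does not say which test was meant, so treat that choice as an assumption. For 32-bit inputs the bases 2, 7 and 61 are known to give a deterministic answer. The names mulmod32, powmod32, is_prime32 and flag_primes are made up for this example.

```cuda
#include <cuda_runtime.h>

// (a * b) mod m without overflow, using a 64-bit intermediate product.
__host__ __device__ unsigned int mulmod32(unsigned int a, unsigned int b, unsigned int m)
{
    return (unsigned int)(((unsigned long long)a * b) % m);
}

// base^exp mod m by square-and-multiply.
__host__ __device__ unsigned int powmod32(unsigned int base, unsigned int exp, unsigned int m)
{
    unsigned int result = 1 % m;
    base %= m;
    while (exp) {
        if (exp & 1) result = mulmod32(result, base, m);
        base = mulmod32(base, base, m);
        exp >>= 1;
    }
    return result;
}

// Miller-Rabin with bases 2, 7, 61 (deterministic for all 32-bit n).
__host__ __device__ bool is_prime32(unsigned int n)
{
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;

    // Write n - 1 as d * 2^r with d odd.
    unsigned int d = n - 1;
    int r = 0;
    while ((d & 1) == 0) { d >>= 1; ++r; }

    const unsigned int bases[3] = {2, 7, 61};
    for (int i = 0; i < 3; ++i) {
        unsigned int a = bases[i] % n;
        if (a == 0) continue;              // n equals this base (7 or 61)
        unsigned int x = powmod32(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int j = 1; j < r; ++j) {
            x = mulmod32(x, x, n);
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}

// One thread per candidate: is_prime[i] is set to 1 if first + i is prime.
__global__ void flag_primes(unsigned char *is_prime, unsigned int first, unsigned int count)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) is_prime[i] = is_prime32(first + i) ? 1 : 0;
}
```

Launch flag_primes with enough blocks to cover count, and keep first + count within 32 bits. Every thread works independently, so there are no shared writes to worry about, but each candidate costs several modular exponentiations, so whether this beats a sieve on a given card is something to measure rather than assume.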

If you want to stick with Eratosthenes, there is an entire thread on these forums devoted to the Sieve of Eratosthenes. I’ve published some source code there that tries to make use of shared memory during sieving, using bit patterns to sieve. Some people have weighed in, benchmarking memory performance for various methods of writing to global memory. Eventually we concluded that byte-wise writes to global memory are the fastest way to sieve.
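For anyone who wants a starting point, here is a minimal sketch of the byte-wise approach: one byte per number, struck with plain single-byte stores to global memory. This is not the code from that thread, just an illustration under simple assumptions (one thread per base up to sqrt(n), no shared memory, no error checking), and the names sieve_bytes and count_primes_gpu are made up.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// One thread per base p in [2, sqrt_n]. Each thread marks the multiples of p,
// starting at p*p, with single-byte stores. Several threads may hit the same
// byte, but they all store the value 1, so the result does not depend on order.
// Composite bases are not skipped; they only repeat work already done.
__global__ void sieve_bytes(unsigned char *composite, unsigned int n, unsigned int sqrt_n)
{
    unsigned int p = 2 + blockIdx.x * blockDim.x + threadIdx.x;
    if (p > sqrt_n) return;
    for (unsigned long long m = (unsigned long long)p * p; m <= n; m += p)
        composite[m] = 1;
}

// Returns the number of primes <= n (hypothetical host-side helper).
unsigned int count_primes_gpu(unsigned int n)
{
    if (n < 2) return 0;

    // Integer square root of n, computed without floating point.
    unsigned int sqrt_n = 1;
    while ((unsigned long long)(sqrt_n + 1) * (sqrt_n + 1) <= n) ++sqrt_n;

    size_t bytes = (size_t)n + 1;          // one flag byte per number 0..n
    unsigned char *d_composite = NULL;
    cudaMalloc((void **)&d_composite, bytes);
    cudaMemset(d_composite, 0, bytes);

    if (sqrt_n >= 2) {                     // nothing to strike for n < 4
        unsigned int threads = 256;
        unsigned int blocks = (sqrt_n - 1 + threads - 1) / threads;
        sieve_bytes<<<blocks, threads>>>(d_composite, n, sqrt_n);
    }

    // The copy runs on the default stream, so it waits for the kernel.
    unsigned char *h_composite = (unsigned char *)malloc(bytes);
    cudaMemcpy(h_composite, d_composite, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_composite);

    unsigned int count = 0;
    for (size_t i = 2; i <= n; ++i)
        if (!h_composite[i]) ++count;
    free(h_composite);
    return count;
}
```

Calling count_primes_gpu(100) should return 25. To use it as a benchmark you would wrap the kernel launch in cudaEvent timers and compare it against a bit-packed variant, since the one-byte-per-number layout trades eight times the memory for stores that need no read-modify-write.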


Are you looking for existing benchmark suites for CUDA, or are you planning to write your own benchmark and looking for good applications?