Prime Generator for CUDA

A beautiful day,

I have developed a CUDA application for generating primes, and it works really nicely.

Results:
a constant rate of about 8,317,200 primes per second
on a GeForce 650 CUDA graphics card with 192 cores

CUDA source code:
http://109.91.184.78/devalco/ulam_31.cu

and the mathematical description:
http://109.91.184.78/devalco/prime_sieve_ulam_vertical.pdf

Is it possible to speed up the algorithm? I am a beginner with CUDA, and there may be better implementations that are faster.

I use a GeForce 650 graphics card, and it would be nice to get some comparisons with other, faster CUDA graphics cards.
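To make numbers from different cards comparable, a small timing harness along the following lines might help. It is only a minimal sketch and not the code from ulam_31.cu: the kernel `count_primes` and the helper `is_prime_trial` are hypothetical stand-ins that use plain trial division, and the range `limit` is an arbitrary assumption. The point is the measurement pattern, which uses CUDA events and prints the device name next to the primes-per-second figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real sieve: trial division by 2 and by odd d with d*d <= n.
__host__ __device__ bool is_prime_trial(unsigned int n)
{
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (unsigned int d = 3; d * d <= n; d += 2)
        if (n % d == 0) return false;
    return true;
}

// One thread per candidate n in [2, limit]; primes are tallied with an atomic counter.
__global__ void count_primes(unsigned int limit, unsigned int *count)
{
    unsigned int n = 2 + blockIdx.x * blockDim.x + threadIdx.x;
    if (n <= limit && is_prime_trial(n))
        atomicAdd(count, 1u);
}

int main()
{
    const unsigned int limit = 1u << 22;   // assumed test range, not from the original post
    const unsigned int threads = 256;
    const unsigned int blocks = (limit - 1 + threads - 1) / threads;

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);     // device 0; error checking omitted for brevity

    unsigned int *d_count = nullptr;
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemset(d_count, 0, sizeof(unsigned int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time only the kernel, using events recorded on the default stream.
    cudaEventRecord(start);
    count_primes<<<blocks, threads>>>(limit, d_count);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    unsigned int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(h_count), cudaMemcpyDeviceToHost);

    printf("%s (%d multiprocessors): %u primes up to %u in %.3f ms, %.0f primes/sec\n",
           prop.name, prop.multiProcessorCount, h_count, limit, ms,
           h_count / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_count);
    return 0;
}
```

Compiled with something like `nvcc -O2 bench.cu -o bench` (the file name is just an example), this prints one line per run that can be posted together with the card model. Swapping the stand-in kernel for the real one keeps the timing part unchanged.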

The implementation has nothing to do with the sieve of Eratosthenes; it is more a derivation of the sieve of Ulam with some improvements.

Nice greetings from the primes,
Bernhard

http://devalco.de