Turkish Sieve: Ultra-Efficient GPU Twin/Cousin Prime Sieve with N/6 Bit Structure – 50% Less Memory + 33% Speedup from Mirror Symmetry Pattern

Hi NVIDIA CUDA community,

I’d like to share my new project, Turkish Sieve Engine (TSE v1.0.0): a fully deterministic sieve for scanning twin (p, p+2) and cousin (p, p+4) primes at large scales (tested up to 10¹⁴). Key innovations:

  • N/6-bit data structure → ~50% less memory than typical N/3-bit approaches (e.g., ~1.1 GB of VRAM for the 10¹⁴ range on an RTX 3070).

  • Replaces modular arithmetic with simple integer additions + bitwise ops → lower register pressure and less branch divergence on the GPU.

  • Integrated 35-bit mirror-symmetry pattern (added just yesterday) → 33% throughput speedup; the RTX 3070 reaches ~339 billion candidates/sec in the 10¹² range.

  • CUDA kernel optimizations + OpenMP support; ready-to-run Windows exe included.
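For readers new to the N/6 idea, here is a minimal CPU-side sketch of one plausible layout (my assumption, not taken from the TSE source): every twin prime pair above (3, 5) has the form (6k-1, 6k+1), so one bit per k covers the range with about N/6 bits (cousin pairs (6k+1, 6k+5) can be indexed the same way). The function name `count_twins` is hypothetical. Each sieving divisor p computes its two start indices once with a division; the inner loops then only add p and clear bits, in the addition-plus-bitwise style described above.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not the TSE source): bit k stands for the twin
// candidate (6k-1, 6k+1), so covering [1, n] takes about n/6 bits.
// Returns the number of twin prime pairs (p, p+2) with p+2 <= n.
std::uint64_t count_twins(std::uint64_t n) {
    if (n < 5) return 0;
    const std::uint64_t K = (n - 1) / 6;  // largest k with 6k+1 <= n
    std::vector<std::uint64_t> bits(K / 64 + 1, ~0ULL);
    auto clear = [&bits](std::uint64_t k) { bits[k >> 6] &= ~(1ULL << (k & 63)); };

    // Divisors p = 5, 7, 11, 13, ... (every 6m+-1). Composite p only repeat
    // work already done by their prime factors, so no prime list is needed.
    std::uint64_t step = 2;  // alternates 2, 4, 2, 4, ...
    for (std::uint64_t p = 5; p * p <= 6 * K + 1; p += step, step = 6 - step) {
        // Multiples p*q (q >= p) of the form 6k+1 are p*p, p*(p+6), ...
        // and those of the form 6k-1 are p*(p+step), p*(p+step+6), ...
        // Consecutive members differ by 6p, i.e. by p in index space.
        const std::uint64_t k1 = (p * p - 1) / 6;
        const std::uint64_t k2 = (p * (p + step) + 1) / 6;
        assert(6 * k1 + 1 == p * p && 6 * k2 - 1 == p * (p + step));
        for (std::uint64_t k = k1; k <= K; k += p) clear(k);  // additions only
        for (std::uint64_t k = k2; k <= K; k += p) clear(k);
    }
    clear(0);                                 // k = 0 would be (-1, 1)
    const std::uint64_t tail = (K & 63) + 1;  // valid bits in the last word
    if (tail < 64) bits.back() &= (1ULL << tail) - 1;

    std::uint64_t count = 1;  // (3, 5) is the only pair not of the form (6k-1, 6k+1)
    for (std::uint64_t w : bits) count += std::bitset<64>(w).count();
    return count;
}
```

For example, `count_twins(1000000)` should return 8169, the known number of twin prime pairs below 10⁶. In a GPU port the same start-index arithmetic would be done once per prime per segment, keeping the hot loop free of `%`.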
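One way to read the 35-bit mirror-symmetry pattern (my interpretation, not necessarily the construction in the preprints): with bit k standing for (6k-1, 6k+1), divisibility by 5 and 7 repeats every 5·7 = 35 values of k, and because 6·35 = 210 is divisible by both, 6(35-k)+1 = 210-(6k-1). So bit k and bit (35-k) mod 35 always agree. The sketch below, with hypothetical function names, builds that periodic pattern, checks the symmetry, and stamps it onto a segment with a wrapping counter instead of a per-bit modulo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: bit k of the pattern is true iff neither 6k-1 nor 6k+1
// is divisible by any prime in `primes` (each assumed >= 5). The pattern
// repeats with period M = product of the primes (35 for {5, 7}).
std::vector<bool> build_pattern(const std::vector<unsigned>& primes) {
    std::size_t M = 1;
    for (unsigned p : primes) {
        assert(p >= 5);  // 2 and 3 are already excluded by the 6k+-1 layout
        M *= p;
    }
    std::vector<bool> keep(M, true);
    for (unsigned p : primes)
        for (std::size_t k = 0; k < M; ++k) {
            const std::size_t minus = (6 * k + p - 1) % p;  // (6k - 1) mod p
            const std::size_t plus = (6 * k + 1) % p;       // (6k + 1) mod p
            if (minus == 0 || plus == 0) keep[k] = false;
        }
    return keep;  // built once per run, so the modulo here is off the hot path
}

// Mirror symmetry: bit k equals bit (M - k) mod M, because
// 6(M - k) + 1 = 6M - (6k - 1) and every prime in the set divides 6M.
bool is_mirror_symmetric(const std::vector<bool>& pat) {
    const std::size_t M = pat.size();
    for (std::size_t k = 0; k < M; ++k)
        if (pat[k] != pat[(M - k) % M]) return false;
    return true;
}

std::size_t count_survivors(const std::vector<bool>& pat) {
    return static_cast<std::size_t>(std::count(pat.begin(), pat.end(), true));
}

// Pre-sieve a segment [k_begin, k_begin + len) by copying the periodic pattern:
// one modulo to find the phase, then a wrapping counter (additions only).
std::vector<bool> presieve_segment(std::size_t k_begin, std::size_t len,
                                   const std::vector<bool>& pat) {
    std::vector<bool> seg(len);
    std::size_t j = k_begin % pat.size();
    for (std::size_t i = 0; i < len; ++i) {
        seg[i] = pat[j];
        if (++j == pat.size()) j = 0;
    }
    return seg;
}
```

For {5, 7} the pattern keeps 15 of the 35 bits, and thanks to the symmetry only k = 0..17 actually need to be stored.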

Benchmark highlights:

  • RTX 3070: all ~135 billion twin prime pairs up to 10¹⁴ counted in ~38 minutes.

  • GPU speedup: Up to ~11× over CPU-only mode.

Full open-source code:
https://github.com/bilgisofttr/turkishsieve

Theoretical foundations (deterministic patterns, absolute mirror symmetry, twin prime candidate continuity proofs) are in these Zenodo preprints:
https://zenodo.org/records/18038661
https://zenodo.org/records/18004889

Looking forward to your feedback and questions:

  • Ideas to push beyond memory bandwidth limits at higher ranges?

  • Suggestions for scaling the pattern to larger primorial cycles (e.g., 11#, 13#)?

  • Anyone profiled similar kernels with Nsight Compute? Happy to compare!
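On scaling to larger primorials: if bit k stands for the pair (6k-1, 6k+1) (my assumption about the layout), the factors 2 and 3 are already absorbed by the 6k±1 form, so the pattern period in k is p_n#/6, and the mirror property carries over unchanged. A short derivation:

```latex
Let $P = \{5, 7, \dots, p_n\}$ and $M = \prod_{p \in P} p = p_n\#/6$.
For every $p \in P$ we have $p \mid 6M$, hence
\[
6(M-k) + 1 = 6M - (6k-1) \equiv -(6k-1), \qquad
6(M-k) - 1 = 6M - (6k+1) \equiv -(6k+1) \pmod{p},
\]
so $k$ survives pre-sieving by $P$ iff $M-k$ does (mirror symmetry).
Each $p$ removes exactly two residues of $k$ modulo $p$
($6k \equiv 1$ and $6k \equiv -1$), so by the CRT the survivors per period number
\[
S = \prod_{p \in P} (p-2):\qquad
\tfrac{7\#}{6} = 35 \Rightarrow S = 15,\quad
\tfrac{11\#}{6} = 385 \Rightarrow S = 135,\quad
\tfrac{13\#}{6} = 5005 \Rightarrow S = 1485.
\]
```

So a 13#-based pattern spans 5005 bits (about half of that if stored via the symmetry) and keeps 1485/5005 ≈ 29.7% of the pairs, versus 15/35 ≈ 42.9% for the 35-bit pattern.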

If you have an RTX 3070 or similar, feel free to run the exe and share your logs/benchmarks – I’ll give a shout-out to the top results. Thanks in advance!

(Turkish Sieve Engine CEO).