Hi NVIDIA CUDA community,

I’d like to share my new project: Turkish Sieve Engine (TSE v1.0.0), a fully deterministic sieve designed for scanning twin (p, p+2) and cousin (p, p+4) primes at large scales (tested up to 10¹⁴).

Key innovations:
- N/6-bit data structure → ~50% less memory than typical N/3-bit approaches (e.g., ~1.1 GB VRAM for the 10¹⁴ range on an RTX 3070).
- Replaces modular arithmetic with simple integer additions and bitwise ops → low register pressure and low branch divergence on the GPU.
- Integrated 35-bit mirror-symmetry pattern → added just yesterday, resulting in a 33% throughput speedup (the RTX 3070 hits ~339 billion candidates/sec in the 10¹² range).
- CUDA kernel optimizations plus OpenMP support; a ready-to-run Windows exe is included.
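
For readers new to the N/6 idea, here is a minimal CPU-side sketch of one-bit-per-k twin sieving. It is not the TSE kernel, only an illustration under my own simplifying assumptions: the function name `count_twin_pairs_6k` is hypothetical, the whole range is sieved in one unsegmented array, and every number of the form 6m±1 is used as a sieving stride (redundant for composite strides, but still correct). Bit k stands for the candidate pair (6k-1, 6k+1), and each stride is walked with plain additions, with no `%` in the inner loops.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Counts twin prime pairs of the form (6k-1, 6k+1) with 6k+1 <= n.
// The pair (3, 5) does not have this form and is not counted.
// Illustrative sketch only; not the TSE implementation.
std::uint64_t count_twin_pairs_6k(std::uint64_t n) {
    if (n < 7) return 0;
    const std::uint64_t K = (n - 1) / 6;  // largest k with 6k+1 <= n

    // One bit per k: a set bit means 6k-1 or 6k+1 is composite.
    std::vector<std::uint64_t> dead(K / 64 + 1, 0);
    auto kill = [&dead](std::uint64_t k) {
        dead[k >> 6] |= std::uint64_t{1} << (k & 63);
    };

    // For p = 6m-1 or p = 6m+1, p divides 6k-1 or 6k+1 exactly when
    // k is congruent to +m or -m modulo p. Starting at p-m and p+m
    // skips k = m, where the multiple is p itself.
    for (std::uint64_t m = 1; (6 * m - 1) * (6 * m - 1) <= 6 * K + 1; ++m) {
        for (std::uint64_t p : {6 * m - 1, 6 * m + 1}) {
            for (std::uint64_t k = p - m; k <= K; k += p) kill(k);
            for (std::uint64_t k = p + m; k <= K; k += p) kill(k);
        }
    }

    // Every k whose bit is still clear is a twin prime pair.
    std::uint64_t pairs = 0;
    for (std::uint64_t k = 1; k <= K; ++k)
        if (!((dead[k >> 6] >> (k & 63)) & 1)) ++pairs;
    return pairs;
}
```

Calling `count_twin_pairs_6k(100)` returns 7: the eight twin pairs below 100, minus (3, 5). A real large-range sieve would process the array in segments and use only prime strides.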

Benchmark highlights:
- RTX 3070: ~135 billion twin prime pairs processed in ~38 minutes at 10¹⁴.
- GPU speedup: up to ~11× over CPU-only mode.

Full open-source code:
https://github.com/bilgisofttr/turkishsieve

Theoretical foundations (deterministic patterns, absolute mirror symmetry, twin prime candidate continuity proofs) are in the Zenodo preprints:
https://zenodo.org/records/18038661
https://zenodo.org/records/18004889

Looking forward to your feedback and questions:
- Any ideas for pushing beyond memory bandwidth limits at higher ranges?
- Any suggestions for scaling the pattern to larger primorial cycles (e.g., 11#, 13#)?
- Has anyone profiled similar kernels with Nsight Compute? Happy to compare!
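
To make the primorial question concrete, here is a rough sketch of how a k-space pre-sieve pattern grows with each added prime. It rests on an assumption of mine about the layout: that the 35-bit pattern corresponds to the period 7#/6 = 35 in the index k of the pair (6k-1, 6k+1), so that 11# and 13# would give periods 385 and 5005. The helper names `build_pattern`, `count_alive`, and `is_mirror_symmetric` are hypothetical and are not taken from the TSE source.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds one period of the k-space pattern for small primes (each >= 5).
// pattern[k] is true when neither 6k-1 nor 6k+1 is divisible by any of
// the given primes. Note: the residue k = m is marked too, so the pairs
// containing the small primes themselves need separate handling in the
// first period.
std::vector<bool> build_pattern(const std::vector<std::uint64_t>& primes) {
    std::uint64_t period = 1;
    for (std::uint64_t p : primes) period *= p;

    std::vector<bool> alive(period, true);
    for (std::uint64_t p : primes) {
        const std::uint64_t m = (p + 1) / 6;  // p = 6m-1 or p = 6m+1
        for (std::uint64_t k = m; k < period; k += p) alive[k] = false;
        for (std::uint64_t k = p - m; k < period; k += p) alive[k] = false;
    }
    return alive;
}

// Number of surviving k positions in one period.
std::size_t count_alive(const std::vector<bool>& pattern) {
    std::size_t n = 0;
    for (bool b : pattern) n += b ? 1 : 0;
    return n;
}

// Checks the mirror property pattern[k] == pattern[period - k].
bool is_mirror_symmetric(const std::vector<bool>& pattern) {
    const std::size_t period = pattern.size();
    for (std::size_t k = 1; k < period; ++k)
        if (pattern[k] != pattern[period - k]) return false;
    return true;
}
```

Under these assumptions the surviving fraction per period is 15/35 for {5, 7}, 135/385 with 11 added, and 1485/5005 with 13 added, and each pattern is mirror-symmetric because the removed residues always come in ±m pairs. The density gain shrinks with each extra prime while the table grows by that prime's factor, which is the trade-off I am asking about.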

If you have an RTX 3070 or similar, feel free to run the exe and share your logs/benchmarks. I’ll give a shout-out to the top results! Thanks in advance!
(Turkish Sieve Engine CEO).