Hi all,
I’ve been working on a new parallel pseudo-random number generator (PRNG) designed for high-performance Monte Carlo workloads on GPU and CPU, and I would greatly appreciate feedback from the cuRAND / HPC engineers.
Context / setup
- Generator name: `PRNG_MONTMORY_CTACM`
- Architectures tested:
• Apple Silicon GPU (Metal backend)
• NVIDIA A10 (CUDA backend – OVH cloud)
- Test suite: **TestU01 v1.2.3**
- Batteries: **SmallCrush, Crush, BigCrush**
- BigCrush configuration:
• 8192 threads × 16384 draws
• 160 / 160 test statistics passed
• No anomalies reported
• Runs are fully reproducible for every seed tested
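For readers wondering how a batch of 8192 × 16384 = 2^27 per-thread draws can be handed to a sequential test battery, here is an illustrative sketch only, not the actual harness. It shows the two obvious ways to flatten a (thread, draw index) grid into a single stream. The function names and the `toy_draw` placeholder are hypothetical, and the ordering actually used for the runs above is not implied by this example.

```python
from typing import Callable, Iterator, Tuple

# 8192 threads x 16384 draws per thread = 2**27 values per batch.
THREADS = 8192
DRAWS_PER_THREAD = 16384
assert THREADS * DRAWS_PER_THREAD == 2**27


def round_robin_stream(draw: Callable, threads: int, draws: int) -> Iterator:
    """Interleave threads: t0[0], t1[0], ..., t0[1], t1[1], ...

    This ordering exposes correlations *between* neighbouring threads.
    """
    for index in range(draws):
        for thread in range(threads):
            yield draw(thread, index)


def thread_major_stream(draw: Callable, threads: int, draws: int) -> Iterator:
    """Concatenate threads: t0[0], t0[1], ..., then t1[0], t1[1], ...

    This ordering mostly tests each thread's own sequence.
    """
    for thread in range(threads):
        for index in range(draws):
            yield draw(thread, index)


def toy_draw(thread: int, index: int) -> Tuple[int, int]:
    """Placeholder that returns its coordinates instead of random bits."""
    return (thread, index)


# Tiny grid (2 threads x 3 draws) to make the two orderings visible.
print(list(round_robin_stream(toy_draw, 2, 3)))
print(list(thread_major_stream(toy_draw, 2, 3)))
```

Which ordering is fed to the battery matters when interpreting results, because only the interleaved form probes inter-thread correlation.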
On both platforms (Metal and CUDA), the generator passes *all* SmallCrush, Crush, and BigCrush tests and behaves consistently across architectures.
The design is:
- massively parallel,
- deterministic per lane,
- bit-for-bit reproducible across CPU / Metal / CUDA.
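To make "deterministic per lane" and "bit-for-bit reproducible" concrete without sharing the generator itself, here is a minimal sketch using a generic counter-based stand-in. It is explicitly not `PRNG_MONTMORY_CTACM`: the mixing function is the public SplitMix64 finalizer, and `lane_draw` is a hypothetical name chosen for illustration. The point is only that each output depends on (seed, lane, counter) and nothing else, so evaluation order and backend cannot change the bits as long as the integer arithmetic is exact.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def splitmix64_mix(x: int) -> int:
    """Stateless SplitMix64 step: add the golden-ratio constant, then mix."""
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def lane_draw(seed: int, lane: int, counter: int) -> int:
    """Hypothetical stand-in generator: a pure function of its arguments.

    No shared state is read or written, so lanes can run in any order.
    """
    key = splitmix64_mix((seed & MASK64) ^ splitmix64_mix(lane & MASK64))
    return splitmix64_mix(key ^ (counter & MASK64))


def fill_grid(seed: int, lanes: int, draws: int, reverse: bool = False) -> dict:
    """Evaluate every (lane, counter) cell, optionally in reverse lane order."""
    order = range(lanes - 1, -1, -1) if reverse else range(lanes)
    return {(lane, c): lane_draw(seed, lane, c) for lane in order for c in range(draws)}


# Scheduling order does not affect any value: both grids are identical.
forward = fill_grid(seed=2024, lanes=8, draws=16)
backward = fill_grid(seed=2024, lanes=8, draws=16, reverse=True)
assert forward == backward
```

The same property is what allows a CPU reference implementation to be compared word for word against GPU output.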
I’m *not* trying to replace cuRAND or share proprietary code here.
My only goals are to:
1. Get expert feedback on whether this class of generator could be of interest as a future optional engine for cuRAND (or for GPU Monte Carlo workflows more generally), and
2. Find out whether there is a recommended technical contact or process inside NVIDIA for discussing PRNG research.
I can provide (privately if needed):
- full BigCrush logs (CUDA + Metal),
- all SmallCrush / Crush logs,
- performance benchmarks vs Philox / XORWOW,
- a minimal reproducible harness.
If this is not the right place for such a topic, any redirection to the appropriate NVIDIA team or contact would be very helpful.
Thanks a lot in advance for your time and guidance.
Pascal