This is MRG32k3a in particular, from Pierre L’Ecuyer, “Good Parameters and Implementations for Combined Multiple Recursive Random Number Generators,” *Operations Research* 47(1):159-164. The original code in the paper specifies:

```
p1 = a12 * s11 - a13n * s10;
k = p1 / m1;
p1 -= k * m1;
```

which is basically the modulo computation OP is inquiring about.

The key observation here is that `m1 = 4294967087 = 0xFFFFFF2F = (0x100000000 - 209)`

. It’s been too long ago that I worked with MRG32k3a to reliably connect the rest of the dots off the top of my head. I think the basic idea is that

m1 = 2^{32} - x

t0 = (y div 2^{32}) * x

t1 = y mod 2^{32}

t2 = if (t1 ≥ (m1 - t0)) then m1; else 0

y mod m1 = t0 + t1 - t2

where here we have x = 209. Voila, modulo computation without expensive division and using just one 32-bit multiplication. This works for m1, t0, t1, t2 all unsigned 32-bit integers and unsigned 64-bit y in [0, 0x013991c1ffffffcc]. Since this is a modulo computation, y ≥ 0 is easily assured despite the subtraction in the original code by adding a sufficiently large multiple of m1, which is what we see reflected in the code posted by OP. Depending on specific constraints, further simplifications might apply.

BTW, I wonder where you found this code, because the source code of CURAND is not publicly available to my knowledge. Is this reverse engineered from machine code disassembly?