This is MRG32k3a in particular, from Pierre L’Ecuyer, “Good Parameters and Implementations for Combined Multiple Recursive Random Number Generators,” Operations Research 47(1):159-164. The original code in the paper specifies:
p1 = a12 * s11 - a13n * s10;
k = p1 / m1;
p1 -= k * m1;
which is basically the modulo computation OP is inquiring about.
The key observation here is that
m1 = 4294967087 = 0xFFFFFF2F = (0x100000000 - 209). It’s been too long ago that I worked with MRG32k3a to reliably connect the rest of the dots off the top of my head. I think the basic idea is that
m1 = 232 - x
t0 = (y div 232) * x
t1 = y mod 232
t2 = if (t1 ≥ (m1 - t0)) then m1; else 0
y mod m1 = t0 + t1 - t2
where here we have x = 209. Voila, modulo computation without expensive division and using just one 32-bit multiplication. This works for m1, t0, t1, t2 all unsigned 32-bit integers and unsigned 64-bit y in [0, 0x013991c1ffffffcc]. Since this is a modulo computation, y ≥ 0 is easily assured despite the subtraction in the original code by adding a sufficiently large multiple of m1, which is what we see reflected in the code posted by OP. Depending on specific constraints, further simplifications might apply.
BTW, I wonder where you found this code, because the source code of CURAND is not publicly available to my knowledge. Is this reverse engineered from machine code disassembly?