What GPU? What’s the value of delta_r01? I will assume you are running on a Fermi- or Kepler-class GPU. The compiler very likely generates code using two FFMAs (single-precision fused multiply-adds) for the latter part of the computation, that is,
[expr] * fmaf (fmaf (9.333334f, -delta_r01, 6.0f), -delta_r01 , 2.0f)
If either of the two products is close to the corresponding constant but of opposite sign (meaning delta_r01 is positive), there will be subtractive cancellation, followed by renormalization. On the CPU, where the product is rounded to single precision, the bits shifted in on the right will be zero; on the GPU, where all product bits are retained inside the FMA, lower-order bits of the product will be shifted in on the right. The closer the product is to the constant, the bigger the difference will be.
If you look at the bit pattern of the intermediate result, and see trailing zeros in the CPU result, but non-zero trailing bits in the GPU result, that would be a good indication that my working hypothesis is correct, and in that case the GPU delivers the more accurate result thanks to FMA.
If you turn off FMA generation with -fmad=false, do the results match between CPU and GPU?
I would suggest reading the following whitepaper and also the references it cites: