PTX has three forms of the single-precision division instruction for compute capability 1.3 devices:
div.approx.f32 - described as a fast division that uses the reciprocal
div.full.f32 - the default division, described as a full-range approximate division that scales the operands to retain accuracy
div.rn.f32 - IEEE-compliant division (round to nearest), slow
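For reference, here is a minimal sketch of how each form might be requested from CUDA C. The mapping in the comments is an assumption that depends on the toolkit version, the target architecture and the -prec-div setting, and __fdiv_rn may not be available for every target, so it is worth confirming with the PTX output of nvcc -ptx. The kernel name divide3 and the test values are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Three ways to divide two floats in device code.
// The PTX named in each comment is the expected lowering, not a guarantee.
__global__ void divide3(const float *a, const float *b,
                        float *approx, float *plain, float *rn, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        approx[i] = __fdividef(a[i], b[i]); // expected: div.approx.f32
        plain[i]  = a[i] / b[i];            // div.full.f32 or div.rn.f32,
                                            // depending on target and -prec-div
        rn[i]     = __fdiv_rn(a[i], b[i]);  // expected: div.rn.f32
    }
}

int main()
{
    const int n = 4;
    // The last pair has a divisor above 2^126, the range where the CUDA
    // documentation says __fdividef no longer returns the true quotient.
    float ha[n] = {1.0f, 10.0f, 1.0f, 3.0e38f};
    float hb[n] = {3.0f, 7.0f, 1.0e-30f, 2.0e38f};
    float happrox[n], hplain[n], hrn[n];

    float *da, *db, *dapprox, *dplain, *drn;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dapprox, bytes);
    cudaMalloc(&dplain, bytes);
    cudaMalloc(&drn, bytes);

    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    divide3<<<1, n>>>(da, db, dapprox, dplain, drn, n);

    // These blocking copies also wait for the kernel to finish.
    cudaMemcpy(happrox, dapprox, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hplain, dplain, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hrn, drn, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) {
        printf("%g / %g: approx=%.9g plain=%.9g rn=%.9g\n",
               ha[i], hb[i], happrox[i], hplain[i], hrn[i]);
    }

    cudaFree(da);
    cudaFree(db);
    cudaFree(dapprox);
    cudaFree(dplain);
    cudaFree(drn);
    return 0;
}
```

Comparing the three printed columns, especially for the large divisor, should show where the variants diverge on a given device.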
Is div.full.f32 also a reciprocal-style division? Is it simply div.approx.f32 with the operands scaled first, or something different?