I cannot speak to the quality of this library, but I wonder whether using it as a “golden” reference is justified. In particular, I would be concerned about issues with double rounding, based on the following description on the website you pointed to:
“arithmetic operations are internally rounded to single-precision using the underlying single-precision implementation’s current rounding mode, those values are then converted to half-precision using the default half-precision rounding mode”.
I do not recall the exact details of when double rounding is harmless, as it differs by operation, but I seem to recall that some operations require > 2*p+2 bits in the wider format, which would mean > 24 for p=11.
[Later:]
S. A. Figueroa: “When is double rounding innocuous?”. SIGNUM Newsletter 30(3), 21-26 (1995) showed that, with p the precision of the narrower format and q the precision of the wider format, double rounding is innocuous if q >= 2p for multiplication and division, q >= 2p+1 for addition, and q >= 2p+2 for square root. Using binary32 for binary16 computation we have q=24 and p=11, so q = 2p+2 and these operations would be safe.
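As a spot check of the addition case, here is a minimal sketch using only the Python standard library. It relies on the `struct` formats `'e'` (binary16) and `'f'` (binary32) to do the rounding, and on the fact that the sum of two finite binary16 values is exact in binary64. The helper names `round_to`, `random_binary16` and `count_add_mismatches` are my own, and a random sample is of course no substitute for the proof.

```python
import math
import random
import struct

def round_to(fmt: str, x: float) -> float:
    """Round a binary64 value to the struct format fmt ('e' = binary16, 'f' = binary32)."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except OverflowError:  # struct raises on overflow instead of returning infinity
        return math.copysign(math.inf, x)

def random_binary16(rng: random.Random) -> float:
    """Draw a finite binary16 value from a uniformly random bit pattern."""
    while True:
        bits = rng.getrandbits(16)
        if (bits >> 10) & 0x1F != 0x1F:  # skip infinities and NaNs
            return struct.unpack("<e", struct.pack("<H", bits))[0]

def count_add_mismatches(trials: int, seed: int = 0) -> int:
    """Count sums where rounding via binary32 differs from rounding once."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        a, b = random_binary16(rng), random_binary16(rng)
        exact = a + b  # exact in binary64: at most 41 significant bits are needed
        single = round_to("e", exact)                 # one rounding
        double = round_to("e", round_to("f", exact))  # binary32 first, then binary16
        mismatches += single != double
    return mismatches
```

Calling `count_add_mismatches(100_000)` should find no mismatches, consistent with q=24 >= 2p+1=23.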
However, this may not apply to other operations, such as FMA (fused multiply-add), rsqrt, or integer-to-float conversions. For example, in the analogous case of converting 64-bit integers to binary32, it has been shown that there are values n for which f32_s64(n) != f32_f64(f64_s64(n)), and similarly for u64. [Sylvie Boldo, Jacques-Henri Jourdan, Xavier Leroy, and Guillaume Melquiond: “Verified Compilation of Floating-Point Computations”. Journal of Automated Reasoning 54(2), 135-163 (2015)].
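To make the integer conversion case concrete, here is a small sketch that models rounding to a p-bit significand with exact integer arithmetic (round to nearest, ties to even). The helper name `round_to_precision` and the particular value of n are my own choices for illustration and are not taken from the paper.

```python
def round_to_precision(n: int, p: int) -> int:
    """Round a non-negative integer to p significant bits, ties to even."""
    excess = n.bit_length() - p
    if excess <= 0:
        return n  # already representable with p bits
    kept, dropped = divmod(n, 1 << excess)
    half = 1 << (excess - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1  # round up (a carry into a new bit is still a valid result)
    return kept << excess

n = (1 << 60) + (1 << 36) + 1  # fits in a signed 64-bit integer

direct = round_to_precision(n, 24)                           # models f32_s64(n)
via_f64 = round_to_precision(round_to_precision(n, 53), 24)  # models f32_f64(f64_s64(n))

print(direct == (1 << 60) + (1 << 37))  # True: n lies just above the halfway point
print(via_f64 == 1 << 60)               # True: the tie is resolved to even
```

The first rounding discards the trailing 1 and leaves a value exactly halfway between two binary32 numbers, so the second rounding resolves a tie that the direct conversion never sees.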
[Even later:]
Cristina Iordache and David W. Matula: “On Infinitely Precise Rounding for Division, Square Root, Reciprocal and Square Root Reciprocal”. In Proceedings of the 14th IEEE Symposium on Computer Arithmetic (ARITH-14), pp. 233–240, states that rsqrt (reciprocal square root) requires q >= 2p+3 to avoid double rounding issues.
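Putting the bounds quoted in this answer side by side, the following sketch tabulates which operations meet their bound for a given pair of precisions. The names `MIN_Q` and `innocuous_ops` are hypothetical. Note that these are sufficient conditions as cited here: missing a bound means the guarantee no longer applies, not that a misrounded result necessarily exists.

```python
# Minimum precision q of the wider format, as a function of the
# narrower format's precision p, using the bounds cited above.
MIN_Q = {
    "multiplication": lambda p: 2 * p,
    "division": lambda p: 2 * p,
    "addition": lambda p: 2 * p + 1,
    "square root": lambda p: 2 * p + 2,
    "rsqrt": lambda p: 2 * p + 3,
}

def innocuous_ops(p: int, q: int) -> dict:
    """Map each operation to whether q meets its cited bound for precision p."""
    return {op: q >= bound(p) for op, bound in MIN_Q.items()}

# binary16 (p=11) computed through binary32 (q=24)
for op, ok in innocuous_ops(11, 24).items():
    print(f"{op}: needs q >= {MIN_Q[op](11)}, covered = {ok}")
```

For p=11 and q=24 every entry is covered except rsqrt, which would need q >= 25.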