'half' datatype - IEEE 754 conformance

Since half is covered by the IEEE 754 standard ever since the 2008 revision, it stands to reason that two conformant implementations will deliver identical results for identical operations. By “validated” I meant: how has it been established that the software on SourceForge conforms to the standard? By the way, what do you mean by “the GPU ‘near-native’ type”?

Maybe my posting was a little bit misleading.
For every GPU (image processing) routine, we have a ‘golden’ CPU reference code against which we compare.
For the CPU reference code we are using the ‘half_float’ class.
We compare all our GPU image processing routines for half (convolution, arithmetic, …) against the CPU reference code and noticed, on average, only negligible differences (on the order of 10e-5), which, for the convolution operations, are explainable by a different order of summation, etc.

So that is not a strict validation, but it is enough for us. Actually, we run a full wavelet-based denoising algorithm with many intermediate operations, and the CPU (using the ‘half_float’ class) and GPU results are virtually the same.
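
For illustration, a minimal sketch of such a comparison, assuming a GPU result already copied back to host memory and a CPU reference computed with the ‘half_float’ class (the buffer names are made up for the example):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>
#include "half.hpp"   // half_float library from SourceForge

using half_float::half;

// Compare a GPU result (already copied back to the host) against the
// CPU reference, reporting average and maximum absolute difference.
void compare_results(const std::vector<half>& cpu_ref,
                     const std::vector<half>& gpu_out)
{
    double sum_diff = 0.0, max_diff = 0.0;
    for (size_t i = 0; i < cpu_ref.size(); ++i) {
        double diff = std::fabs(static_cast<float>(cpu_ref[i]) -
                                static_cast<float>(gpu_out[i]));
        sum_diff += diff;
        if (diff > max_diff) max_diff = diff;
    }
    std::printf("avg diff = %g, max diff = %g\n",
                sum_diff / cpu_ref.size(), max_diff);
}
```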

And the term ‘near-native’ is misleading; I meant that I don’t have a native C++ type for ‘half’, so I have to use ‘unsigned short’ as the representation in memory.
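
In practice that looks roughly like the sketch below: since the ‘half_float’ type is a plain 16-bit object, its buffer can be copied to the GPU as raw ‘unsigned short’ data (the function name is made up for the example):

```cpp
#include <vector>
#include <cuda_runtime.h>
#include "half.hpp"   // half_float library

using half_float::half;

// Upload a host buffer of half_float::half to the GPU, where the data
// is handled as raw 'unsigned short' since there is no native C++ half.
unsigned short* upload_half_buffer(const std::vector<half>& host)
{
    static_assert(sizeof(half) == sizeof(unsigned short),
                  "half must be a 16-bit type");
    unsigned short* dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(unsigned short));
    // The bit patterns are identical, so a plain byte copy suffices.
    cudaMemcpy(dev, host.data(), host.size() * sizeof(unsigned short),
               cudaMemcpyHostToDevice);
    return dev;
}
```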

If I read the CUDA 7.5 release notes correctly, it seems that a proper “half” type has been added, but I haven’t checked it out first hand. What support there is for the ‘half’ type on the GPU itself is definitely in hardware.
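
For reference, the type in question is ‘__half’ from the ‘cuda_fp16.h’ header, which also provides conversion intrinsics. A minimal sketch of its use in a kernel (the kernel name and scale factor are made up; here half is used as a storage format, with the arithmetic done in float after conversion):

```cpp
#include <cuda_fp16.h>

// Scale a buffer of half-precision values: convert each element to
// float, multiply, and convert back to half for storage.
__global__ void scale_half(__half* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(data[i]);
        data[i] = __float2half(x * factor);
    }
}
```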

My concern with the ‘half’ code on SourceForge is that you are using it as a golden model, although the validation status of that code seems to be unclear. Having written a lot of emulation code in my career, I know that getting emulation right can be difficult in general, although it is easier for ‘half’ since that lends itself to exhaustive testing. For example, for the FMA emulation that used to ship with CUDA for sm_1x platforms, I did a three-way comparison between my emulation code, Itanium’s FMA, and the GPU’s FMA. Since FMA is impossible to test exhaustively even in the single-precision case, errors may have remained, but at least considerable effort went into making sure the emulation worked correctly.
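
Such an exhaustive test is cheap for ‘half’ because there are only 2^16 bit patterns. A minimal sketch of what it could look like, checking the half-to-float conversion of the ‘half_float’ class against an independent hand-written converter (the converter below is my own assumption of a reference, not part of either library):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include "half.hpp"   // half_float library under test

// Independent half->float reference converter written from the
// IEEE 754-2008 binary16 layout: 1 sign, 5 exponent, 10 mantissa bits.
static float reference_half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t f;
    if (exp == 0) {
        if (mant == 0) {
            f = sign;                                  // signed zero
        } else {                                       // subnormal: normalize
            exp = 113;                                 // 127 - 15 + 1
            while (!(mant & 0x400u)) { mant <<= 1; --exp; }
            f = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
        }
    } else if (exp == 31) {
        f = sign | 0x7F800000u | (mant << 13);         // infinity / NaN
    } else {
        f = sign | ((exp + 112) << 23) | (mant << 13); // normal number
    }
    float out;
    std::memcpy(&out, &f, sizeof out);
    return out;
}

int main()
{
    int mismatches = 0;
    for (uint32_t i = 0; i <= 0xFFFFu; ++i) {          // every bit pattern
        uint16_t bits = (uint16_t)i;
        half_float::half h;
        std::memcpy(&h, &bits, sizeof h);              // reinterpret raw bits
        float a = (float)h;
        float b = reference_half_to_float(bits);
        // Compare bit patterns so NaN encodings are checked as well.
        if (std::memcmp(&a, &b, sizeof(float)) != 0)
            ++mismatches;
    }
    std::printf("%d mismatches out of 65536 patterns\n", mismatches);
    return mismatches != 0;
}
```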

Wanted to bump this thread (almost 2 years old) to see if anything had changed in how one defines and accesses a 3D texture using 16-bit values (the values will be consumed as floats after the 16-bit load).

In other words, with CUDA 8.0, has anything changed in terms of syntax when creating/using a 3D texture that will be populated with 16-bit values?
I will not need to read them as 3D-interpolated values, so maybe there is no benefit to using texture memory.
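
As far as I know the texture object API (available since CUDA 5.0) still handles this case unchanged in CUDA 8.0. A minimal sketch, assuming the 16-bit values are fp16 so the texture unit promotes them to float on load (the function names and the point-filtering choice are mine, reflecting the no-interpolation use case):

```cpp
#include <cuda_runtime.h>

// Fetch a single voxel; the 16-bit texel is promoted to float on load.
__global__ void read_voxel(cudaTextureObject_t tex, float* out,
                           int x, int y, int z)
{
    *out = tex3D<float>(tex, x + 0.5f, y + 0.5f, z + 0.5f);
}

cudaTextureObject_t make_half_tex3d(const unsigned short* host_data,
                                    cudaExtent extent)
{
    // One 16-bit float channel; equivalent to cudaCreateChannelDescHalf().
    cudaChannelFormatDesc desc =
        cudaCreateChannelDesc(16, 0, 0, 0, cudaChannelFormatKindFloat);

    cudaArray_t arr = nullptr;
    cudaMalloc3DArray(&arr, &desc, extent);

    cudaMemcpy3DParms p = {};
    p.srcPtr = make_cudaPitchedPtr(const_cast<unsigned short*>(host_data),
                                   extent.width * sizeof(unsigned short),
                                   extent.width, extent.height);
    p.dstArray = arr;
    p.extent   = extent;
    p.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&p);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc tdesc = {};
    tdesc.readMode   = cudaReadModeElementType; // fp16 texel read as float
    tdesc.filterMode = cudaFilterModePoint;     // no interpolation needed

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &tdesc, nullptr);
    return tex;
}
```

Without interpolation the main remaining benefits of the texture path are the read-only cache and the free fp16-to-float promotion; plain global loads of ‘unsigned short’ plus ‘__half2float’ would work as well.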