How to convert floats into halfs on NVidia's implementation?

The following kernel works on Apple’s implementation:

__kernel void ConvertHalf(__global half *dst, __global const float *src)
{
  size_t globalIdx = get_global_id(0);

  dst[globalIdx] = (half)src[globalIdx];
}

On the NVidia implementation I get the following:

So what is the prescribed way to convert floats into halfs?

Note: I understand that the cl_khr_fp16 extension is not supported, but half is still supported as a “storage format” in the core OpenCL 1.0 specification: …l_khr_fp16.html


I ran into the same problem just recently: due to memory limitations I have to use half-precision floats in my OpenCL app. I tried to use the “half” type in my kernel, but pretty soon I realized that it’s not really supported (on NVIDIA hardware, with the current drivers at least). Browsing through the OpenCL documentation, I noticed that it refers to IEEE 754 as the floating-point format. Wikipedia describes the half-precision layout pretty well:

There’s also a link to C code that does all the conversions for you:

Basically, I customized this code to run on the GPU, and I’m using it to convert my 32-bit floats to 16-bit floats (using the singles2halfp(…) function). I store the results in a “ushort” array, which can then be read, written, and transferred back and forth between CPU and GPU like any other integer buffer. When I want to use the values I have to convert them back to 32-bit floats first (using the halfp2singles(…) function). I know this could be ridiculously slow and far from optimal, but it solved my problem pretty well :-)
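To give an idea of what such a conversion looks like, here is a minimal sketch in plain C. It is not the linked code: the function names float_to_half_bits and half_bits_to_float are made up for this example, and it assumes IEEE 754 single-precision floats and round-to-nearest-even. In a kernel the same bit manipulation should carry over with “ushort” and “uint” in place of uint16_t and uint32_t, and as_uint()/as_float() in place of memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: float -> IEEE 754 binary16 bit pattern,
   rounding to nearest even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret the float's bits */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x007FFFFFu;

    if (exp == 0xFFu)                         /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;      /* re-bias the exponent */

    if (e >= 31)                              /* too large for half: Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                             /* subnormal half or zero */
        if (e < -10)
            return sign;                      /* underflows to signed zero */
        mant |= 0x00800000u;                  /* restore the implicit leading 1 */
        uint32_t shift   = (uint32_t)(14 - e);
        uint32_t hm      = mant >> shift;
        uint32_t rem     = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (hm & 1u)))
            hm++;
        return (uint16_t)(sign | hm);
    }

    uint32_t h   = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                  /* a carry may roll over into Inf */
    return (uint16_t)(sign | h);
}

/* Hypothetical helper: binary16 bit pattern -> float (always exact). */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x03FFu;
    uint32_t x;

    if (exp == 0) {
        if (mant == 0) {
            x = sign;                         /* signed zero */
        } else {
            int32_t e = 113;                  /* 127 - 15 + 1 */
            while ((mant & 0x0400u) == 0) {   /* normalise the subnormal */
                mant <<= 1;
                e--;
            }
            mant &= 0x03FFu;
            x = sign | ((uint32_t)e << 23) | (mant << 13);
        }
    } else if (exp == 31) {
        x = sign | 0x7F800000u | (mant << 13);   /* Inf or NaN */
    } else {
        x = sign | ((exp + 112u) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

As far as I can tell, the core specification also defines vload_half and vstore_half built-ins for exactly this storage-only use of half, so it may be worth checking whether your driver accepts those before falling back to manual bit twiddling.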

I hope that future OpenCL versions will have proper support for half-precision floats, so I won’t have to work around it like this.

Hope this helped :D