Half precision reciprocals in OpenCL

stolk · May 17, 2023, 4:42pm

In my OpenCL code, 1.0h / value works for half-precision float values.

But when I use OpenCL’s built-in function for that, a = half_recip(b), compilation fails, with a and b both half-precision floats.

The source:

#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel
void foo
(
	__global const half* dst,
	__global half* src
)
{
	const uint index = get_global_id(0);
	dst[index] = half_recip(src[index]);
}

The compiler output:

<kernel>:11:15: error: call to 'native_recip' is ambiguous
        dst[index] = half_recip(src[index]);
                     ^~~~~~~~~~~~~~~~~~~~~~
cl_kernel.h:1182:23: note: expanded from macro 'half_recip'
#define half_recip(x) native_recip(x)
                      ^~~~~~~~~~~~
cl_kernel.h:1164:24: note: candidate function
float __OVERLOADABLE__ native_recip(float);
                       ^
cl_kernel.h:1173:25: note: candidate function
double __OVERLOADABLE__ native_recip(double);
                        ^
cl_kernel.h:1165:25: note: candidate function
float2 __OVERLOADABLE__ native_recip(float2); 
                        ^
cl_kernel.h:1167:25: note: candidate function
float3 __OVERLOADABLE__ native_recip(float3); 
                        ^
cl_kernel.h:1169:25: note: candidate function
float4 __OVERLOADABLE__ native_recip(float4); 
                        ^
cl_kernel.h:1170:25: note: candidate function
float8 __OVERLOADABLE__ native_recip(float8); 
                        ^
cl_kernel.h:1171:26: note: candidate function
float16 __OVERLOADABLE__ native_recip(float16); 
                         ^
cl_kernel.h:1174:26: note: candidate function
double2 __OVERLOADABLE__ native_recip(double2); 
                         ^
cl_kernel.h:1176:26: note: candidate function
double3 __OVERLOADABLE__ native_recip(double3); 
                         ^
cl_kernel.h:1178:26: note: candidate function
double4 __OVERLOADABLE__ native_recip(double4); 
                         ^
cl_kernel.h:1179:26: note: candidate function
double8 __OVERLOADABLE__ native_recip(double8); 
                         ^
cl_kernel.h:1180:27: note: candidate function
double16 __OVERLOADABLE__ native_recip(double16);

This compile error makes no sense to me: the argument and the lvalue are both half-precision floats, so the compiler should have no trouble selecting the matching variant of the overloaded function.

With Intel or AMD, this OpenCL kernel compiles just fine, but not with NVIDIA’s OpenCL.

FULL_PROFILE OpenCL 3.0 CUDA 12.1.68 NVIDIA CUDA NVIDIA Corporation

NVIDIA GeForce RTX 3070 NVIDIA Corporation with [46 units] localmem=48KiB globalmem=7970MiB dims=3(1024x1024x64) max workgrp sz 1024

Robert_Crovella · May 17, 2023, 6:31pm

From what I have seen, NVIDIA doesn’t support the cl_khr_fp16 extension, but I haven’t checked lately (edit: see here). The compiler output here seems to suggest that: none of the candidate functions it lists takes a half argument (look through the list; it’s not there). The compiler therefore finds several alternatives, reachable via type conversion, that it sees as equally usable, so the call is “ambiguous”. It makes sense to me, anyway, from what I see here.

I’m puzzled by the use of const on dst but not src, although that doesn’t seem to be at issue here.

stolk · May 17, 2023, 6:40pm

Oops… the const was a typo, when I tried to make a minimal case of my code.

I was confused about the candidate list, I think.

I assumed float16 was ‘half’ or ‘fp16’, but I guess I was wrong, and it is a 16-float matrix instead?

I can happily feed it kernel source that uses ‘half’ though. Is that not what the extension does? Make half operations available?

Robert_Crovella · May 17, 2023, 7:07pm

float16 follows the same naming pattern as double16, float8, float4 and float2. They are vector types, so float16 is a vector of 16 floats.

I’m not an expert on what is or isn’t possible with NVIDIA OpenCL. The compiler seems to be making a rational statement to me, however. You’re welcome to file a bug if you think that half_recip should be usable (this way); it might be a bug, or it might just be unsupported.
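A possible workaround, as a minimal sketch: the core built-ins vload_half and vstore_half convert between half values in memory and float, so the reciprocal can be taken on a float, where an exact-match overload exists in the candidate list. This assumes, as I read the OpenCL C spec, that the half_ prefix names reduced-precision float functions rather than functions on the half type. The kernel name foo_portable and the host-side string foo_portable_src are made up for illustration, and this has not been tried on the RTX 3070 above.

```c
#include <string.h>

/* OpenCL C source kept as a host-side string, in the form expected by
 * clCreateProgramWithSource(). The names foo_portable and foo_portable_src
 * are hypothetical; they do not appear in the original post. */
static const char *const foo_portable_src =
    "__kernel void foo_portable(__global half *dst,\n"
    "                           __global const half *src)\n"
    "{\n"
    "    const size_t index = get_global_id(0);\n"
    "    /* vload_half reads one half from memory and widens it to float. */\n"
    "    const float x = vload_half(index, src);\n"
    "    /* Reciprocal on a float argument, so the float overload matches. */\n"
    "    const float r = half_recip(x);\n"
    "    /* vstore_half converts the float result back to half on store. */\n"
    "    vstore_half(r, index, dst);\n"
    "}\n";
```

The kernel only uses half as a storage format behind a pointer, so it does not rely on the cl_khr_fp16 pragma. The arithmetic runs in float, which may or may not be acceptable depending on why half was chosen in the first place.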

stolk · May 17, 2023, 7:22pm

Thank you.

I think the following is the case: FP16 operations in OpenCL are possible, both in Nvidia’s and AMD’s implementation.

But both report that they do not support cl_khr_fp16 when queried via clGetDeviceInfo CL_DEVICE_EXTENSIONS.

Intel reports cl_khr_fp16 supported.

I guess the support is partial on NVIDIA and AMD, and therefore not advertised via the device info.

For completeness, these are the extensions reported for the RTX 3070:

cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
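For anyone who wants to check for cl_khr_fp16 from host code rather than by reading the list, here is a minimal sketch. It assumes the extension string has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS; the helper name has_extension is made up for illustration. It matches whole space-separated tokens, because a plain substring search would also accept a name that is only a prefix of a longer one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the OpenCL API.
 * Returns true when `name` appears as a complete space-separated token in
 * `ext_list`, the string reported for CL_DEVICE_EXTENSIONS. */
bool has_extension(const char *ext_list, const char *name)
{
    const size_t name_len = strlen(name);
    if (name_len == 0) {
        return false;
    }

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* A token starts at the beginning of the list or after a space. */
        const bool starts_token = (p == ext_list) || (p[-1] == ' ');
        /* A token ends at the end of the list or before a space. */
        const char after = p[name_len];
        const bool ends_token = (after == '\0') || (after == ' ');
        if (starts_token && ends_token) {
            return true;
        }
        p++; /* keep searching after this partial match */
    }
    return false;
}
```

Called with the list above, has_extension(list, "cl_khr_fp16") returns false and has_extension(list, "cl_khr_fp64") returns true, which matches what the device reports.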
