Meaning of CL_DEVICE_SINGLE_FP_CONFIG in absence of cl_khr_select_fprounding_mode extension

Section 9.4 of the latest version of the OpenCL 1.0 spec (10/6/09) states:

OpenCL 1.0 adds support for specifying the rounding mode for an instruction or group of

instructions in the program source as an optional extension. An application that wants to use this

feature will need to include the

#pragma OPENCL EXTENSION cl_khr_select_fprounding_mode : enable directive.

If the cl_khr_select_fprounding_mode extension is supported, the OpenCL implementation

must support all four rounding modes for single precision floating-point i.e. the

CL_DEVICE_SINGLE_FP_CONFIG described in table 4.3 must include

CL_FP_ROUND_TO_ZERO and CL_FP_ROUND_TO_INF. This is already the case for double

precision floating-point.

The appropriate rounding mode can be specified using the following pragma in the program

source.

#pragma OPENCL SELECT_ROUNDING_MODE rounding-mode

The rounding-mode-value can be one of the following values:

rte

rtz

rtp

rtn

What does it mean if your card (e.g. GTX 285) doesn’t support this extension, BUT

does have the support for all four rounding modes? I see all four specified for

CL_DEVICE_SINGLE_FP_CONFIG when I run oclDeviceQuery.

Currently my take on this is that I can’t control which rounding mode is used

(e.g. when I add two floats together). The only sense I can make out

of this rounding mode support is that it must have something to do with the

“convert_” operator (also discussed in the spec).

My attempts to select the rounding mode (despite the apparent lack of support

for this extension) did nothing.

In CUDA, with the same card, I have modified the SDK example VectorAdd to prove

to myself that I can do addition of floating point numbers with arbitrary

rounding mode using the device functions (e.g. __fadd_ru(), etc.) and get

an exact match for what I do on the CPU (using fesetround() defined in the ISO

C standard).

I guess I just need to buy a 2.x compute capability GPU to get the

cl_khr_select_fprounding extension so that I can do in OpenCL what I can

already do in CUDA.

Can anyone confirm any of these guesses?

Thanks.

Chris

I should have mentioned that I got the CUDA code to work after reading the following

topic on “Precision and rounding in floating point registers” in the CUDA

Programming and Development Forum.

http://forums.nvidia.com/index.php?showtopic=181349

Thanks again.

Chris

Here is the sample code to demonstrate the issue.

Can someone who has a GPU that implements the cl_khr_select_fprounding_mode extension tell me if this code
runs properly on that GPU?

This code mimics the SDK oclVectorAdd example. If you put it in a directory named
~/NVIDIA_GPU_Computing_SDK/OpenCL/src/oclVectorSub
it should compile.

Then just change the call to fesetround() in verifyResults() to try the four different rounding modes.
Depending on which mode you select, different kernels should pass/fail.

In my case, all four cases behave exactly the same because the pragmas in VectorSub.cl are ignored.

BTW, I’m using the nvidia drivers. Is it possible that this extension is supported under the AMD drivers for
the GTX 285? Does anyone know?

I can post my CUDA code (which does exactly the same thing as this, but actually works) if that will clarify matters. It uses the calls __fadd_rn(), __fadd_rz(), __fadd_rd() and __fadd_ru().

Thanks.

Chris

My kernel file is:

#pragma OPENCL EXTENSION cl_khr_select_fprounding_mode : enable
__kernel void VectorSub(__global const float* a, __global const float* b, __global float* c, int iNumElements)
{
int iGID = get_global_id(0);
if (iGID >= iNumElements) return;
c[iGID] = a[iGID] - b[iGID];
}

__kernel void VectorSub_re(__global const float* a, __global const float* b, __global float* c, int iNumElements)
{
int iGID = get_global_id(0);
if (iGID >= iNumElements) return;
#pragma OPENCL SELECT_ROUNDING_MODE rte
c[iGID] = a[iGID] - b[iGID];
}

__kernel void VectorSub_rz(__global const float* a, __global const float* b, __global float* c, int iNumElements)
{
int iGID = get_global_id(0);
if (iGID >= iNumElements) return;
#pragma OPENCL SELECT_ROUNDING_MODE rtz
c[iGID] = a[iGID] - b[iGID];
}

__kernel void VectorSub_rp(__global const float* a, __global const float* b, __global float* c, int iNumElements)
{
int iGID = get_global_id(0);
if (iGID >= iNumElements) return;
#pragma OPENCL SELECT_ROUNDING_MODE rtp
c[iGID] = a[iGID] - b[iGID];
}

__kernel void VectorSub_rn(__global const float* a, __global const float* b, __global float* c, int iNumElements)
{
int iGID = get_global_id(0);
if (iGID >= iNumElements) return;
#pragma OPENCL SELECT_ROUNDING_MODE rtn
c[iGID] = a[iGID] - b[iGID];
}
oclVectorSub.cpp (16 KB)