How to use CUDA half and half functions

lingchao.zhu · January 9, 2019, 6:45am

I tested the half type with the code below:

#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>
#include <helper_cuda.h>
#include <helper_functions.h>
#include <cuda_fp16.h>

// Managed arrays are accessible from both host and device.
__device__ __managed__ u_char A[100];
__device__ __managed__ u_char B[100];
__device__ __managed__ u_char C[100];

// Each thread converts one element of A and B to __half, adds them,
// and stores the result in C.
__global__ void vector_add_u8()
{
    int idx = threadIdx.x;
    __half a = __float2half((float)(A[idx]));
    __half b = __float2half((float)(B[idx]));
    __half c = __hadd(a, b);

    printf("a = %f, b = %f, c = %f\n", __half2float(a), __half2float(b), __half2float(c));
    C[idx] = (u_char)__half2float(c);
}

int main(int argc, char **argv)
{
    memset(A, 11, 100);
    memset(B, 10, 100);

    dim3 blocks(1);
    dim3 threads(100, 1);
    vector_add_u8<<<blocks, threads>>>();

    // Wait for the kernel to finish before reading the managed arrays on the host.
    checkCudaErrors(cudaDeviceSynchronize());
    checkCudaErrors(cudaGetLastError());

    for(int i = 0; i < 100; i++)
    {
        // Print element i (the original printed element 0 on every iteration).
        printf("A[i] + B[i] = C[i]: %d + %d = %d\n", A[i], B[i], C[i]);
    }
    return 0;
}

And I get this result:

a = 11.000000, b = 10.000000, c = 10.000000
A[i] + B[i] = C[i]: 11 + 10 = 10

The result looks like half of the sum of a and b, rounded to nearest even, rather than the sum itself.

I tried many other inputs as well, but the result is always an integer.

Is this correct?
If so, how can I get the true value?

Thanks very much.

Robert_Crovella · January 9, 2019, 9:11am

Which GPU and CUDA version are you using?
What is your compile command line?

lingchao.zhu · January 9, 2019, 9:20am

I am using a GTX 1070 with CUDA 9.0.176.
The command is: nvcc -Xcompiler -fPIC …/src/test.cu -o test

Robert_Crovella · January 9, 2019, 9:24am

Try adding -arch=sm_61 to the compile command line.

Robert_Crovella · January 9, 2019, 9:44am

From what I can tell, something peculiar is happening here.

There happen to be two different intrinsics with the same name. The floating point __half add function (__hadd):

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH____HALF__ARITHMETIC.html#group__CUDA__MATH____HALF__ARITHMETIC_1ga07e44376f11eaa3865163c63372475d

and an integer intrinsic that has nothing to do with the __half datatype, which is also named __hadd:

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html#group__CUDA__MATH__INTRINSIC__INT_1g6acbd5fd1fef78022471b58852e495da

This function computes an integer average of two integers (add and halve?).
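As a rough illustration of why the output above was 10 rather than 21, here is a sketch of the arithmetic the integer intrinsic is documented to perform, (x + y) >> 1 without overflow in the intermediate sum. The name int_hadd_emulated is made up for this example and is not a CUDA function, and it is an assumption that the __half arguments in the original kernel were implicitly converted to int before the call.

```cuda
#include <cstdio>

// Hypothetical stand-in for the integer intrinsic __hadd(int, int).
// The CUDA Math API describes it as (x + y) >> 1, computed so that the
// intermediate sum cannot overflow; a 64-bit sum is used here for that.
__host__ __device__ inline int int_hadd_emulated(int x, int y)
{
    return static_cast<int>((static_cast<long long>(x) + y) >> 1);
}

int main()
{
    // Same inputs as the original post: 11 and 10.
    printf("%d\n", int_hadd_emulated(11, 10));  // prints 10
    return 0;
}
```

The sum 21 shifted right by one bit gives 10, which matches the value the kernel printed for c.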

If you compile for an architecture which supports __half arithmetic (cc 5.3 or higher), and include cuda_fp16.h, you get the first one. If you don’t, you get the second one.
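One way to avoid depending on the compile target is to select the code path explicitly with __CUDA_ARCH__. The following is only a sketch: half_add_checked, add_kernel and run_half_add are hypothetical names introduced for this example, and the float fallback is one possible choice for targets below cc 5.3, not something the CUDA headers prescribe.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Hypothetical helper: add two __half values without ever resolving to
// the integer __hadd(int, int) intrinsic.
__device__ __half half_add_checked(__half a, __half b)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 530)
    return __hadd(a, b);  // half-precision add, available on cc 5.3+
#else
    // Fallback for older targets: add in float, then round back to half.
    return __float2half(__half2float(a) + __half2float(b));
#endif
}

__global__ void add_kernel(float x, float y, float *out)
{
    *out = __half2float(half_add_checked(__float2half(x), __float2half(y)));
}

// Hypothetical host wrapper: returns false if any CUDA call fails.
bool run_half_add(float x, float y, float *result)
{
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) return false;
    add_kernel<<<1, 1>>>(x, y, d_out);
    // cudaMemcpy synchronizes with the kernel and reports launch errors.
    bool ok = cudaMemcpy(result, d_out, sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    return ok;
}

int main()
{
    float sum = 0.0f;
    if (run_half_add(11.0f, 10.0f, &sum))
        printf("11 + 10 = %f\n", sum);
    else
        printf("CUDA call failed\n");
    return 0;
}
```

With this guard the kernel should give the floating point sum whether or not -arch is passed, at the cost of a float round trip on older targets.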

It seems that someone else has pointed this out also:

https://github.com/tensorflow/tensorflow/issues/19198

lingchao.zhu · January 10, 2019, 1:36am

Thanks for your reply.

I fixed the problem by adding -arch=sm_61.
