Does static_cast support the half data type?

larrytech · August 21, 2024, 2:46am

Can static_cast be used in CUDA code with data types that are not native to C++, such as half and bfloat16 ("bhalf")?

What’s the difference between using static_cast and the CUDA intrinsics described in CUDA Math API :: CUDA Toolkit Documentation?

Robert_Crovella · August 24, 2024, 9:48pm

Did you try it? It seems to work for me in CUDA 12.2 or newer, e.g. casting float to half or vice versa.

My guess would be that, for those examples, static_cast and the CUDA intrinsics compile to approximately the same SASS code.
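One place where the two approaches can differ is the rounding mode. As far as I know, static_cast<half>(float) goes through the same round-to-nearest-even conversion as __float2half / __float2half_rn, while the CUDA Math API also offers directed-rounding conversions (__float2half_rz, __float2half_ru, __float2half_rd) that a plain cast cannot express. The following is a minimal sketch, not taken from this thread: the kernel convert_modes and the host helper run_conversions are hypothetical names, and each result is stored as its raw 16-bit pattern so the variants can be compared bit for bit.

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: converts each float to half four ways and stores the raw bits.
// out must hold 4 * n entries: [static_cast, _rn, _rz, _ru] for each input.
__global__ void convert_modes(const float *in, unsigned short *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        out[4 * i + 0] = __half_as_ushort(static_cast<half>(x)); // cast: round-to-nearest-even
        out[4 * i + 1] = __half_as_ushort(__float2half_rn(x));   // explicit round-to-nearest-even
        out[4 * i + 2] = __half_as_ushort(__float2half_rz(x));   // round toward zero
        out[4 * i + 3] = __half_as_ushort(__float2half_ru(x));   // round toward +infinity
    }
}

// Hypothetical host helper: copies n floats to the device, runs the kernel and
// copies the 4 * n results back. Returns false if any CUDA runtime call fails.
bool run_conversions(const float *h_in, unsigned short *h_out, int n) {
    float *d_in = nullptr;
    unsigned short *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_out, 4 * n * sizeof(unsigned short)) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        convert_modes<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, 4 * n * sizeof(unsigned short),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);   // cudaFree(nullptr) is a no-op, so partial failures are safe
    cudaFree(d_out);
    return ok;
}
```

For an input of 1.000732421875f (three quarters of a half-precision ulp above 1.0), the static_cast, _rn and _ru results should all be 0x3C01, while _rz should give 0x3C00. If you only ever need round-to-nearest-even, the cast and the intrinsic should be interchangeable.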

Example:

# cat t265.cu
#include <cuda_fp16.h>
__global__ void k(float a, half *b){

#ifdef USE_INTRIN
  *b = __float2half(a);
#else
  *b = static_cast<half>(a);
#endif
}
# nvcc -arch=sm_70 -c t265.cu
# cuobjdump -sass ./t265.o

Fatbin elf code:
================
arch = sm_70
code version = [1,7]
host = linux
compile_size = 64bit

        code for sm_70
                Function : _Z1kfP6__half
        .headerflags    @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM70 EF_CUDA_VIRTUAL_SM(EF_CUDA_SM70)"
        /*0000*/                   MOV R1, c[0x0][0x28] ;              /* 0x00000a0000017a02 */
                                                                       /* 0x000fc40000000f00 */
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;       /* 0x000000fffffff389 */
                                                                       /* 0x000fe200000e00ff */
        /*0020*/                   F2F.F16.F32 R5, c[0x0][0x160] ;     /* 0x0000580000057b04 */
                                                                       /* 0x000e220000200800 */
        /*0030*/                   MOV R2, c[0x0][0x168] ;             /* 0x00005a0000027a02 */
                                                                       /* 0x000fe40000000f00 */
        /*0040*/                   MOV R3, c[0x0][0x16c] ;             /* 0x00005b0000037a02 */
                                                                       /* 0x000fd00000000f00 */
        /*0050*/                   STG.E.U16.SYS [R2], R5 ;            /* 0x0000000502007386 */
                                                                       /* 0x001fe2000010e500 */
        /*0060*/                   EXIT ;                              /* 0x000000000000794d */
                                                                       /* 0x000fea0003800000 */
        /*0070*/                   BRA 0x70;                           /* 0xfffffff000007947 */
                                                                       /* 0x000fc0000383ffff */
                ..........



Fatbin ptx code:
================
arch = sm_70
code version = [8,2]
host = linux
compile_size = 64bit
compressed
# nvcc -arch=sm_70 -c t265.cu -DUSE_INTRIN
# cuobjdump -sass ./t265.o

Fatbin elf code:
================
arch = sm_70
code version = [1,7]
host = linux
compile_size = 64bit

        code for sm_70
                Function : _Z1kfP6__half
        .headerflags    @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM70 EF_CUDA_VIRTUAL_SM(EF_CUDA_SM70)"
        /*0000*/                   MOV R1, c[0x0][0x28] ;              /* 0x00000a0000017a02 */
                                                                       /* 0x000fc40000000f00 */
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;       /* 0x000000fffffff389 */
                                                                       /* 0x000fe200000e00ff */
        /*0020*/                   F2F.F16.F32 R5, c[0x0][0x160] ;     /* 0x0000580000057b04 */
                                                                       /* 0x000e220000200800 */
        /*0030*/                   MOV R2, c[0x0][0x168] ;             /* 0x00005a0000027a02 */
                                                                       /* 0x000fe40000000f00 */
        /*0040*/                   MOV R3, c[0x0][0x16c] ;             /* 0x00005b0000037a02 */
                                                                       /* 0x000fd00000000f00 */
        /*0050*/                   STG.E.U16.SYS [R2], R5 ;            /* 0x0000000502007386 */
                                                                       /* 0x001fe2000010e500 */
        /*0060*/                   EXIT ;                              /* 0x000000000000794d */
                                                                       /* 0x000fea0003800000 */
        /*0070*/                   BRA 0x70;                           /* 0xfffffff000007947 */
                                                                       /* 0x000fc0000383ffff */
                ..........



Fatbin ptx code:
================
arch = sm_70
code version = [8,2]
host = linux
compile_size = 64bit
compressed
#

Note that this is not full constexpr evaluation. For that, this may be of interest as a roadmap.
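The question also asked about bfloat16. A minimal sketch of the same experiment for __nv_bfloat16 from cuda_bf16.h follows, mirroring t265.cu; the kernel k_bf16 and the host helper to_bf16_bits are hypothetical names, not from this thread. I have not inspected the SASS for this case. My assumption is that on sm_80 or newer both paths lower to a single hardware conversion, as in the half example, while older architectures use a longer software sequence.

```cuda
#include <cuda_bf16.h>
#include <cstring>

// Hypothetical bfloat16 analogue of t265.cu; compile with -DUSE_INTRIN to use the intrinsic.
__global__ void k_bf16(float a, __nv_bfloat16 *b) {
#ifdef USE_INTRIN
  *b = __float2bfloat16(a);            // CUDA Math API conversion, round-to-nearest-even
#else
  *b = static_cast<__nv_bfloat16>(a);  // uses __nv_bfloat16's converting constructor from float
#endif
}

// Hypothetical host helper: converts one float on the device and returns its raw 16 bits.
// Returns false if a CUDA runtime call fails.
bool to_bf16_bits(float a, unsigned short *bits) {
  __nv_bfloat16 *d_b = nullptr;
  if (cudaMalloc(&d_b, sizeof(__nv_bfloat16)) != cudaSuccess) return false;
  k_bf16<<<1, 1>>>(a, d_b);
  __nv_bfloat16 h_b;
  bool ok = cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(&h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_b);
  if (ok) std::memcpy(bits, &h_b, sizeof(h_b));  // copy out the raw bit pattern
  return ok;
}
```

Either build variant should give the same bits: 1.0f becomes 0x3F80, and 1.005859375f (three quarters of a bfloat16 ulp above 1.0) rounds up to 0x3F81.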
