Can static_cast support data types that are not native to C++ in CUDA code, such as half and bhalf?
What's the difference between using static_cast and the CUDA intrinsics (CUDA Math API :: CUDA Toolkit Documentation)?
Did you try it? It seems to work for me in CUDA 12.2 or newer, e.g. casting float to half or vice versa.
My guess would be (for those examples) that static_cast and the CUDA intrinsics would compile to approximately the same SASS code.
Example:
# cat t265.cu
#include <cuda_fp16.h>
__global__ void k(float a, half *b){
#ifdef USE_INTRIN
  *b = __float2half(a);
#else
  *b = static_cast<half>(a);
#endif
}
# nvcc -arch=sm_70 -c t265.cu
# cuobjdump -sass ./t265.o
Fatbin elf code:
================
arch = sm_70
code version = [1,7]
host = linux
compile_size = 64bit
code for sm_70
Function : _Z1kfP6__half
.headerflags @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM70 EF_CUDA_VIRTUAL_SM(EF_CUDA_SM70)"
/*0000*/ MOV R1, c[0x0][0x28] ; /* 0x00000a0000017a02 */
/* 0x000fc40000000f00 */
/*0010*/ @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ; /* 0x000000fffffff389 */
/* 0x000fe200000e00ff */
/*0020*/ F2F.F16.F32 R5, c[0x0][0x160] ; /* 0x0000580000057b04 */
/* 0x000e220000200800 */
/*0030*/ MOV R2, c[0x0][0x168] ; /* 0x00005a0000027a02 */
/* 0x000fe40000000f00 */
/*0040*/ MOV R3, c[0x0][0x16c] ; /* 0x00005b0000037a02 */
/* 0x000fd00000000f00 */
/*0050*/ STG.E.U16.SYS [R2], R5 ; /* 0x0000000502007386 */
/* 0x001fe2000010e500 */
/*0060*/ EXIT ; /* 0x000000000000794d */
/* 0x000fea0003800000 */
/*0070*/ BRA 0x70; /* 0xfffffff000007947 */
/* 0x000fc0000383ffff */
..........
Fatbin ptx code:
================
arch = sm_70
code version = [8,2]
host = linux
compile_size = 64bit
compressed
# nvcc -arch=sm_70 -c t265.cu -DUSE_INTRIN
# cuobjdump -sass ./t265.o
Fatbin elf code:
================
arch = sm_70
code version = [1,7]
host = linux
compile_size = 64bit
code for sm_70
Function : _Z1kfP6__half
.headerflags @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM70 EF_CUDA_VIRTUAL_SM(EF_CUDA_SM70)"
/*0000*/ MOV R1, c[0x0][0x28] ; /* 0x00000a0000017a02 */
/* 0x000fc40000000f00 */
/*0010*/ @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ; /* 0x000000fffffff389 */
/* 0x000fe200000e00ff */
/*0020*/ F2F.F16.F32 R5, c[0x0][0x160] ; /* 0x0000580000057b04 */
/* 0x000e220000200800 */
/*0030*/ MOV R2, c[0x0][0x168] ; /* 0x00005a0000027a02 */
/* 0x000fe40000000f00 */
/*0040*/ MOV R3, c[0x0][0x16c] ; /* 0x00005b0000037a02 */
/* 0x000fd00000000f00 */
/*0050*/ STG.E.U16.SYS [R2], R5 ; /* 0x0000000502007386 */
/* 0x001fe2000010e500 */
/*0060*/ EXIT ; /* 0x000000000000794d */
/* 0x000fea0003800000 */
/*0070*/ BRA 0x70; /* 0xfffffff000007947 */
/* 0x000fc0000383ffff */
..........
Fatbin ptx code:
================
arch = sm_70
code version = [8,2]
host = linux
compile_size = 64bit
compressed
#
Note that this is not full constexpr evaluation. For that, this may be of interest as a roadmap.
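The SASS comparison above shows identical code for the cast and the intrinsic. A runtime check is another way to look at the same question. The following is only a minimal sketch, not a definitive test: the kernel name `convert`, the helper `convert_bits`, and the chosen input values are assumptions made for illustration. It converts one float three ways and copies the raw 16-bit patterns back to the host. `__float2half` is documented as round-to-nearest-even, and `static_cast<half>` is expected to match it. `__float2half_rz` is one of the explicit rounding-mode variants that only the intrinsic family offers, which is one practical difference from a plain cast.

```cuda
#include <cuda_fp16.h>
#include <cstdio>

// Convert one float to half in three ways and store the results side by side.
__global__ void convert(float in, half *out)
{
    out[0] = static_cast<half>(in);  // cast, goes through the half type's conversion from float
    out[1] = __float2half(in);       // intrinsic, round-to-nearest-even
    out[2] = __float2half_rz(in);    // intrinsic, round-toward-zero
}

// Hypothetical helper (not part of any CUDA API): runs the kernel for one
// value and returns the raw 16-bit patterns of the three results.
// Returns false if any CUDA call fails.
bool convert_bits(float value, unsigned short bits[3])
{
    half *d_out = nullptr;
    if (cudaMalloc(&d_out, 3 * sizeof(half)) != cudaSuccess) return false;

    convert<<<1, 1>>>(value, d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Byte-wise copy: each half occupies 16 bits, same as unsigned short here.
        err = cudaMemcpy(bits, d_out, 3 * sizeof(half), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main()
{
    // 1.5 is exactly representable in half; 1 + 3 * 2^-12 is not, so the
    // rounding mode matters for the second value.
    const float values[2] = {1.5f, 1.000732421875f};
    for (float v : values) {
        unsigned short bits[3] = {0, 0, 0};
        if (!convert_bits(v, bits)) {
            printf("CUDA call failed\n");
            return 1;
        }
        printf("%.12f: cast=0x%04x rn=0x%04x rz=0x%04x\n",
               v, (unsigned)bits[0], (unsigned)bits[1], (unsigned)bits[2]);
    }
    return 0;
}
```

This should build with something like `nvcc -arch=sm_70 file.cu`, assuming a toolkit where the cast compiles as described above. If `__CUDA_NO_HALF_CONVERSIONS__` is defined before including `cuda_fp16.h`, the implicit conversions are disabled and the `static_cast` line is expected to fail to compile, while the intrinsics remain available.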