How does CUDA Fortran map operations to CUDA binary code?

The native instructions are more efficient than the regular ones, as pointed out in the CUDA C Programming Guide. I'm not sure whether CUDA Fortran makes use of these intrinsics, or whether there is a way for the user to choose the native instructions rather than the normal ones?


Single-Precision Floating-Point Addition and Multiplication Intrinsics
__fadd_r[d,u], __fmul_r[d,u], and __fmaf_r[n,z,d,u] (see Section C.2.1) compile to tens of instructions for devices of compute capability 1.x, but map to a single native instruction for devices of compute capability 2.0.
Single-Precision Floating-Point Division
__fdividef(x, y) (see Section C.2.1) provides faster single-precision floating-point division than the division operator.
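For reference, here is what that choice looks like at the CUDA C level (a minimal device-code sketch; the kernel name and shapes are illustrative, and CUDA Fortran does not currently expose this choice directly):

```cuda
// q[]     gets the IEEE-rounded "/" division (slower, accurate)
// qfast[] gets the __fdividef intrinsic (faster, reduced accuracy,
//         and undefined for very large y per the programming guide)
__global__ void divide(const float *x, const float *y,
                       float *q, float *qfast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        q[i]     = x[i] / y[i];
        qfast[i] = __fdividef(x[i], y[i]);
    }
}
```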


Hi Tuan,

The “-Mcuda=nofma” flag will use the fmul routines in order to avoid fused multiply-add (FMA) operations on the GPU. By default, the compiler uses the standard fdiv routine for “/” divides; the faster division is used when “-Mcuda=fastmath” is added. For fdivides, I’ll put in a feature request.
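In CUDA C terms, the difference “-Mcuda=nofma” controls looks roughly like this (an illustrative sketch, not the compiler's actual output; kernel and variable names are made up):

```cuda
// An FMA computes a*x + y with a single rounding step;
// "nofma" forces a separate multiply (one rounding) then add (another).
__global__ void axpy(float a, const float *x, const float *y,
                     float *fused, float *split, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        fused[i] = __fmaf_rn(a, x[i], y[i]);   // fused multiply-add, one rounding
        split[i] = __fmul_rn(a, x[i]) + y[i];  // separate mul then add, two roundings
    }
}
```

The two results can differ in the last bit, which is why the flag exists: it trades a little speed for bit-for-bit agreement with non-fused CPU arithmetic.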

  • Mat