PTX in kernel code

Is there any way to include PTX code directly in kernel code?

Say I wanted to use specifically generated PTX for some part of the kernel (device functions). Can it be included directly as PTX in the kernel code?

Basically, my problem is that I want to use fast-math operations (div.approx, rcp.approx, sqrt.approx) for only parts of my kernel. The only way I have found to do this is to change the PTX directly. Is there any other or better way to do this?

Thanks

The fastmath functions are available with a “__” prefix. Check appendix C.2 of the Programming Guide.

My understanding was that the __-prefixed functions are device intrinsics and not necessarily the fast-math functions?

It’s exactly like tera stated: -use_fast_math simply maps some of the regular math library functions to the corresponding device function, e.g. sinf → __sinf, expf → __expf. I usually recommend specifically invoking these device functions where needed (i.e. crucial for speed), rather than switching all instances by passing -use_fast_math. Note that for sm_2x targets -use_fast_math also implies -ftz=true -prec-div=false -prec-sqrt=false.
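As a rough illustration of calling the device function only where it matters, here is a minimal sketch. The kernel name `exp_fast_and_regular` and the host helper `max_rel_diff_exp` are made up for this example, and it assumes the file is compiled without -use_fast_math so that `expf` stays the regular version.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: __expf() is requested explicitly for one output,
// while expf() is left as the regular math library call for the other.
__global__ void exp_fast_and_regular(const float* x, float* fast, float* regular, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        fast[i]    = __expf(x[i]);  // fast device intrinsic, reduced accuracy
        regular[i] = expf(x[i]);    // regular version (remapped only if -use_fast_math is passed)
    }
}

// Hypothetical host helper: runs the kernel on n > 0 inputs in [0, 1) and returns
// the largest relative difference between the two results, or -1 on a CUDA error.
float max_rel_diff_exp(int n)
{
    std::vector<float> x(n), fast(n), regular(n);
    for (int i = 0; i < n; ++i) x[i] = static_cast<float>(i) / n;

    float *d_x = nullptr, *d_fast = nullptr, *d_regular = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess
           && cudaMalloc(&d_fast, bytes) == cudaSuccess
           && cudaMalloc(&d_regular, bytes) == cudaSuccess
           && cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        exp_fast_and_regular<<<blocks, threads>>>(d_x, d_fast, d_regular, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(fast.data(), d_fast, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(regular.data(), d_regular, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);
    cudaFree(d_fast);
    cudaFree(d_regular);
    if (!ok) return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i)
        worst = std::fmax(worst, std::fabs(fast[i] - regular[i]) / regular[i]);
    return worst;
}
```

Calling `max_rel_diff_exp(1024)` from host code gives a quick feel for how far the two versions drift apart on a small input range; the difference generally grows with the magnitude of the argument.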

Thanks for the info, njuffa.

Another quick follow-up question: is there any way to turn on -prec-div=true and -prec-sqrt=true for only some parts of the kernel?

Control via the compiler flags provides compilation-unit granularity. By passing -prec-div=false -prec-sqrt=false on the compiler command line and using the device functions __fdiv_rn(), __frcp_rn(), and __fsqrt_rn() inside the code, programmers can select between approximate and IEEE-rounded operations on an operation-by-operation basis.
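A minimal sketch of that per-operation selection might look like the following. The kernel name `mixed_precision_ops`, the packed output layout, and the host helper `max_rel_err_ieee_ops` are illustrative choices, and the sketch assumes the file is built with -prec-div=false -prec-sqrt=false.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel. Assumed build: nvcc -prec-div=false -prec-sqrt=false ...
// Plain '/' then follows the approximate setting, while the *_rn intrinsics
// request IEEE round-to-nearest results for individual operations.
__global__ void mixed_precision_ops(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[4 * i + 0] = a[i] / b[i];            // follows the -prec-div setting
        out[4 * i + 1] = __fdiv_rn(a[i], b[i]);  // IEEE-rounded division
        out[4 * i + 2] = __frcp_rn(b[i]);        // IEEE-rounded reciprocal
        out[4 * i + 3] = __fsqrt_rn(a[i]);       // IEEE-rounded square root
    }
}

// Hypothetical host helper: runs the kernel on n > 0 elements and returns the largest
// relative deviation of the three *_rn results from the same operations computed on
// the host in single precision, or -1 on a CUDA error.
float max_rel_err_ieee_ops(int n)
{
    std::vector<float> a(n), b(n), out(4 * static_cast<size_t>(n));
    for (int i = 0; i < n; ++i) {
        a[i] = 1.0f + i;          // positive, normal inputs
        b[i] = 3.0f + 0.5f * i;
    }

    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    size_t in_bytes = n * sizeof(float);
    size_t out_bytes = 4 * in_bytes;
    bool ok = cudaMalloc(&d_a, in_bytes) == cudaSuccess
           && cudaMalloc(&d_b, in_bytes) == cudaSuccess
           && cudaMalloc(&d_out, out_bytes) == cudaSuccess
           && cudaMemcpy(d_a, a.data(), in_bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, b.data(), in_bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        mixed_precision_ops<<<blocks, threads>>>(d_a, d_b, d_out, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), d_out, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    if (!ok) return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float ref_div  = a[i] / b[i];
        float ref_rcp  = 1.0f / b[i];
        float ref_sqrt = std::sqrt(a[i]);
        worst = std::fmax(worst, std::fabs(out[4 * i + 1] - ref_div) / ref_div);
        worst = std::fmax(worst, std::fabs(out[4 * i + 2] - ref_rcp) / ref_rcp);
        worst = std::fmax(worst, std::fabs(out[4 * i + 3] - ref_sqrt) / ref_sqrt);
    }
    return worst;
}
```

Only the first output depends on the command-line flags; the other three stay IEEE-rounded however the compilation unit is built, which is what makes the approximate default safe to combine with a few precise operations.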