Is there any way to include PTX code directly in kernel code?

Say I wanted to use specifically generated PTX for some part of the kernel (device functions). Can this be included directly as PTX in the kernel code?

Is there any way to do this?

Basically, my problem is that I want to use fast-math operations (div.approx, rcp.approx, sqrt.approx) for only parts of my kernel.
The only way I have come across to do this is to change the PTX directly.
Is there any other/better way to do this?

It’s exactly as tera stated: -use_fast_math simply maps some of the regular math library functions to the corresponding device functions, e.g. sinf -> __sinf, expf -> __expf. I usually recommend invoking these device functions explicitly where needed (i.e. where speed is crucial), rather than switching all instances by passing -use_fast_math. Note that for sm_2x targets, -use_fast_math also implies -ftz=true -prec-div=false -prec-sqrt=false.
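To illustrate the per-call approach, here is a minimal sketch, assuming the file is compiled without -use_fast_math. The kernel name mixed_math, the helper max_abs_diff, and the input values are made up for illustration; only __expf, __sinf, expf, and sinf come from CUDA itself.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one result uses the fast device functions explicitly,
// the other keeps the regular math library functions.
__global__ void mixed_math(const float* x, float* fast, float* accurate, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        fast[i]     = __expf(x[i]) * __sinf(x[i]);  // fast, reduced accuracy
        accurate[i] = expf(x[i]) * sinf(x[i]);      // regular library versions
    }
}

// Hypothetical host helper: runs the kernel on n small inputs and returns
// the largest absolute difference between the two result arrays.
// Error checking is omitted to keep the sketch short.
float max_abs_diff(int n)
{
    std::vector<float> hx(n), hf(n, 0.0f), ha(n, 0.0f);
    for (int i = 0; i < n; ++i) hx[i] = 0.001f * i;  // assumed test inputs

    float *dx = nullptr, *df = nullptr, *da = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dx, bytes);
    cudaMalloc(&df, bytes);
    cudaMalloc(&da, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);

    mixed_math<<<(n + 255) / 256, 256>>>(dx, df, da, n);

    cudaMemcpy(hf.data(), df, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(ha.data(), da, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(df);
    cudaFree(da);

    float worst = 0.0f;
    for (int i = 0; i < n; ++i)
        worst = std::fmax(worst, std::fabs(hf[i] - ha[i]));
    return worst;
}
```

Calling max_abs_diff(1024) from host code gives a rough feel for how far the fast versions drift from the regular ones on these inputs. If the same file is built with -use_fast_math, both lines in the kernel end up using the fast versions and the difference should vanish.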

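Control via the compiler flags provides compilation-unit granularity. By passing -prec-div=false -prec-sqrt=false on the compiler command line and using the device functions __fdiv_rn(), __frcp_rn(), and __fsqrt_rn() inside the code, programmers can select between approximate and IEEE-rounded operations on an operation-by-operation basis: with those flags, ordinary single-precision division, reciprocal, and sqrtf() compile to the approximate versions, while the _rn device functions always round according to IEEE-754.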
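A minimal sketch of that combination, assuming the file is built with something like nvcc -prec-div=false -prec-sqrt=false. The kernel name div_sqrt_variants, the helper max_rel_diff, and the input values are made up for illustration; __fdiv_rn, __frcp_rn, and __fsqrt_rn are the CUDA device functions named above.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel computing sqrt(a)/b + 1/a in two ways.
__global__ void div_sqrt_variants(const float* a, const float* b,
                                  float* by_flags, float* ieee, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Plain operators follow the compiler flags: approximate when built
        // with -prec-div=false -prec-sqrt=false.
        by_flags[i] = sqrtf(a[i]) / b[i] + 1.0f / a[i];

        // The _rn device functions are IEEE round-to-nearest regardless of
        // the flags.
        ieee[i] = __fdiv_rn(__fsqrt_rn(a[i]), b[i]) + __frcp_rn(a[i]);
    }
}

// Hypothetical host helper: returns the largest relative difference between
// the two variants. Error checking is omitted to keep the sketch short.
float max_rel_diff(int n)
{
    std::vector<float> ha(n), hb(n), h1(n, 1.0f), h2(n, 1.0f);
    for (int i = 0; i < n; ++i) {       // assumed inputs, all well above zero
        ha[i] = 1.0f + i;
        hb[i] = 3.0f + 0.5f * i;
    }

    float *da = nullptr, *db = nullptr, *d1 = nullptr, *d2 = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    div_sqrt_variants<<<(n + 255) / 256, 256>>>(da, db, d1, d2, n);

    cudaMemcpy(h1.data(), d1, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h2.data(), d2, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(d1);
    cudaFree(d2);

    float worst = 0.0f;
    for (int i = 0; i < n; ++i)
        worst = std::fmax(worst, std::fabs(h1[i] - h2[i]) / std::fabs(h2[i]));
    return worst;
}
```

With the default flags (-prec-div=true -prec-sqrt=true) the two variants should agree; with the flags set to false, any difference reported by max_rel_diff comes from the approximate operations in the first expression.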
