PTX in kernel code

NCC-1701D · September 3, 2010, 9:26am

Is there any way to include PTX code directly in the kernel code?

Say I wanted to use specifically generated PTX for some part of the kernel (device functions): can this be included directly as PTX in the kernel code?

Is there any way to do this?

Basically, my problem is that I want to use the fastmath operations (div.approx, rcp.approx, sqrt.approx) for only parts of my kernel.
So far, the only way I have come across to do this is to change the PTX directly.
Is there any other or better way to do this?

thanks

tera · September 3, 2010, 10:00am

The fastmath functions are available with a “__” prefix. Check appendix C.2 of the Programming Guide.

NCC-1701D · September 3, 2010, 10:54am

My understanding was that the __ prefixed functions are device intrinsics, and not necessarily the fastmath functions?

njuffa · September 3, 2010, 11:03am

It’s exactly as tera stated: -use_fast_math simply maps some of the regular math library functions to the corresponding device functions, e.g. sinf → __sinf, expf → __expf. I usually recommend specifically invoking these device functions where needed (i.e. where speed is crucial), rather than switching all instances by passing -use_fast_math. Note that for sm_2x targets -use_fast_math also implies -ftz=true -prec-div=false -prec-sqrt=false.
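As a rough illustration of invoking the device functions only where needed, here is a minimal sketch. The kernel name sin_both and the host helper max_abs_diff are hypothetical and not from this thread; only sinf and __sinf are the functions discussed above. It assumes the file is compiled without -use_fast_math, so that sinf keeps its regular accuracy.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: computes sin(x) twice, once with the regular math
// library function and once with the fast device intrinsic.
__global__ void sin_both(const float *x, float *accurate, float *fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Regular library call (remapped to __sinf only if the whole
        // file is compiled with -use_fast_math).
        accurate[i] = sinf(x[i]);
        // Explicit intrinsic: reduced accuracy, selected per call site.
        fast[i] = __sinf(x[i]);
    }
}

// Hypothetical host helper: runs the kernel on n inputs in [0, 1) and returns
// the largest absolute difference between the two results, or -1 on error.
float max_abs_diff(int n)
{
    std::vector<float> hx(n), ha(n), hf(n);
    for (int i = 0; i < n; ++i) hx[i] = static_cast<float>(i) / n;

    float *dx = nullptr, *da = nullptr, *df = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dx, bytes) != cudaSuccess ||
        cudaMalloc(&da, bytes) != cudaSuccess ||
        cudaMalloc(&df, bytes) != cudaSuccess) {
        cudaFree(dx); cudaFree(da); cudaFree(df);
        return -1.0f;
    }
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    sin_both<<<blocks, threads>>>(dx, da, df, n);

    // The blocking copies also wait for the kernel to finish.
    cudaMemcpy(ha.data(), da, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(hf.data(), df, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx); cudaFree(da); cudaFree(df);
    if (err != cudaSuccess) return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i) worst = fmaxf(worst, fabsf(ha[i] - hf[i]));
    return worst;
}
```

Calling max_abs_diff(1024) from a host main() gives a feel for how far the two variants drift apart on a given input range; the difference should be small for inputs this close to zero.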

NCC-1701D · September 3, 2010, 11:59am

Thanks for the info, njuffa.

Another quick follow-up question: is there any way to get the behavior of -prec-div=true and -prec-sqrt=true for only some parts of the kernel?

njuffa · September 3, 2010, 5:48pm

Control via the compiler flags provides compilation-unit granularity. By passing -prec-div=false -prec-sqrt=false on the compiler command line and using the device functions __fdiv_rn(), __frcp_rn(), and __fsqrt_rn() inside the code, programmers can select between approximate and IEEE-rounded operations on an operation-by-operation basis.
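A minimal sketch of that per-operation selection follows. The kernel ratio_both, the helper max_rel_diff, and the file name in the comment are hypothetical; the flags and the __fdiv_rn() / __fsqrt_rn() device functions are the ones named above. If the file is built without those flags, both code paths should simply produce the same IEEE-rounded results.

```cuda
// Assumed build line (file name is illustrative):
//   nvcc -prec-div=false -prec-sqrt=false example.cu
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: computes x / sqrt(y) two ways.
__global__ void ratio_both(const float *x, const float *y,
                           float *approx, float *precise, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Plain operators: follow the compilation-unit flags, so these are
        // approximate under -prec-div=false -prec-sqrt=false.
        approx[i] = x[i] / sqrtf(y[i]);
        // Explicit device functions: round-to-nearest per the names,
        // independent of the flags. __frcp_rn() works the same way for 1/y.
        precise[i] = __fdiv_rn(x[i], __fsqrt_rn(y[i]));
    }
}

// Hypothetical host helper: returns the largest relative difference between
// the two variants over n positive inputs, or -1 on a CUDA error.
float max_rel_diff(int n)
{
    std::vector<float> hx(n), hy(n), ha(n), hp(n);
    for (int i = 0; i < n; ++i) {
        hx[i] = 1.0f + i;          // positive numerators
        hy[i] = 2.0f + 0.5f * i;   // positive arguments for sqrt
    }

    float *dx = nullptr, *dy = nullptr, *da = nullptr, *dp = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dx, bytes) != cudaSuccess ||
        cudaMalloc(&dy, bytes) != cudaSuccess ||
        cudaMalloc(&da, bytes) != cudaSuccess ||
        cudaMalloc(&dp, bytes) != cudaSuccess) {
        cudaFree(dx); cudaFree(dy); cudaFree(da); cudaFree(dp);
        return -1.0f;
    }
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    ratio_both<<<blocks, threads>>>(dx, dy, da, dp, n);

    cudaMemcpy(ha.data(), da, bytes, cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(hp.data(), dp, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx); cudaFree(dy); cudaFree(da); cudaFree(dp);
    if (err != cudaSuccess) return -1.0f;

    float worst = 0.0f;
    for (int i = 0; i < n; ++i)
        worst = fmaxf(worst, fabsf(ha[i] - hp[i]) / fabsf(hp[i]));
    return worst;
}
```

The design choice here mirrors the advice in the post: make the fast path the default for the file, then mark the few operations that need IEEE rounding explicitly.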
