Does PTX support double sin() and cos()?

I have a complex number class for device functions and I use sin() and cos() in it. It has been working well.
Recently I have had to use PTX. After compiling to PTX, I get a runtime error:

“caught exception: defs/uses not defined for PTX instruction”

I found that the error comes from sin() and cos() in the device function. If I switch to sinf() and cosf(), everything is OK.

Why do sin(double) and cos(double) work well in .cu device functions but fail when the code is compiled to PTX?

PTX is a virtual instruction set that exposes little beyond the instructions supported by GPU hardware. There are some exceptions for operations that are commonly present as instructions on other compute platforms, such as integer and floating-point division, which are instructions at the PTX level but are really implemented as emulation routines “under the hood”.

GPU hardware provides minimal hardware support for the following higher-level single-precision operations: reciprocal, reciprocal square root, sine, cosine, exponentiation base 2, logarithm base 2. These are exposed via PTX. CUDA offers some device function intrinsics [such as __sinf(), __cosf()] which are thin wrappers around these PTX instructions. If CUDA code is built with -use_fast_math, some math library functions [such as sinf() and cosf()] are mapped automatically to the corresponding intrinsic. From your description above it sounds like this is how you may be building your code?
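To make the distinction concrete, here is a minimal sketch that calls all three variants side by side. The kernel name and the input values are made up for illustration; only sinf(), __sinf() and sin() come from the discussion above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: evaluates the three sine variants discussed above.
__global__ void sine_variants(float xf, double xd, float *outf, double *outd)
{
    outf[0] = sinf(xf);   // math library function; replaced by __sinf() under -use_fast_math
    outf[1] = __sinf(xf); // intrinsic: thin wrapper around the PTX sin.approx instruction
    outd[0] = sin(xd);    // double precision: no hardware instruction, always library code
}

int main()
{
    float *d_outf = 0;
    double *d_outd = 0;
    if (cudaMalloc(&d_outf, 2 * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&d_outd, sizeof(double)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    sine_variants<<<1, 1>>>(1.0f, 1.0, d_outf, d_outd);

    float h_outf[2];
    double h_outd;
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_outf, d_outf, sizeof(h_outf), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_outd, d_outd, sizeof(h_outd), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("sinf: %.9g  __sinf: %.9g  sin: %.17g\n", h_outf[0], h_outf[1], h_outd);

    cudaFree(d_outf);
    cudaFree(d_outd);
    return 0;
}
```

Building the file with nvcc -ptx, once with and once without -use_fast_math, and reading the generated .ptx file should show which calls turned into a single instruction and which were expanded into library code.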

You can find the supported PTX instructions in the document ptx_isa_4.1.pdf that ships with CUDA. For your purposes, you would want to consult section 8.7.3 Floating-point instructions. For example, the PTX instruction “sin” is described in sub-section 8.7.3.18 with the following synopsis:

sin.approx{.ftz}.f32 d, a;

As can be seen, there is no double-precision version of this instruction (since no such hardware instruction exists in the GPU).

Generally, the single-precision hardware implementations mentioned above are very high performance but “quick & dirty” since they were designed for use in graphics. Comprehensive math libraries for general computation obviously require many more functions and also typically need higher accuracy and better special case handling as prescribed by the IEEE-754 floating-point standard and the ISO C/C++ standards. Note also that the hardware does not provide any kind of higher-level double-precision operations.

Like just about any other computing platform including x86 and ARM, CUDA therefore ships with a math library that sits on top of the assembly language level (i.e. upstream of PTX) in the software stack. In CUDA 6.5, the math library is provided as part of a device library. The documentation for this device library resides in a file called libdevice-users-guide.pdf that ships with CUDA. The actual code is in multiple files libdevice.compute_??.??.bc. As far as I know these libraries are usable by tool chains other than CUDA, and I believe there is at least one project which makes use of that.

Here is a presentation from GTC 2013 that shows how GPU compilers are structured. Slide 11 shows where the contents of libdevice enter the flow inside the tool chain, well before the PTX assembly code is generated:

[url]http://on-demand.gputechconf.com/gtc/2013/presentations/S3185-Building-GPU-Compilers-libNVVM.pdf[/url]

Thanks for your detailed explanation. Yes, I built with --use_fast_math because otherwise I get this runtime error:

OptiX Error: Invalid value (Details: Function “RTresult _rtProgramCreateFromPTXFile(RTcontext, const char*, const char*, RTprogram_api**)” caught exception: defs/uses not defined for PTX instruction (Try --use_fast_math): madc.hi, [1310866])

The problem is that even with --use_fast_math, I have to change all calls to sin() and cos() into sinf() and cosf() to avoid the same error (changing exp() is not necessary). However, sin() and cos() work well in my device code when I build without generating PTX.

I am unable to provide a diagnosis based on the scant information. The observation in the last paragraph would appear to be due to the following:

Single-precision math functions sinf() and cosf() normally map [like double-precision sin(), cos()] to functions in libdevice. But when you compile with -use_fast_math, they are replaced by the intrinsics __sinf() and __cosf(), which map directly to the PTX instructions sin.approx.ftz.f32 and cos.approx.ftz.f32. As a consequence the resulting PTX would compile even without libdevice being part of the build.
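As a rough sketch of what the single-precision path looks like in a complex number helper: the type and function names below are hypothetical, not taken from your class, and the host code exists only to make the example self-contained.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical single-precision complex type; not the class from the question.
struct ComplexF {
    float re;
    float im;
};

// exp(i*theta) using only single-precision calls. With -use_fast_math,
// sinf()/cosf() here are replaced by the __sinf()/__cosf() intrinsics.
__device__ ComplexF unit_phasor(float theta)
{
    ComplexF z;
    z.re = cosf(theta);
    z.im = sinf(theta);
    return z;
}

__global__ void phasor_kernel(float theta, ComplexF *out)
{
    *out = unit_phasor(theta);
}

int main()
{
    ComplexF *d_out = 0;
    if (cudaMalloc(&d_out, sizeof(ComplexF)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    phasor_kernel<<<1, 1>>>(0.5f, d_out);

    ComplexF h_out;
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("re = %.9g, im = %.9g\n", h_out.re, h_out.im);
    return 0;
}
```

Keep in mind that this gives up double precision. Also, a literal without the f suffix (such as 2.0 * theta) promotes the expression to double, so it is easy to pull double-precision arithmetic back in by accident.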

[Later:] As for MADC.HI, check your architecture target and PTX version number. My memory is hazy, but this instruction was only added three years ago or so. How the error relates to -use_fast_math, I have no clue.
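If you are unsure which architecture your GPU actually is, a small host program along these lines reports the compute capability. It is only a sketch that uses the standard runtime API calls cudaGetDeviceCount() and cudaGetDeviceProperties(); it does not say anything about what OptiX itself accepts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }

    // Print the name and compute capability of every visible device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        printf("device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

The PTX version and the architecture the code was generated for are recorded in the .version and .target directives at the top of the generated .ptx file.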

Please always provide the OptiX version you’re having issues with.

There have been fixes around the use of madc.hi inside OptiX 3.6.3, see
[url]https://devtalk.nvidia.com/default/topic/779816/optix/optix-3-6-3-released/[/url]
“Bug fix Double-precision cosine and other transcendentals now work properly, although ray tracing internals are still single-precision.”

If that doesn’t help, please provide a minimal reproducer in a failing state to the OptiX team, including the following information:
OS version, OS bitness, OptiX version, CUDA toolkit version, installed GPU(s), display driver version.