Here is a bit more insight, now that you have clarified the question:
From the link above, I found a link to the fdlibm sources:
I should mention that if you choose to go that route (building a library from scratch in CUDA), it is no small feat. There may be other libraries that are cleaner, less bloated, and better suited for porting to CUDA; fdlibm is just the first one I found that had source code available.
My solution would be this: since the real and imaginary parts of pretty much any complex trig function can be expressed in terms of real functions such as exp/expf, sin and cos, just use the functions CUDA already provides and have your complex wrapper compute the real and imaginary parts separately. I don’t think you’d want to reinvent the wheel when CUDA already ships device functions that are tested and known to work. Just adapt your wrapper class to compute each part independently and return a CComplex struct (or whatever output type you use).
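As a minimal sketch of that approach, assuming a simple struct with float `re`/`im` members (the `CComplex` layout and the `c_exp`/`c_sin`/`c_cos` names here are hypothetical stand-ins, not your actual class), the complex exponential, sine and cosine follow from the standard identities and only call real functions (`expf`, `sinf`, `cosf`, `sinhf`, `coshf`) that CUDA already provides in device code. The `HD` macro lets the same functions compile with nvcc as `__host__ __device__`, or with a plain C++ compiler so they can be checked on the host.

```cpp
#include <math.h>

// Under nvcc the helpers are usable from both host and device code;
// with a plain C++ compiler HD expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for the poster's CComplex type.
struct CComplex {
    float re;
    float im;
};

// exp(x + iy) = e^x * (cos y + i sin y)
HD inline CComplex c_exp(CComplex z) {
    float ex = expf(z.re);
    return CComplex{ex * cosf(z.im), ex * sinf(z.im)};
}

// sin(x + iy) = sin x * cosh y + i * cos x * sinh y
HD inline CComplex c_sin(CComplex z) {
    return CComplex{sinf(z.re) * coshf(z.im), cosf(z.re) * sinhf(z.im)};
}

// cos(x + iy) = cos x * cosh y - i * sin x * sinh y
HD inline CComplex c_cos(CComplex z) {
    return CComplex{cosf(z.re) * coshf(z.im), -sinf(z.re) * sinhf(z.im)};
}

#ifdef __CUDACC__
// Example kernel: apply c_exp element-wise (only compiled by nvcc).
__global__ void exp_kernel(const CComplex* in, CComplex* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = c_exp(in[i]);
}
#endif
```

No special-case handling is done here: for a large imaginary part, `coshf`/`sinhf` overflow to infinity, which is the kind of edge case a carefully written library deals with more deliberately. If profiling shows it matters, CUDA’s `sincosf` can compute the sine and cosine of the same argument in one call.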
It’s quite possible that you could optimize individual functions further than the method I just described by building them up from a ‘from scratch’ implementation based on some library’s sources… but I really don’t think that would be the most productive use of your time.
Hopefully the fdlibm sources give you an idea of how to use CUDA’s native sin, cos, exp, etc. to build the functions in your template that work on complex numbers. That would be the ideal way to do it, I think.
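For the template side, one possible pattern (again a hypothetical sketch: `TComplex<T>` and the `r_*` forwarding helpers are made-up names) is to write each complex function once as a template and route the real-valued calls through small float/double overloads, so `float` instantiations end up calling `logf`, `sqrtf`, etc. and `double` ones call `log`, `sqrt`, etc. The complex log and square root below only need real `log`, `hypot`, `atan2`, `sqrt` and `copysign`.

```cpp
#include <math.h>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical templated complex type standing in for the poster's CComplex.
template <typename T>
struct TComplex {
    T re;
    T im;
};

// Forwarding overloads: pick the single- or double-precision real routine.
HD inline float  r_log(float x)  { return logf(x); }
HD inline double r_log(double x) { return log(x); }
HD inline float  r_sqrt(float x)  { return sqrtf(x); }
HD inline double r_sqrt(double x) { return sqrt(x); }
HD inline float  r_hypot(float x, float y)   { return hypotf(x, y); }
HD inline double r_hypot(double x, double y) { return hypot(x, y); }
HD inline float  r_atan2(float y, float x)   { return atan2f(y, x); }
HD inline double r_atan2(double y, double x) { return atan2(y, x); }
HD inline float  r_copysign(float m, float s)   { return copysignf(m, s); }
HD inline double r_copysign(double m, double s) { return copysign(m, s); }

// Principal log: log|z| + i * arg(z)
template <typename T>
HD inline TComplex<T> c_log(TComplex<T> z) {
    return TComplex<T>{r_log(r_hypot(z.re, z.im)), r_atan2(z.im, z.re)};
}

// Principal sqrt: sqrt((|z| + x) / 2) + i * sign(y) * sqrt((|z| - x) / 2)
template <typename T>
HD inline TComplex<T> c_sqrt(TComplex<T> z) {
    T r = r_hypot(z.re, z.im);
    T re = r_sqrt((r + z.re) / T(2));
    T im = r_copysign(r_sqrt((r - z.re) / T(2)), z.im);
    return TComplex<T>{re, im};
}
```

The square-root formula is the simple textbook one: it loses accuracy to cancellation when the imaginary part is much smaller than the real part, and `r + z.re` can overflow for very large inputs, so treat it as a starting point rather than a finished implementation.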