I frequently use complex arithmetic in CUDA and need to define my own implementations of, for example, transcendental functions (sin, cos, exp, …) on complex numbers. I now want to solve the problem once and for all by writing a library of such functions, so as to cover a “large” number of cases and avoid being stalled every time a new implementation is needed.
I’m aware that for some of them, writing the code could be non-trivial given the trade-off between accuracy and computational cost. Nevertheless, I’m also aware that C++ has its own complex math library.
Is there any possibility to “recycle” already existing C++ solutions for CUDA purposes?
Can you be more explicit as to why the native functions don’t work for you? Is this for float? double? even higher than double precision? Would something like this work?
[url]http://mplapack.sourceforge.net/[/url]
Mathematical device functions like sin, cos, sinh, atan, log, exp etc. are defined on double precision arguments, if I understand the NVIDIA CUDA Library Documentation correctly:
[url]http://developer.download.nvidia.com/compute/cuda/4_2/rel/toolkit/docs/online/group__CUDA__MATH__DOUBLE_g3ebbca20a2937d1fe51329402880df85.html#g3ebbca20a2937d1fe51329402880df85[/url]
I’m searching for implementations thereof on complex numbers.
Okay, I get it now. Here’s my suggestion: why not just keep parallel arrays of the imaginary and real parts? If this is production code with a spec that requires robust complex operations to keep things ‘user/coder friendly’ as part of a deliverable, then building a full library makes sense. Just throwing some ideas out there.
See: https://devtalk.nvidia.com/default/topic/414672/complex-library-in-cuda/ and http://en.wikipedia.org/wiki/Euler%27s_formula#Relationship_to_trigonometry
For example, here is a complex product function prototype keeping real and imaginary parts separate:
/* Calculates c = a * b, with real and imaginary parts stored separately */
__device__ void ComplexProd(const double *a_R, const double *a_I,
                            const double *b_R, const double *b_I,
                            double *c_R, double *c_I)
{
    *c_R = (*a_R) * (*b_R) - (*a_I) * (*b_I);
    *c_I = (*a_R) * (*b_I) + (*a_I) * (*b_R);
}
Thanks for your suggestions. I already have my own implementation of a wrapper complex type class, as well as the related overloaded operators. Let me try to explain myself better.
C/C++ has its own implementations of sin, cos, exp etc. on real numbers. They are optimized for accuracy/efficiency according to some numerical schemes (polynomial expansions, etc.).
C/C++ also has complex implementations of those functions. For many of them (e.g., trigonometric functions) the implementations seem rather simple: they are combinations of mathematical functions on real arguments. If, for example, the source files of those functions on complex arguments were available, then one could port them to CUDA as easily as adding a __device__ keyword in front. Of course, in this simple scenario invoking sin, cos, exp etc. would mean invoking the native CUDA mathematical functions.
Here is a bit more insight, now that you have clarified the question:
How does C compute sin() and other math functions? - Stack Overflow
From that link above I found the fdlibm sources link:
[url]http://gnuwin32.sourceforge.net/packages/fdlibm.htm[/url]
I should mention that if you choose to go that route (to build a library from scratch in CUDA), it is no small feat. There might be other libraries that are cleaner and not as bloated and better suited for porting to CUDA, that is just the first one I found that had source code.
My solution to this would be… since pretty much any of the trig function real/imaginary parts can be derived from complex exponentials (exp, expf) or other functions (sin,cos)… just use the provided CUDA functions, and use your complex wrapper function to calculate each real and imaginary argument separately. I don’t think you’d want to re-invent the wheel when CUDA already provides device functions that are already tested and work. Just adapt your wrapper class to calculate the arguments independently and return a CComplex struct/output or however you do it.
It’s very much possible that you could optimize individual functions beyond the method I have just described, building them up from scratch using some library’s sources… but I really think that would not be the most productive thing to do.
Hopefully the fdlibm library sources give you an idea of how to use the CUDA native sin,cos,exp, etc to build the functions in your template that would work on complex numbers… that would be the ideal way to do it, I think.
I am not aware of a package of complex math functions for CUDA C. I believe CUDA Fortran provides the customary complex math in device code, but my experience with CUDA Fortran is limited and does not extend to complex arithmetic. If CUDA Fortran is an option, you may want to investigate that.
As far as support for complex math functions in CUDA C is concerned, consider filing an enhancement request through the bug reporting form linked from the registered developer website. If you decide to do so, please prefix the synopsis (i.e. subject line) with “RFE:” to mark this as a “request for enhancement” rather than a bug.
First of all, let me thank vacaloca for all his efforts.
Concerning the link
A new implementation of C++ std::complex for complex numbers on CUDA - CUDA Programming and Performance - NVIDIA Developer Forums
let me say that I have already implemented something similar.
Regarding
How does C compute sin() and other math functions? - Stack Overflow
my opinion is the same as yours: exploit as much as possible what CUDA already has in terms of mathematical functions and develop only what is needed, especially on complex arguments.
I have taken a look at
[url]http://gnuwin32.sourceforge.net/packages/fdlibm.htm[/url]
and I think it would be a very good starting point for developing new mathematical functions in CUDA if NVIDIA does not make them available in the future.
So, finally, thank you very much again.
Let me also thank njuffa for his kind answer, as usual. I have filed a “request for enhancement” as suggested.