Attempt at a CUDA complex maths library that works in plain C++ and CUDA alike

Hi there,

Please have a look at my attempt at a complex datatype with basic maths operators. The overall aim is to provide a complex data type that is about as easy to use as a float. This code might evolve into something really useful, so check back frequently for updates. It is under a BSD-style license, so you can use it freely as long as you retain the copyright notice.

The code is supposed to work in CUDA kernels (device and host mode) as well as in plain old C/C++ functions (e.g. the Gold CPU kernels in the various SDK project templates). I've verified that it compiles in both emulation mode and device mode. My compilation tests were done on a Mac Mini with the Mac OS X CUDA 1.1 SDK installed (no CUDA-capable GPU present). I also built it with Visual C++ 2005 Express against the CUDA SDK 2.0 beta2 and executed it on a CUDA device.

The complex data type is implemented as a struct (not a C++ object), so no constructors are available. I named it cudacomplex in order to not conflict with existing complex types. Neither the built-in C99 _Complex data type nor std::complex are used.

Any comments about this approach? Useful? Could some things be done better? I’m listening…

I wonder if making the complex struct a proper C++ class would break things? Some folks said you can't use C++ features in CUDA kernels, so I was too scared to try ;)

I am getting some "expression has no effect" warnings on scalar assignments when compiling in Debug or Release mode for the device, but scalar assignments do work correctly despite the strange warning.

If you assign the scalar 0 to a complex number and receive an error about ambiguity of the assignment operator, use 0.0f instead; the exact float match resolves the ambiguity.

Possible future enhancements:

-magnitude function (abs for complex numbers)

-conversion to polar representation and back

-add a double precision complex type for compute capability 1.3

-provide some trigonometry functions with complex arguments

-create 2x2 complex matrix types and related maths functions

WARNING and DISCLAIMER: This is NOT currently a replacement for the standardized <complex.h> header file which conforms to ISO/IEC 9899:1999(E). This is also not fully verified for mathematical correctness. Use at your own risk.

/*

* Copyright (c) 2008 Christian Buchner <Christian.Buchner@gmail.com>

* All rights reserved.

*

* Redistribution and use in source and binary forms, with or without

* modification, are permitted provided that the following conditions are met:

*     * Redistributions of source code must retain the above copyright

*       notice, this list of conditions and the following disclaimer.

*     * Redistributions in binary form must reproduce the above copyright

*       notice, this list of conditions and the following disclaimer in the

*       documentation and/or other materials provided with the distribution.

*

* THIS SOFTWARE IS PROVIDED BY Christian Buchner ''AS IS'' AND ANY 

* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED

* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE

* DISCLAIMED. IN NO EVENT SHALL Christian Buchner BE LIABLE FOR ANY

* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES

* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;

* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND

* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT

* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS

* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#ifndef CUDACOMPLEX_H

#define CUDACOMPLEX_H

// Depending on whether we're running inside the CUDA compiler, define the __host__

// and __device__ intrinsics, otherwise just make the functions static to prevent

// linkage issues (duplicate symbols and such)

#ifdef __CUDACC__

#define HOST __host__

#define DEVICE __device__

#define HOSTDEVICE __host__ __device__

#else

#define HOST static inline

#define DEVICE static inline

#define HOSTDEVICE static inline

#endif

// Struct alignment is handled differently between the CUDA compiler and other

// compilers (e.g. GCC, MS Visual C++ .NET)

#ifdef __CUDACC__

#define ALIGN(x)  __align__(x)

#else

#if defined(_MSC_VER) && (_MSC_VER >= 1300)

// Visual C++ .NET and later

#define ALIGN(x) __declspec(align(x)) 

#else

#if defined(__GNUC__)

// GCC

#define ALIGN(x)  __attribute__ ((aligned (x)))

#else

// all other compilers

#define ALIGN(x) 

#endif

#endif

#endif

// Somehow in emulation mode the code won't compile on the Mac OS X CUDA 1.1 SDK when the

// operators below make use of references (compiler bug?). So instead we compile

// the code to pass everything through the stack. Slower, but works.

// I am not sure how the Linux CUDA SDK will behave, so currently when I detect

// Microsoft's Visual C++.NET I always allow it to use references.

#if !defined(__DEVICE_EMULATION__) || (defined(_MSC_VER) && (_MSC_VER >= 1300))

#define REF(x) &x

#define ARRAYREF(x,y) (&x)[y]

#else

#define REF(x) x

#define ARRAYREF(x,y) x[y]

#endif

typedef struct ALIGN(8) _cudacomplex {

   float real;

   float img;

   

   // assignment of a scalar to complex

   _cudacomplex& operator=(const float REF(a)) {

      real = a; img = 0;

      return *this;

   };

  // assignment of a pair of floats to complex

   _cudacomplex& operator=(const float ARRAYREF(a,2)) {

      real = a[0]; img = a[1];

      return *this;

   };

} cudacomplex;

// add complex numbers

HOSTDEVICE cudacomplex operator+(const cudacomplex REF(a), const cudacomplex REF(b)) {

    cudacomplex result = { a.real + b.real, a.img  + b.img };

    return result;

}

// add scalar to complex

HOSTDEVICE cudacomplex operator+(const cudacomplex REF(a), const float REF(b)) {

    cudacomplex result = { a.real + b, a.img };

    return result;

}

// add complex to scalar

HOSTDEVICE cudacomplex operator+(const float REF(a), const cudacomplex REF(b)) {

    cudacomplex result = { a + b.real, b.img };

    return result;

}

// subtract complex numbers

HOSTDEVICE cudacomplex operator-(const cudacomplex REF(a), const cudacomplex REF(b)) {

    cudacomplex result = { a.real - b.real, a.img  - b.img };

    return result;

}

// subtract scalar from complex

HOSTDEVICE cudacomplex operator-(const cudacomplex REF(a), const float REF(b)) {

    cudacomplex result = { a.real - b, a.img };

    return result;

}

// subtract complex from scalar

HOSTDEVICE cudacomplex operator-(const float REF(a), const cudacomplex REF(b)) {

    cudacomplex result = { a - b.real, -b.img };

    return result;

}

// multiply complex numbers

HOSTDEVICE cudacomplex operator*(const cudacomplex REF(a), const cudacomplex REF(b)) {

    cudacomplex result = { a.real * b.real - a.img  * b.img,

                       a.img  * b.real + a.real * b.img };

    return result;

}

// multiply complex with scalar

HOSTDEVICE cudacomplex operator*(const cudacomplex REF(a), const float REF(b)) {

    cudacomplex result = { a.real * b, a.img  * b };

    return result;

}

// multiply scalar with complex

HOSTDEVICE cudacomplex operator*(const float REF(a), const cudacomplex REF(b)) {

    cudacomplex result = { a * b.real, a * b.img };

    return result;

}

// divide complex numbers

HOSTDEVICE cudacomplex operator/(const cudacomplex REF(a), const cudacomplex REF(b)) {

    float tmp   = ( b.real * b.real + b.img * b.img );

    cudacomplex result = { (a.real * b.real + a.img  * b.img ) / tmp,

                       (a.img  * b.real - a.real * b.img ) / tmp };

    return result;

}

// divide complex by scalar

HOSTDEVICE cudacomplex operator/(const cudacomplex REF(a), const float REF(b)) {

    cudacomplex result = { a.real / b, a.img  / b };

    return result;

}

// divide scalar by complex

HOSTDEVICE cudacomplex operator/(const float REF(a), const cudacomplex REF(b)) {

    float tmp   = ( b.real * b.real + b.img * b.img );

    cudacomplex result = { ( a * b.real ) / tmp, ( -a * b.img ) / tmp };

    return result;

}

// complex conjugate

HOSTDEVICE cudacomplex operator~(const cudacomplex REF(a)) {

    cudacomplex result = { a.real, -a.img };

    return result;

}

#endif // #ifndef CUDACOMPLEX_H

The previous cudacomplex library (or rather, header file) had some performance issues: it wasn't as fast as cuComplex for some operations, mainly because it wasn't based on the CUDA-native float2 type. Here is an update that fixes this.

The advantage of my class is that it overloads most maths operators, so you don't have to use the ugly cuCAdd(), cuCMul(), etc. that cuComplex forces on you.

The code works on all CUDA SDK versions starting with 1.1 (1.0 was never tested) and runs on Windows, Linux, and Mac OS X.

Because this type is based on float2, you should be able to cast between cuComplex and cudacomplex as you like, or simply overload the assignment operators (or copy constructors) to allow for implicit type conversion if you so desire.

/*

* Copyright (c) 2008-2009 Christian Buchner <Christian.Buchner@gmail.com>

* All rights reserved.

*

* Redistribution and use in source and binary forms, with or without

* modification, are permitted provided that the following conditions are met:

*	 * Redistributions of source code must retain the above copyright

*	   notice, this list of conditions and the following disclaimer.

*	 * Redistributions in binary form must reproduce the above copyright

*	   notice, this list of conditions and the following disclaimer in the

*	   documentation and/or other materials provided with the distribution.

*

* THIS SOFTWARE IS PROVIDED BY Christian Buchner ''AS IS'' AND ANY 

* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED

* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE

* DISCLAIMED. IN NO EVENT SHALL Christian Buchner BE LIABLE FOR ANY

* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES

* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;

* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND

* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT

* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS

* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#ifndef CUDACOMPLEX_H

#define CUDACOMPLEX_H

#include <vector_types.h>  // required for float2

// Depending on whether we're running inside the CUDA compiler, define the __host__

// and __device__ intrinsics, otherwise just make the functions static to prevent

// linkage issues (duplicate symbols and such)

#ifdef __CUDACC__

#define HOST __host__

#define DEVICE __device__

#define HOSTDEVICE __host__ __device__

#define M_HOST __host__

#define M_HOSTDEVICE __host__ __device__

#else

#define HOST static inline

#define DEVICE static inline

#define HOSTDEVICE static inline

#define M_HOST inline	   // note there is no static here

#define M_HOSTDEVICE inline // (static has a different meaning for class member functions)

#endif

// Struct alignment is handled differently between the CUDA compiler and other

// compilers (e.g. GCC, MS Visual C++ .NET)

#ifdef __CUDACC__

#define ALIGN(x)  __align__(x)

#else

#if defined(_MSC_VER) && (_MSC_VER >= 1300)

// Visual C++ .NET and later

#define ALIGN(x) __declspec(align(x)) 

#else

#if defined(__GNUC__)

// GCC

#define ALIGN(x)  __attribute__ ((aligned (x)))

#else

// all other compilers

#define ALIGN(x) 

#endif

#endif

#endif

// Somehow in emulation mode the code won't compile on the Mac OS X CUDA 1.1 SDK when the

// operators below make use of references (compiler bug?). So instead we compile

// the code to pass everything through the stack. Slower, but works.

// I am not sure how the Linux CUDA SDK will behave, so currently when I detect

// Microsoft's Visual C++.NET I always allow it to use references.

#if !defined(__DEVICE_EMULATION__) || (defined(_MSC_VER) && (_MSC_VER >= 1300))

#define REF(x) &x

#define ARRAYREF(x,y) (&x)[y]

#else

#define REF(x) x

#define ARRAYREF(x,y) x[y]

#endif

/**

 * A complex number type for use with CUDA, single precision accuracy.

 * This is deliberately designed to use few C++ features in order to work with most

 * CUDA SDK versions. It is friendlier to use than the cuComplex type because it

 * provides more operator overloads.

 * The class should work in host code and in device code and also in emulation mode.

 * It has also been tested on every OS that the CUDA SDK is available for.

 */

typedef struct ALIGN(8) _cudacomplex {

// float2 is a native CUDA type and allows for coalesced 128 byte access

  // when accessed according to CUDA's memory coalescing rules.

  // x member is real component

  // y member is imaginary component

  float2 value;

// assignment of a scalar to complex

  _cudacomplex& operator=(const float REF(a)) {

	 value.x = a; value.y = 0;

	 return *this;

  };

// assignment of a pair of floats to complex

  _cudacomplex& operator=(const float ARRAYREF(a,2)) {

	 value.x = a[0]; value.y = a[1];

	 return *this;

  };

// return references to the real and imaginary components

  M_HOSTDEVICE float& real() {return value.x;};

  M_HOSTDEVICE float& imag() {return value.y;};

} cudacomplex;

// add complex numbers

HOSTDEVICE cudacomplex operator+(const cudacomplex REF(a), const cudacomplex REF(b)) {

   cudacomplex result = {{ a.value.x + b.value.x, a.value.y  + b.value.y }};

   return result;

}

// add scalar to complex

HOSTDEVICE cudacomplex operator+(const cudacomplex REF(a), const float REF(b)) {

   cudacomplex result = {{ a.value.x + b, a.value.y }};

   return result;

}

// add complex to scalar

HOSTDEVICE cudacomplex operator+(const float REF(a), const cudacomplex REF(b)) {

   cudacomplex result = {{ a + b.value.x, b.value.y }};

   return result;

}

// subtract complex numbers

HOSTDEVICE cudacomplex operator-(const cudacomplex REF(a), const cudacomplex REF(b)) {

   cudacomplex result = {{ a.value.x - b.value.x, a.value.y  - b.value.y }};

   return result;

}

// negate a complex number

HOSTDEVICE cudacomplex operator-(const cudacomplex REF(a)) {

   cudacomplex result = {{ -a.value.x, -a.value.y }};

   return result;

}

// subtract scalar from complex

HOSTDEVICE cudacomplex operator-(const cudacomplex REF(a), const float REF(b)) {

   cudacomplex result = {{ a.value.x - b, a.value.y }};

   return result;

}

// subtract complex from scalar

HOSTDEVICE cudacomplex operator-(const float REF(a), const cudacomplex REF(b)) {

   cudacomplex result = {{ a - b.value.x, -b.value.y }};

   return result;

}

// multiply complex numbers

HOSTDEVICE cudacomplex operator*(const cudacomplex REF(a), const cudacomplex REF(b)) {

   cudacomplex result = {{ a.value.x * b.value.x - a.value.y * b.value.y,

						   a.value.y * b.value.x + a.value.x * b.value.y }};

   return result;

}

// multiply complex with scalar

HOSTDEVICE cudacomplex operator*(const cudacomplex REF(a), const float REF(b)) {

   cudacomplex result = {{ a.value.x * b, a.value.y * b }};

   return result;

}

// multiply scalar with complex

HOSTDEVICE cudacomplex operator*(const float REF(a), const cudacomplex REF(b)) {

   cudacomplex result = {{ a * b.value.x, a * b.value.y }};

   return result;

}

// divide complex numbers

HOSTDEVICE cudacomplex operator/(const cudacomplex REF(a), const cudacomplex REF(b)) {

   float tmp = ( b.value.x * b.value.x + b.value.y * b.value.y );

   cudacomplex result = {{ (a.value.x * b.value.x + a.value.y * b.value.y ) / tmp,

						   (a.value.y * b.value.x - a.value.x * b.value.y ) / tmp }};

   return result;

}

// divide complex by scalar

HOSTDEVICE cudacomplex operator/(const cudacomplex REF(a), const float REF(b)) {

   cudacomplex result = {{ a.value.x / b, a.value.y / b }};

   return result;

}

// divide scalar by complex

HOSTDEVICE cudacomplex operator/(const float REF(a), const cudacomplex REF(b)) {

   float tmp = ( b.value.x * b.value.x + b.value.y * b.value.y );

   cudacomplex result = {{ ( a * b.value.x ) / tmp, ( -a * b.value.y ) / tmp }};

   return result;

}

// complex conjugate

HOSTDEVICE cudacomplex operator~(const cudacomplex REF(a)) {

   cudacomplex result = {{ a.value.x, -a.value.y }};

   return result;

}

// a possible alternative to a cudacomplex constructor

HOSTDEVICE cudacomplex make_cudacomplex(float a, float b)

{

	cudacomplex res;

	res.real() = a;

	res.imag() = b;

	return res;

}

namespace constants

{

	const _cudacomplex zero = make_cudacomplex(0.0f, 0.0f);

	const _cudacomplex one  = make_cudacomplex(1.0f, 0.0f);

	const _cudacomplex I	= make_cudacomplex(0.0f, 1.0f);

};

#endif // #ifndef CUDACOMPLEX_H

In another forum posting I will detail the use of this class for doing complex 2x2 matrix inversions in many threads while maintaining full memory coalescing to achieve the fastest results.

Christian

awesome work :) :) … this will greatly help me…

Let me know if you need help of some sort in improving this and all…

Also, I guess you don't have a mod (modulus) function for complex numbers?

thanks

Not yet, but it would be quite trivial to add. I'd suggest naming it abs(), and it would do return __sqrtf(value.x*value.x + value.y*value.y);. We could also use a function that returns the phase of a complex number: simply __atan2(value.y, value.x).

Likewise, another make_cudacomplex function that accepts magnitude and angle could be useful; however, it needs to be disambiguated from the existing make_cudacomplex that expects real and imaginary components.

Further possible improvements could be the addition of constructors, copy constructor, assignment operators, etc… But it needs to be checked whether this breaks compatibility with older SDK versions.

Christian

Third version. You guys like templates much? I do. And that’s why we can now get double precision complex maths as well.

And I added the complex modulus.

The single and double precision types are called singlecomplex and doublecomplex. If you have existing code that uses the previous "cudacomplex", I recommend adding this to your code right after the #include:

typedef singlecomplex cudacomplex;

And coming soon: double-single precision mathematics (a.k.a. "double emulation") for the poor schmucks like myself who don't have a double precision ALU in their compute capability <= 1.2 devices. First results show this is anywhere from 30-50 times slower than single precision for massive number crunching (4x4 matrix inversions).

DISCLAIMER: I am not quite 100% sure whether abs() and phase() currently return a good double precision result. That depends on whether the square root and arctangent calls can also accept and return doubles.

Should any of you know how to create generic non-member overloads for scalars where the right-hand operand is a cudacomplex with arbitrary template arguments, let me know. Right now I have to repeat (i.e. specialize) the code for each pair of template arguments <T2, T> supported by the template: currently double2, double and float2, float.

/*

* Copyright (c) 2008-2009 Christian Buchner <Christian.Buchner@gmail.com>

* All rights reserved.

*

* Redistribution and use in source and binary forms, with or without

* modification, are permitted provided that the following conditions are met:

*	 * Redistributions of source code must retain the above copyright

*	   notice, this list of conditions and the following disclaimer.

*	 * Redistributions in binary form must reproduce the above copyright

*	   notice, this list of conditions and the following disclaimer in the

*	   documentation and/or other materials provided with the distribution.

*

* THIS SOFTWARE IS PROVIDED BY Christian Buchner ''AS IS'' AND ANY 

* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED

* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE

* DISCLAIMED. IN NO EVENT SHALL Christian Buchner BE LIABLE FOR ANY

* DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES

* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;

* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND

* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT

* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS

* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#ifndef CUDACOMPLEX_H

#define CUDACOMPLEX_H

#include <vector_types.h>  // required for float2, double2

// Depending on whether we're running inside the CUDA compiler, define the __host__

// and __device__ intrinsics, otherwise just make the functions static to prevent

// linkage issues (duplicate symbols and such)

#ifdef __CUDACC__

#define HOST __host__

#define DEVICE __device__

#define HOSTDEVICE __host__ __device__

#define M_HOST __host__

#define M_HOSTDEVICE __host__ __device__

#else

#define HOST static inline

#define DEVICE static inline

#define HOSTDEVICE static inline

#define M_HOST inline	   // note there is no static here

#define M_HOSTDEVICE inline // (static has a different meaning for class member functions)

#endif

// Struct alignment is handled differently between the CUDA compiler and other

// compilers (e.g. GCC, MS Visual C++ .NET)

#ifdef __CUDACC__

#define ALIGN(x)  __align__(x)

#else

#if defined(_MSC_VER) && (_MSC_VER >= 1300)

// Visual C++ .NET and later

#define ALIGN(x) __declspec(align(x)) 

#else

#if defined(__GNUC__)

// GCC

#define ALIGN(x)  __attribute__ ((aligned (x)))

#else

// all other compilers

#define ALIGN(x) 

#endif

#endif

#endif

// Somehow in emulation mode the code won't compile on the Mac OS X CUDA 1.1 SDK when the

// operators below make use of references (compiler bug?). So instead we compile

// the code to pass everything through the stack. Slower, but works.

// I am not sure how the Linux CUDA SDK will behave, so currently when I detect

// Microsoft's Visual C++.NET I always allow it to use references.

#if !defined(__DEVICE_EMULATION__) || (defined(_MSC_VER) && (_MSC_VER >= 1300))

#define REF(x) &x

#define ARRAYREF(x,y) (&x)[y]

#define PTR(x) *x

#else

#define REF(x) x

#define ARRAYREF(x,y) x[y]

#define PTR(x) *x

#endif

/**

 * A templated complex number type for use with CUDA.

 * This is deliberately designed to use few C++ features in order to work with most

 * CUDA SDK versions. It is friendlier to use than the cuComplex type because it

 * provides more operator overloads.

 * The class should work in host code and in device code and also in emulation mode.

 * It has also been tested on every OS that the CUDA SDK is available for.

 */

template <typename T2, typename T> struct ALIGN(8) _cudacomplex {

// float2  is a native CUDA type and allows for coalesced 128 byte access

  // double2 is a native CUDA type and allows for coalesced 256 byte access

  // when accessed according to CUDA's memory coalescing rules.

  // x member is real component

  // y member is imaginary component

  T2 value;

// assignment of a scalar to complex

  _cudacomplex<T2, T>& operator=(const T REF(a)) {

	 value.x = a; value.y = 0;

	 return *this;

  };

// assignment of a pair of Ts to complex

  _cudacomplex<T2, T>& operator=(const T ARRAYREF(a,2)) {

	 value.x = a[0]; value.y = a[1];

	 return *this;

  };

// return references to the real and imaginary components

  M_HOSTDEVICE T& real() {return value.x;};

  M_HOSTDEVICE T& imag() {return value.y;};

// add complex numbers

  M_HOSTDEVICE _cudacomplex<T2, T> operator+(const _cudacomplex<T2, T> REF(b)) const {

	_cudacomplex<T2, T> result = {{ value.x + b.value.x, value.y  + b.value.y }};

	return result;

  }

// add scalar to complex

  M_HOSTDEVICE _cudacomplex<T2, T> operator+(const T REF(b)) const {

	_cudacomplex<T2, T> result = {{ value.x + b, value.y }};

	return result;

  }

// subtract complex numbers

  M_HOSTDEVICE _cudacomplex<T2, T> operator-(const _cudacomplex<T2, T> REF(b)) const {

	_cudacomplex<T2, T> result = {{ value.x - b.value.x, value.y  - b.value.y }};

	return result;

  }

// negate a complex number

  M_HOSTDEVICE _cudacomplex<T2, T> operator-() const {

	_cudacomplex<T2, T> result = {{ -value.x, -value.y }};

	return result;

  }

// subtract scalar from complex

  M_HOSTDEVICE _cudacomplex<T2, T> operator-(const T REF(b)) const {

	 _cudacomplex<T2, T> result = {{ value.x - b, value.y }};

	 return result;

  }

// multiply complex numbers

  M_HOSTDEVICE _cudacomplex<T2, T> operator*(const _cudacomplex<T2, T> REF(b)) const {

	_cudacomplex<T2, T> result = {{ value.x * b.value.x - value.y * b.value.y,

									value.y * b.value.x + value.x * b.value.y }};

	return result;

  }

// multiply complex with scalar

  M_HOSTDEVICE _cudacomplex<T2, T> operator*(const T REF(b)) const {

	 _cudacomplex<T2, T> result = {{ value.x * b, value.y * b }};

	 return result;

  }

// divide complex numbers

  M_HOSTDEVICE _cudacomplex<T2, T> operator/(const _cudacomplex<T2, T> REF(b)) const {

	T tmp = ( b.value.x * b.value.x + b.value.y * b.value.y );

	_cudacomplex<T2, T> result = {{ (value.x * b.value.x + value.y * b.value.y ) / tmp,

								(value.y * b.value.x - value.x * b.value.y ) / tmp }};

	return result;

}

// divide complex by scalar

  M_HOSTDEVICE _cudacomplex<T2, T> operator/(const T REF(b)) const {

	 _cudacomplex<T2, T> result = {{ value.x / b, value.y / b }};

	 return result;

  }

// complex conjugate

  M_HOSTDEVICE _cudacomplex<T2, T> operator~() const {

	 _cudacomplex<T2, T> result = {{ value.x, -value.y }};

	 return result;

  }

// complex modulus (complex absolute)

  M_HOSTDEVICE T abs() {

	  T result = sqrt( value.x*value.x + value.y*value.y );  // sqrt overloads cover float and double

	  return result;

  }

// complex phase angle

  M_HOSTDEVICE T phase() {

	  T result = atan2( value.y, value.x );

	  return result;

  }

// a possible alternative to a _cudacomplex constructor

  static M_HOSTDEVICE _cudacomplex<T2, T> make_cudacomplex(T a, T b)

  {

	_cudacomplex<T2, T> res;

	res.real() = a;

	res.imag() = b;

	return res;

  }

// return constant number one

  static M_HOSTDEVICE const _cudacomplex<T2, T> one() {

	  return make_cudacomplex((T)1.0, (T)0.0);

  }

// return constant number zero

  static M_HOSTDEVICE const _cudacomplex<T2, T> zero() {

	  return make_cudacomplex((T)0.0, (T)0.0);

  }

// return constant number I

  static M_HOSTDEVICE const _cudacomplex<T2, T> I() {

	  return make_cudacomplex((T)0.0, (T)1.0);

  }

};

//

// Define the common complex types

//

typedef _cudacomplex<float2, float> singlecomplex;

typedef _cudacomplex<double2, double> doublecomplex;

//typedef _cudacomplex<doublesingle2, doublesingle> doublesinglecomplex;

//

// Non-member overloads for single complex

//

// subtract single complex from scalar

HOSTDEVICE _cudacomplex<float2, float> operator-(const float REF(a), const _cudacomplex<float2, float> REF(b)) {

  _cudacomplex<float2, float> result = {{ a - b.value.x, -b.value.y }};

  return result;

}

// add single complex to scalar

HOSTDEVICE _cudacomplex<float2, float> operator+(const float REF(a), const _cudacomplex<float2, float> REF(b)) {

  _cudacomplex<float2, float> result = {{ a + b.value.x, b.value.y }};

  return result;

}

// multiply scalar with single complex

HOSTDEVICE _cudacomplex<float2, float> operator*(const float REF(a), const _cudacomplex<float2, float> REF(b)) {

  _cudacomplex<float2, float> result = {{ a * b.value.x, a * b.value.y }};

  return result;

}

// divide scalar by single complex

HOSTDEVICE _cudacomplex<float2, float> operator/(const float REF(a), const _cudacomplex<float2, float> REF(b)) {

  float tmp = ( b.value.x * b.value.x + b.value.y * b.value.y );

  _cudacomplex<float2, float> result = {{ ( a * b.value.x ) / tmp, ( -a * b.value.y ) / tmp }};

  return result;

}

//

// Non-member overloads for double complex

//

// subtract double complex from scalar

HOSTDEVICE _cudacomplex<double2, double> operator-(const double REF(a), const _cudacomplex<double2, double> REF(b)) {

  _cudacomplex<double2, double> result = {{ a - b.value.x, -b.value.y }};

  return result;

}

// add double complex to scalar

HOSTDEVICE _cudacomplex<double2, double> operator+(const double REF(a), const _cudacomplex<double2, double> REF(b)) {

  _cudacomplex<double2, double> result = {{ a + b.value.x, b.value.y }};

  return result;

}

// multiply scalar with double complex

HOSTDEVICE _cudacomplex<double2, double> operator*(const double REF(a), const _cudacomplex<double2, double> REF(b)) {

  _cudacomplex<double2, double> result = {{ a * b.value.x, a * b.value.y }};

  return result;

}

// divide scalar by double complex

HOSTDEVICE _cudacomplex<double2, double> operator/(const double REF(a), const _cudacomplex<double2, double> REF(b)) {

  double tmp = ( b.value.x * b.value.x + b.value.y * b.value.y );

  _cudacomplex<double2, double> result = {{ ( a * b.value.x ) / tmp, ( -a * b.value.y ) / tmp }};

  return result;

}

// a possible alternative to a single complex constructor

HOSTDEVICE singlecomplex make_singlecomplex(float a, float b)

{

	singlecomplex res;

	res.real() = a;

	res.imag() = b;

	return res;

}

// a possible alternative to a double complex constructor

HOSTDEVICE doublecomplex make_doublecomplex(double a, double b)

{

	doublecomplex res;

	res.real() = a;

	res.imag() = b;

	return res;

}

#endif // #ifndef CUDACOMPLEX_H

I’ve tried using version 2 (non-template), but I’m getting a strange error message, and Google didn’t help either. I’m wondering if anybody has been able to use this for complex expressions. I’ll see if I can pull out a small snippet that demos the error, but for now here’s the error message:

/tmp/tmpxft_00001103_00000000-1_Kernel_SMT_ODS.cudafe1.stub.c:8: error: overloaded function with no contextual type information

EDIT:

I figured this out: the error is not related to the complex type; nvcc doesn’t handle namespaces correctly.

In GNU C++ one can do this in the .cpp file:

using namespace FSMT;

and implement all the functions declared in the .h file in the FSMT namespace.

But nvcc produces the error above when “using namespace FSMT;” is used. Instead, pull in each declared name explicitly in your .cu file like this:

using FSMT::int32;
using FSMT::float32;

etc. without a “using namespace FSMT” and it will compile without error. Now I can try to test complex math!

Thanks,
Nick