thrust::complex array definition problem

I am trying to rewrite some PyCUDA code in CUDA/C++ and have run into a problem which has stumped me with my first kernel.

I have declared a shared memory array inside a kernel function

__shared__ thrust::complex<float> temp_sig[2][4][32];

and the compiler emits the message
warning: dynamic initialization is not supported for a function-scope static shared variable within a device/global function

I also had a constant array defined outside the kernel as follows (and initialised using host code)

__device__ __constant__ thrust::complex<float> wfft_32[16];

which threw a similar message, but as an error. I fixed this by using cuComplex instead of thrust::complex, but I’d rather have access to the more intuitive maths notation of the thrust version.

I can’t find any examples of code using this type for arrays on the device. Am I doing something wrong? Can I define thrust::complex arrays on the device? If so how?

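thrust::complex has a non-empty default constructor. In CUDA, a type like that cannot be used for statically declared __constant__, __device__, or __shared__ variables, because they would require dynamic initialization, which is what the message is telling you.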

thrust::complex is usable in ordinary device code, if the data is referenced in global or local memory, for instance passing thrust::complex data to a kernel via kernel argument.

If you want to use it in shared memory, you could use dynamically allocated shared memory. That is probably the cleanest approach. I’m not sure this will cleanly or easily support triple subscripting, but for fixed dimensions (what would be required for a static definition) you could write an indexing helper macro.
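As a rough sketch of the dynamic shared memory idea, assuming the 2x4x32 shape from the question: the macro IDX3, the DIM_J/DIM_K constants, and the kernel name fill_and_sum are hypothetical names made up for illustration, and the launch assumes one block of 32 threads.

```cuda
#include <cstdio>
#include <thrust/complex.h>

// Hypothetical fixed dimensions, taken from temp_sig[2][4][32] in the question.
#define DIM_I 2
#define DIM_J 4
#define DIM_K 32
// Hypothetical helper: row-major flat offset equivalent to temp_sig[i][j][k].
#define IDX3(i, j, k) (((i) * DIM_J + (j)) * DIM_K + (k))

__global__ void fill_and_sum(thrust::complex<float> *out) {
  // Dynamically allocated shared memory: the size comes from the launch,
  // and no constructor is run on the elements.
  extern __shared__ thrust::complex<float> temp_sig[];

  const int k = threadIdx.x;  // assumes blockDim.x == DIM_K
  for (int i = 0; i < DIM_I; ++i)
    for (int j = 0; j < DIM_J; ++j)
      temp_sig[IDX3(i, j, k)] = thrust::complex<float>(float(i), float(j));
  __syncthreads();

  // Thread 0 adds up every element so the host has something to inspect.
  if (k == 0) {
    thrust::complex<float> sum(0.0f, 0.0f);
    for (int n = 0; n < DIM_I * DIM_J * DIM_K; ++n) sum += temp_sig[n];
    out[0] = sum;
  }
}

int main() {
  thrust::complex<float> *d_out;
  cudaMalloc(&d_out, sizeof(thrust::complex<float>));

  // The shared memory size in bytes is the third launch parameter.
  const size_t smem = DIM_I * DIM_J * DIM_K * sizeof(thrust::complex<float>);
  fill_and_sum<<<1, DIM_K, smem>>>(d_out);

  thrust::complex<float> h_out;
  cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  printf("sum = (%f, %f)\n", h_out.real(), h_out.imag());

  cudaFree(d_out);
  return 0;
}
```

The macro only works because the inner dimensions are compile-time constants; with runtime dimensions you would pass them in and compute the offset in a small device function instead.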

Since sizeof(cuComplex) == sizeof(thrust::complex<float>), I imagine there are various casting methods that would let you use it with statically allocated shared memory, with triple subscripting.
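One possible casting method, as a sketch only: declare the static shared array as cuComplex and view it through a pointer to thrust::complex<float>[4][32]. The alias cfloat and the kernel name scale are hypothetical, the launch assumes a single block of 32 threads, and the cast relies on the two types having the same size and layout, which the static_assert only partly checks.

```cuda
#include <cstdio>
#include <cuComplex.h>
#include <thrust/complex.h>

typedef thrust::complex<float> cfloat;  // hypothetical alias for brevity

static_assert(sizeof(cuComplex) == sizeof(cfloat),
              "the cast below assumes identical size");

__global__ void scale(cfloat *gdata) {
  // cuComplex is a plain struct with no constructor, so a static
  // shared declaration is accepted without the warning.
  __shared__ cuComplex raw[2][4][32];

  // View the same storage as thrust::complex<float>[2][4][32].
  cfloat (*temp_sig)[4][32] = reinterpret_cast<cfloat (*)[4][32]>(raw);

  const int k = threadIdx.x;               // assumes blockDim.x == 32
  temp_sig[1][2][k] = cfloat(0.0f, 1.0f);  // the imaginary unit
  __syncthreads();

  // Operator notation from thrust::complex works on the casted view.
  gdata[k] = gdata[k] * temp_sig[1][2][k];
}

int main() {
  cfloat h[32];
  for (int k = 0; k < 32; ++k) h[k] = cfloat(float(k), 0.0f);

  cfloat *d;
  cudaMalloc(&d, sizeof(h));
  cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
  scale<<<1, 32>>>(d);
  cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);

  printf("h[3] = (%f, %f)\n", h[3].real(), h[3].imag());
  cudaFree(d);
  return 0;
}
```

The same reinterpret_cast idea applies to a __constant__ cuComplex array, which the host can fill with cudaMemcpyToSymbol from a buffer of thrust::complex<float> values.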

There shouldn’t be any difficulty using it with ordinary local arrays in device code.
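For local arrays and kernel arguments, a minimal sketch could look like the following. The kernel name cube and the array name powers are hypothetical; nothing here needs a cast because the data lives in global and local memory.

```cuda
#include <cstdio>
#include <thrust/complex.h>

__global__ void cube(thrust::complex<float> *gdata, int n) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= n) return;

  // Ordinary per-thread local array of thrust::complex.
  thrust::complex<float> powers[3];
  powers[0] = gdata[idx];
  powers[1] = powers[0] * powers[0];
  powers[2] = powers[1] * powers[0];

  gdata[idx] = powers[2];
}

int main() {
  const int n = 4;
  thrust::complex<float> h[n];
  for (int i = 0; i < n; ++i) h[i] = thrust::complex<float>(float(i), 1.0f);

  // Global memory passed to the kernel as a thrust::complex pointer.
  thrust::complex<float> *d;
  cudaMalloc(&d, sizeof(h));
  cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
  cube<<<1, n>>>(d, n);
  cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);

  for (int i = 0; i < n; ++i)
    printf("h[%d] = (%f, %f)\n", i, h[i].real(), h[i].imag());

  cudaFree(d);
  return 0;
}
```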

Here is a compilable example:

#include <thrust/complex.h>
#include <cuComplex.h>

__constant__ cuComplex cdata[32];
__global__ void k(thrust::complex<float> *gdata){

  // shared
  extern __shared__ thrust::complex<float> sdata[];

  sdata[0] = thrust::complex<float>(0,0);
  // constant
  thrust::complex<float> *my_cdata = reinterpret_cast<thrust::complex<float> *>(cdata);
  sdata[1] = my_cdata[0] * sdata[0];
  // global
  gdata[0] = sdata[1] * gdata[0];
  // local
  thrust::complex<float> my_ldata = gdata[1];
  gdata[0] *= my_ldata;
}

Thanks for the explanation and example. That makes perfect sense.