CUB Repost - I apologize

I apologize for reposting this question, but I think I posted it in the wrong forum section previously. I have been unable to find an answer anywhere else online, so hopefully someone here will know whether this can even be done.

I am working with CUDA UnBound (CUB) and cuFloatComplex numbers, and I would like to reduce an array of cuFloatComplex numbers to a single float value. The pseudocode below is what I would like to implement with cub::DeviceReduce::Reduce, where N is the total length of the array:

      cuFloatComplex *array = NULL;
      array = (cuFloatComplex *)malloc(N * sizeof(cuFloatComplex));
      // initialize array to some set of random numbers:
      initArray(array, N);

      // reduce to single float
      float sum = 0.0f;
      for (int i = 0; i < N; ++i)
      {
        sum += (array[i].x * array[i].x + array[i].y * array[i].y);
      }

I have tried writing a custom reduction operator that performs the above and passing it to cub::DeviceReduce::Reduce, but it does not give the same result as the loop. In the code below, d_in and d_out hold cuFloatComplex values (d_in is an array of length N, d_out is a single element), and the result would be stored in the real part of d_out:

struct CplxReduce {
  cuFloatComplex operator()(const cuFloatComplex &a, const cuFloatComplex &b) const {
    cuFloatComplex c;
    c.x = (a.x * a.x + a.y * a.y);
    c.x += (b.x * b.x + b.y * b.y);
    c.y = 0.0f;  // imaginary part is unused
    return c;
  }
};
...
  CplxReduce reduceOp;
  void *tmp = NULL;
  size_t tmp_bytes = 0;
  cuFloatComplex init;
  //init.x = init.y = FLT_MIN;
  init.x = init.y = 0.0f;

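  // First call (tmp == NULL) only computes the required tmp_bytes.
  cub::DeviceReduce::Reduce(tmp, tmp_bytes, d_in, d_out, N, reduceOp, init);

  checkCudaErrors(cudaMalloc(&tmp, tmp_bytes));

  // Second call performs the actual reduction.
  cub::DeviceReduce::Reduce(tmp, tmp_bytes, d_in, d_out, N, reduceOp, init);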

  if (tmp)
    checkCudaErrors(cudaFree(tmp));
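
One possible explanation for the mismatch, offered as a guess rather than a confirmed diagnosis: a reduction applies the operator to partial results as well as to raw inputs, so an operator that squares both of its arguments will re-square sums it has already accumulated. The host-only sketch below illustrates this with a plain serial fold. `Cplx` is a hypothetical stand-in for cuFloatComplex so that it builds without CUDA, and `SquaringOp`, `foldWithSquaringOp` and `referenceSum` are names made up for this illustration.

```cpp
#include <vector>

// Stand-in for cuFloatComplex so the sketch builds without CUDA
// (assumption: only the float members x and y matter here).
struct Cplx {
  float x;
  float y;
};

// Same arithmetic as the operator in the post, with c.y zeroed.
struct SquaringOp {
  Cplx operator()(const Cplx &a, const Cplx &b) const {
    Cplx c;
    c.x = a.x * a.x + a.y * a.y;  // squares a, even when a is a partial sum
    c.x += b.x * b.x + b.y * b.y;
    c.y = 0.0f;
    return c;
  }
};

// Serial left fold, acc = op(acc, v[i]): one order a reduction might use.
float foldWithSquaringOp(const std::vector<Cplx> &v) {
  SquaringOp op;
  Cplx acc = {0.0f, 0.0f};
  for (const Cplx &e : v) {
    acc = op(acc, e);
  }
  return acc.x;
}

// The reference loop from the post.
float referenceSum(const std::vector<Cplx> &v) {
  float sum = 0.0f;
  for (const Cplx &e : v) {
    sum += e.x * e.x + e.y * e.y;
  }
  return sum;
}
```

For the input (1,0), (2,0), (3,0), the reference loop gives 1 + 4 + 9 = 14, while the fold gives 1, then 1*1 + 4 = 5, then 5*5 + 9 = 34. A parallel reduction may group the elements differently, so the wrong value it produces need not be this particular one.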

Is what I am trying to do even possible? Any help/hints would be great.
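
As for whether it is possible: one common pattern, sketched here on the host only, is to map each element to its squared magnitude first and then reduce the resulting floats with plain addition, which is associative up to floating-point rounding. In CUB the mapping step could presumably be done lazily with a transform iterator (for example cub::TransformInputIterator) feeding cub::DeviceReduce::Sum with a float output, but that device-side part is not shown or tested here. `Cplx`, `SquaredMagnitude` and `sumSquaredMagnitudes` are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Stand-in for cuFloatComplex so the sketch builds without CUDA
// (assumption: only the float members x and y matter here).
struct Cplx {
  float x;
  float y;
};

// Map step: one complex value to its squared magnitude.
struct SquaredMagnitude {
  float operator()(const Cplx &c) const { return c.x * c.x + c.y * c.y; }
};

// Map every element to a float, then reduce the floats with operator+.
float sumSquaredMagnitudes(const std::vector<Cplx> &v) {
  std::vector<float> mags(v.size());
  std::transform(v.begin(), v.end(), mags.begin(), SquaredMagnitude());
  return std::accumulate(mags.begin(), mags.end(), 0.0f, std::plus<float>());
}
```

Because the reduction operator here only ever sees floats that are already squared magnitudes, combining two partial sums is just another addition, which is what a tree-shaped reduction needs.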

Thanks