I apologize for reposting this question; I think I originally posted it in the wrong forum section. I have been unable to find an answer anywhere else online, so hopefully someone here will know whether this can even be done.

I am working with CUDA UnBound (CUB) and cuFloatComplex numbers, and I would like to reduce an array of cuFloatComplex numbers to a single float value. The pseudocode I would like to implement with the cub::DeviceReduce::Reduce operation follows, where N is the total length of the array of cuFloatComplex numbers:

```
cuFloatComplex *array = NULL;
array = (cuFloatComplex *)malloc(N * sizeof(cuFloatComplex));
// initialize array to some set of random numbers:
initArray(array, N);
// reduce to a single float: the sum of squared magnitudes
float sum = 0.0f;
for (int i = 0; i < N; ++i)
{
    sum += (array[i].x * array[i].x + array[i].y * array[i].y);
}
```

I have tried creating a custom reduction operator that incorporates the above and passing it to DeviceReduce in CUB, but it doesn’t give the same result as the loop above. In the following, d_in and d_out are cuFloatComplex values (d_in is an array of length N, d_out is a single element), and the result would be stored in the real part of d_out:

```
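struct CplxReduce {
    // Combines two values: squared magnitude of a plus squared magnitude of b,
    // returned in the real part.
    __host__ __device__
    cuFloatComplex operator()(const cuFloatComplex &a, const cuFloatComplex &b) const {
        cuFloatComplex c;
        c.x = (a.x * a.x + a.y * a.y);
        c.x += (b.x * b.x + b.y * b.y);
        c.y = 0.0f;  // imaginary part is unused
        return c;
    }
};
...
CplxReduce reduceOp;
void *tmp = NULL;
size_t tmp_bytes = 0;
cuFloatComplex init;
//init.x = init.y = FLT_MIN;
init.x = init.y = 0.0f;
// First call (tmp == NULL) only computes the required temporary storage size.
cub::DeviceReduce::Reduce(tmp, tmp_bytes, d_in, d_out, N, reduceOp, init);
checkCudaErrors(cudaMalloc(&tmp, tmp_bytes));
// Second call performs the actual reduction.
cub::DeviceReduce::Reduce(tmp, tmp_bytes, d_in, d_out, N, reduceOp, init);
if (tmp)
    checkCudaErrors(cudaFree(tmp));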
```
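
A possible explanation for the mismatch, sketched on the host in plain C++ (no CUB or CUDA involved): the operator squares both of its operands, but during a reduction one or both operands are usually partial results that were already squared, so they get squared again. `Cplx`, `SquaringOp`, and the three helper functions below are hypothetical names made up for this illustration; `Cplx` only stands in for cuFloatComplex. The last function shows the common alternative of mapping each element to its squared magnitude first and then reducing with plain addition.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical stand-in for cuFloatComplex so the sketch builds without CUDA.
struct Cplx { float x; float y; };

// Same logic as the reduction operator in the question: squares BOTH operands.
struct SquaringOp {
    Cplx operator()(const Cplx &a, const Cplx &b) const {
        Cplx c;
        c.x = (a.x * a.x + a.y * a.y) + (b.x * b.x + b.y * b.y);
        c.y = 0.0f;
        return c;
    }
};

// Reference result: the plain loop from the pseudocode.
float reference_sum(const std::vector<Cplx> &v) {
    float sum = 0.0f;
    for (const Cplx &z : v) sum += z.x * z.x + z.y * z.y;
    return sum;
}

// Sequential left fold with SquaringOp, starting from init = (0, 0).
// The accumulator is a partial result, yet it is squared again at every step.
float fold_with_squaring_op(const std::vector<Cplx> &v) {
    Cplx acc{0.0f, 0.0f};
    SquaringOp op;
    for (const Cplx &z : v) acc = op(acc, z);
    return acc.x;
}

// Alternative: transform each element to |z|^2 first, then sum with plain '+'.
float transform_then_sum(const std::vector<Cplx> &v) {
    std::vector<float> mags(v.size());
    std::transform(v.begin(), v.end(), mags.begin(),
                   [](const Cplx &z) { return z.x * z.x + z.y * z.y; });
    return std::accumulate(mags.begin(), mags.end(), 0.0f);
}
```

For the input {1+2i, 3+4i, 0+1i}, `reference_sum` and `transform_then_sum` both give 31, while `fold_with_squaring_op` gives 2501. On the device, the same idea would presumably mean feeding `cub::DeviceReduce::Sum` an input iterator that computes the squared magnitude on the fly (CUB's `cub::TransformInputIterator` looks intended for this), but that part is an assumption and is not tested here.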

Is what I am trying to do even possible? Any help/hints would be great.

Thanks