Hello all,

I am looking for an API call that does point-to-point (element-wise) multiplication. I am going to use it to implement a convolution by doing two FFTs, an element-wise multiplication, and then a single IFFT.

I copied the kernel I am currently using below. Put simply, if a = {1 2 3 4} and b = {4 5 6 7}, the output of my kernel would be output = {1*4 2*5 3*6 4*7} = {4 10 18 28}. My total convolution takes 63 ms, but pointMultiply alone accounts for a whopping 30 ms of that!

Is there a cuBLAS API that does this?

```
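// Complex multiplication: (a.x + i*a.y) * (b.x + i*b.y)
static __device__ __host__ inline cuComplex ComplexMul(cuComplex a, cuComplex b)
{
    cuComplex c;
    c.x = a.x * b.x - a.y * b.y;  // real part
    c.y = a.x * b.y + a.y * b.x;  // imaginary part
    return c;
}

// Element-wise complex multiply; the result is written back into a.
__global__ void pointMultiply(cuComplex *a, const cuComplex *b, int size)
{
    // One thread per element
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= size)
        return;  // threads past the end of the arrays do nothing
    a[i] = ComplexMul(a[i], b[i]);
}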
```
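
For reference, here is a minimal sketch of how a kernel like this could be launched and timed on its own with CUDA events, so that the kernel time can be separated from host/device copies. The post does not show the launch or the timing code, so everything outside the kernel is an assumption: the helper name `runPointMultiply`, the block size of 256, and the use of events are illustrative only. Error checking is left out for brevity, and a first launch may include one-time startup cost.

```cuda
#include <cstdio>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>

// Complex multiplication, same arithmetic as in the post.
static __device__ __host__ inline cuComplex ComplexMul(cuComplex a, cuComplex b)
{
    cuComplex c;
    c.x = a.x * b.x - a.y * b.y;
    c.y = a.x * b.y + a.y * b.x;
    return c;
}

// Element-wise complex multiply, result written back into a (as in the post).
__global__ void pointMultiply(cuComplex *a, const cuComplex *b, int size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= size)
        return;
    a[i] = ComplexMul(a[i], b[i]);
}

// Hypothetical helper (not from the post): copies a and b to the device,
// times only the kernel with CUDA events, and returns the product.
std::vector<cuComplex> runPointMultiply(const std::vector<cuComplex> &a,
                                        const std::vector<cuComplex> &b,
                                        float *kernelMs)
{
    const int size = static_cast<int>(a.size());
    const size_t bytes = static_cast<size_t>(size) * sizeof(cuComplex);

    cuComplex *dA = nullptr;
    cuComplex *dB = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;                            // assumed block size
    const int blocks = (size + threads - 1) / threads;  // round up to cover all elements

    cudaEventRecord(start);
    pointMultiply<<<blocks, threads>>>(dA, dB, size);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                     // wait for the kernel to finish
    cudaEventElapsedTime(kernelMs, start, stop);    // elapsed time in milliseconds

    std::vector<cuComplex> out(size);
    cudaMemcpy(out.data(), dA, bytes, cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    return out;
}

int main()
{
    // Worked example from the post, stored as complex values with zero imaginary part.
    std::vector<cuComplex> a = {make_cuComplex(1.0f, 0.0f), make_cuComplex(2.0f, 0.0f),
                                make_cuComplex(3.0f, 0.0f), make_cuComplex(4.0f, 0.0f)};
    std::vector<cuComplex> b = {make_cuComplex(4.0f, 0.0f), make_cuComplex(5.0f, 0.0f),
                                make_cuComplex(6.0f, 0.0f), make_cuComplex(7.0f, 0.0f)};

    float ms = 0.0f;
    std::vector<cuComplex> out = runPointMultiply(a, b, &ms);

    for (const cuComplex &v : out)
        printf("(%g, %g) ", v.x, v.y);
    printf("\nkernel time: %f ms\n", ms);
    return 0;
}
```

The real parts printed should match the {4 10 18 28} example above. If the event-based time comes out much smaller than the 30 ms figure, the difference is probably coming from something outside the kernel itself, such as copies or how the timing is taken.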