Complex Pointwise Multiplication

Hi all!

I found this code for pointwise multiplication of complex numbers in a CUDA sample.

Can someone explain to me how it works?

__device__ inline Complex ComplexScale(Complex a, float s)
{
    Complex c;
    c.x = s * a.x;
    c.y = s * a.y;
    return c;
}

__device__ inline Complex ComplexMul(Complex a, Complex b)
{
    Complex c;
    c.x = a.x * b.x - a.y * b.y;
    c.y = a.x * b.y + a.y * b.x;
    return c;
}

__global__ void ComplexPointwiseMulAndScale(Complex* a, const Complex* b, int size, float scale)
{
    const int numThreads = blockDim.x * gridDim.x;
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = index; i < size; i += numThreads)
        a[i] = ComplexScale(ComplexMul(a[i], b[i]), scale);
}

Why must I scale the results?

In the third function, what does the for loop do?

Thank you for your support!

ComplexScale multiplies a complex number z = x + iy by a real scalar s. The result is obtained by multiplying both the real and the imaginary part by s, i.e. sz = sx + isy.

For the multiplication you have two numbers of the form x + iy, and you multiply both parts of each number by both parts of the other number:

(x1 + iy1)(x2 + iy2) = (x1 * x2) + (x1 * iy2) + (iy1 * x2) + (iy1 * iy2). Since i * i = -1, collecting the real and imaginary parts gives (x1x2 - y1y2) + i*(x1y2 + y1x2), as in the code shown.
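To check the expansion numerically, here is a small host-only sketch in plain C++ (no CUDA needed). The Complex struct is a stand-in for CUDA's float2, and the names complexMul, complexScale and referenceMul are made up for this illustration; they mirror the device functions above rather than reproduce the sample.

```cpp
#include <complex>

// Stand-in for CUDA's float2: x is the real part, y the imaginary part.
struct Complex {
    float x;
    float y;
};

// (a.x + i*a.y) * (b.x + i*b.y), using i*i = -1.
Complex complexMul(Complex a, Complex b) {
    Complex c;
    c.x = a.x * b.x - a.y * b.y;  // real part: x1*x2 - y1*y2
    c.y = a.x * b.y + a.y * b.x;  // imaginary part: x1*y2 + y1*x2
    return c;
}

// s * (a.x + i*a.y): both parts are multiplied by the real scalar s.
Complex complexScale(Complex a, float s) {
    return Complex{s * a.x, s * a.y};
}

// The same product computed with the standard library, for comparison.
Complex referenceMul(Complex a, Complex b) {
    const std::complex<float> p =
        std::complex<float>(a.x, a.y) * std::complex<float>(b.x, b.y);
    return Complex{p.real(), p.imag()};
}
```

For example, (1 + 2i)(3 + 4i) = 3 + 4i + 6i + 8i*i = -5 + 10i, and both complexMul and referenceMul should agree on that.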

The final function is just doing both of these operations at once on an array of complex numbers.
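The for loop in the kernel is what is usually called a grid-stride loop: each thread starts at its own global index and then jumps ahead by the total number of launched threads, so the whole array is covered even when size is larger than the grid. The host-only C++ sketch below emulates that indexing with ordinary loops; emulateGridStride and its parameters are hypothetical names, and no GPU is involved. As for the scaling: if the kernel comes from the simpleCUFFT sample (an assumption on my part), the scale is 1/size because cuFFT transforms are unnormalized, so a forward FFT followed by an inverse FFT returns the input multiplied by the number of elements.

```cpp
#include <vector>

// Emulates on the CPU how the kernel's for loop assigns array elements to threads.
// Returns, for each element, how many times it was visited.
std::vector<int> emulateGridStride(int gridDim, int blockDim, int size) {
    std::vector<int> visits(size, 0);
    const int numThreads = blockDim * gridDim;  // total threads in the launch
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            // Same starting index and stride as in the kernel.
            const int index = blockIdx * blockDim + threadIdx;
            for (int i = index; i < size; i += numThreads) {
                ++visits[i];
            }
        }
    }
    return visits;
}
```

With 2 blocks of 4 threads and size = 10, thread 0 handles elements 0 and 8, thread 1 handles 1 and 9, and the remaining threads handle one element each, so every element is processed exactly once.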

thank you so much!

I am new to CUDA programming and am writing a program that just passes some arrays and complex numbers from the host to the device, but when I compile it I get errors like these:

error (39): no suitable constructor exists to convert from "void *" to "float2"

error (82): argument of type "float" is incompatible with parameter of type "void *"

error (82): argument of type "float" is incompatible with parameter of type "const void *"

Here is my code:

#include <cuda.h>
#include <bits/stdc++.h>
#include <cuComplex.h>
#include <thrust/complex.h>
//#include <helper_functions.h>
//#include <helper_cuda.h>
#include <cuda_runtime.h>

using namespace std;
typedef float2 Complex;

__device__ inline Complex ComplexScale(Complex a, float s)
{
    Complex c;
    c.x = s * a.x;
    c.y = s * a.y;
    return c;
}
#define N 5
/*__global__ void  matrixA(float* A, float* B, float* C){

int i = threadIdx.x;
int j = blockIdx.x;
C[N*j+i] = A[N*j+i] + B[N*j+i];
}*/

int main (void) {
float a[N][5],b[N],c,ay[N][5],by[N],omega=20,damp_fac=0.05;;
float *dev_a, *dev_b, dev_dfac, dev_omega;
Complex dev_pratio, dev_csv, dev_sheer; 
c=1/4;

Complex z=(Complex)malloc(sizeof(Complex));
z.x=1.0;
z.y=damp_fac;
Complex poisson_ratio = (Complex)malloc(sizeof(Complex));
poisson_ratio = ComplexScale(z,0.3);
Complex c_sv = (Complex)malloc(sizeof(Complex));
c_sv = ComplexScale(z,200.00);
Complex sheermodulus = (Complex)malloc(sizeof(Complex));
sheermodulus = ComplexScale(z,7e+7);

cudaMalloc((void **)&dev_a, N * 5 * sizeof(float));
cudaMalloc((void **)&dev_b, N * sizeof(float));
cudaMalloc((void **)&dev_dfac, sizeof(float));
cudaMalloc((void **)&dev_omega, sizeof(float));
cudaMalloc((void **)&dev_pratio, sizeof(Complex));
cudaMalloc((void **)&dev_csv, sizeof(Complex));
cudaMalloc((void **)&dev_sheer, sizeof(Complex));

for (int i = 0; i < N; i++){
    for (int j = 0; j < 5; j++){
	if(i==0 && j==0)    
        a[i][j] = 0.0;
	else if(j==0 && i!=0)
	a[i][j]=a[--i][4];
	else 
	a[i][j]=a[i][--j]+c;
    }
}
for (int i = 0; i < N; i++){
	by[i]=0.0;
    for (int j = 0; j < 5; j++){
	ay[i][j]=0.0;
}
}
for(int i=0;i<N;i++)
    {
    b[i]=a[i][2];
    }

cudaMemcpy(dev_a, a, N * 5 * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(dev_b, b, N * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(dev_dfac, damp_fac, sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(dev_omega, omega, sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(dev_pratio, poisson_ratio, sizeof(Complex), cudaMemcpyHostToDevice);
cudaMemcpy(dev_csv, c_sv, sizeof(Complex), cudaMemcpyHostToDevice);
cudaMemcpy(dev_sheer, sheermodulus, sizeof(Complex), cudaMemcpyHostToDevice);

//matrixAdd <<<N,5>>> (dev_a, dev_b, dev_c);
//cudaMemcpy(c, dev_c, N * N * sizeof(float), cudaMemcpyDeviceToHost);

return 0; 
}

Please help!

This is not correct, and it is one of several errors of this type in your code:

Complex c_sv = (Complex)malloc(sizeof(Complex));

malloc returns a pointer to the requested allocation. You cannot cast that pointer to an ordinary (non-pointer) type.

This is a lack of understanding of C and has nothing to do with CUDA.
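To illustrate the difference, here is a minimal sketch in plain host C++, with a Complex struct standing in for float2 and function names invented for this example. A Complex value can simply be declared as a local variable, while malloc hands back an address that has to be stored in a Complex * and accessed through it.

```cpp
#include <cstdlib>

// Stand-in for CUDA's float2.
struct Complex {
    float x;
    float y;
};

// A value type needs no malloc: declare it and assign its members.
Complex makeValue(float re, float im) {
    Complex z;
    z.x = re;
    z.y = im;
    return z;
}

// malloc returns a pointer, so the result is kept in a Complex* and
// written through with ->. The caller must release it with free().
Complex *makeOnHeap(float re, float im) {
    Complex *p = static_cast<Complex *>(std::malloc(sizeof(Complex)));
    if (p != nullptr) {
        p->x = re;
        p->y = im;
    }
    return p;
}
```

For single values like the ones in your main, the first form is usually all that is needed.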

Thank you for pointing out the reason for my error, but I am still not able to fix it.
Can you tell me what I should do to fix it?
For example, what is the proper syntax?

I wouldn’t doubt that you’re not able to fix it after just a few minutes. It will require learning and understanding on your part.

Proper syntax for that line of code might be something like this:

Complex *c_sv = (Complex *)malloc(sizeof(Complex));

But that will push the problem elsewhere in your code because this is a conceptual misunderstanding of the usage of pointers, basic types, and malloc, in C. So this might be a “fix”:

Complex *c_sv = (Complex *)malloc(sizeof(Complex));
*c_sv = ComplexScale(z,200.00);

But I have no doubt that if you make only that change to your code, it will not fix all the issues. I suggest you study what those changes are, why they make sense, then see if you can apply that knowledge throughout your program. I don’t wish to completely rewrite and debug your code for you, especially since this has nothing to do with CUDA.
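As one illustration of applying the same idea elsewhere: a copy routine wants addresses, so a scalar such as damp_fac has to be passed as &damp_fac, and the destination has to be a pointer to allocated storage, not a plain float or Complex. The sketch below is host-only C++ and uses std::malloc and std::memcpy as stand-ins for cudaMalloc and cudaMemcpy (that substitution is mine, so that it runs without a GPU); roundTripFloat and roundTripComplex are hypothetical helper names. Separately, in real CUDA code a function marked only __device__, like ComplexScale, cannot be called from main; it would need to be __host__ __device__.

```cpp
#include <cstdlib>
#include <cstring>

// Stand-in for CUDA's float2.
struct Complex {
    float x;
    float y;
};

// Copies one float into freshly allocated storage and reads it back.
float roundTripFloat(float value) {
    // The destination variable is a pointer (float*), not a float.
    float *dev = static_cast<float *>(std::malloc(sizeof(float)));
    if (dev == nullptr) return 0.0f;
    std::memcpy(dev, &value, sizeof(float));   // source is the ADDRESS of the scalar
    float result = 0.0f;
    std::memcpy(&result, dev, sizeof(float));  // copy back into a local value
    std::free(dev);
    return result;
}

// The same pattern for a Complex value.
Complex roundTripComplex(Complex value) {
    Complex result{0.0f, 0.0f};
    Complex *dev = static_cast<Complex *>(std::malloc(sizeof(Complex)));
    if (dev == nullptr) return result;
    std::memcpy(dev, &value, sizeof(Complex));
    std::memcpy(&result, dev, sizeof(Complex));
    std::free(dev);
    return result;
}
```

The key points are the pointer type of the destination and the & on the source value; both carry over to the cudaMalloc and cudaMemcpy calls.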

If you want to do CUDA programming, it’s strongly recommended that you become a proficient C programmer first.

Thank you for your help
Much appreciated