Please help for using complex numbers in cuda code

Hello all,

I am using some complex variables in my code in which various arithmetic operations such as add, multiply is to be done. I have included cuFloatComplex library in my code. How shall I multiply a scalar with my complex number? I have attached the snippet below in which q is a complex variable , m_1 is a int value and fophil is a float value.

if (mm<0)
{
ex1 = (-q * m_1 * fophi1);
comp_1_w = cexpf(ex1);
}
else
{
ex2 = (q * m_1* fophi1);

When I am runnig the code, it gives following 2 error : no operator “*” matches these operands
operand types are: int * cuFloatComplex

error: no suitable conversion function from “cuFloatComplex” to “float complex” exists

Can someone please help me. Thank you very much.

Thanks for your reply. I have seen that answer. It tells how to multiply a scalar with a complex value. But I also want to multiply a float value with cuFloatComplex value. Can you please tell how to do that. I am getting these 2 errors:
error: no suitable conversion function from “cuDoubleComplex” to “float complex” exists
error: no operator “=” matches these operands, operand types are: cuDoubleComplex = float complex

The code tells you exactly how to do it, All you need to do is to replace the double functions with the corresponding float functions.
Or you can use the fact that cuFloatComplex is defined as typedef float2 cuFloatComplex; and manipulate the “real” part directly. Or extract the “real” part using cuCrealf and construct the result of your multiplication by hand.

@striker159 Thank you so much for your reply. But suppose I have to take exponential of a complex number, so I would have to take it separately for real and complex parts. I have written the following code for that :
cuDoubleComplex my_complex_exp (cuDoubleComplex ex1)
{
cuDoubleComplex comp_1_w;
float s, c;
float e = expf(ex1.x);
sincosf(ex1.y, &s, &c);
comp_1_w.x = c * e;
comp_1_w.y = s * e;
return comp_1_w;
}

But, it is showing the error : expected a “;” and a warning : parsing restarts here after previous syntax error.
Could you please tell what may be the possible solution.

This is code intended for execution on the GPU, correct? In which case the function needs a __device__ attribute.

There also seems to be an inconsistency here. The interface of the function suggests double-precision computation, yet the internal computation is performed in single precision. Probably not what was intended.

@njuffa Thanks for your reply. Actually first I want to run this code in cuda environment without using GPU. That’s why I haven’t use device attribute. How shall I do this?

@njuffa With those modifications also I am getting same errors. This was the modified code:

__device__ cuDoubleComplex my_complex_exp (cuDoubleComplex ex1){
cuDoubleComplex comp_1_w;
double s, c;
double e = exp(ex1.x);
sincosf(ex1.y, &s, &c);
comp_1_w.x = c * e;
comp_1_w.y = s * e;
 return comp_1_w;
}

error: expected a “;”

Your piece of code compiles without any issues. The error must be in a code which you have not shown.

@striker159 Yes, I checked that piece of code is working correctly. But Iike in my code I have to perform various operations such as addition, exponentials , multiplication. so, I have to separately define functions for performing each of these operations or there exists any alternative way for doing above operations?

Well, if the operation is not implemented yet, you need to implement it yourself.
Did you check out thrust::complex type which was also mentioned in above stackoverflow post?

https://thrust.github.io/doc/group__complex__numbers.html

Yes I have checked thrust but as I was using CUDA C so I thought I’d rather not use thrust as some examples suggests that thrust was used in CUDA C++. I may be wrong.

@striker159 @Robert_Crovella @njuffa Thanks for much help. I am getting only 1 error after doing all these modifications. Below is my complete code and that one error. Please help me how to get rid off this error.

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <math.h>

#include <complex.h>

#include <cuComplex.h>

#include <time.h>

void legendre(int lmax,int l, float theta_dummy, float Lmn[11]);

void force_calc(float x11[147], float y11[147], float z11[147], float ForceX[147],float ForceY[147], float ForceZ[147]);

cuDoubleComplex my_complex_exp(cuDoubleComplex ex1);

  int count;         

int main()

{

 int h;

 float x11[147], y11[147], z11[147];

 float ForceX[147],ForceY[147],ForceZ[147]; 

 FILE* fp;

 clock_t start, end;

 double cpu_time_used;

 start = clock();

 /////////////////////////////////////first calc///////////////////////////////////////////////////////

 fp = fopen("XYZ1.txt", "r");

 for(h=0;h<147;h++)

 {

  fscanf(fp, "%f", &x11[h]);

 }

 for(h=147;h<294;h++)

 {

  fscanf(fp, "%f", &y11[h-147]);

 }

 for(h=294;h<441;h++)

 {

  fscanf(fp, "%f", &z11[h-294]);

 }

 fclose(fp);

 force_calc(x11,y11,z11,ForceX,ForceY,ForceZ);

 fp = fopen("forces1.txt", "w");

 for(h=0;h<147;h++)

 {

  fprintf(fp, "%f %f %f\n", ForceX[h], ForceY[h], ForceZ[h]);

 }

 fclose(fp);

 end = clock();

 cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;

 printf("calc took %lf seconds to execute \n", cpu_time_used);

 printf("%d\n",count);

 return 0;

}

void force_calc(float x11[147], float y11[147], float z11[147], float ForceX[147],float ForceY[147], float ForceZ[147])

{

 int natoms=147,etarad=9,etaang=5,lmax=10;

 float pi=3.141592653589793;

 float cons1[10][21];

 float a011[60][30],a121[31][30],a231[30][1];

 float rcut = 8.0;

 float dx,dy,dz,r,d1,diff_1zr;

 float term1,term2,term3,term4,func1,func2,rcutfun,Dcutfun;

 float pcolhea1, fophi1, ficosec_theta1, dicos_theta1;

 float dthetay1, dthetax1, dthetaz1, derphy1,derphx1;

 float g1,g2,g3,g4;

 float G_p,d_Gx_p,d_Gy_p,d_Gz_p;

 float Lmn_all[147][10][11],Lmn[11];

 float cons2;

 float diff_Ytheta2;

 int  p,n,l,m,k1,j,k,h,s1,jj,j11,m_1,b_m,jj1,k2,m1;

 float r2;

 float multf1, multf2, multf3, sub1;

 float rcut_inv = 0.125;

 float Lmmnn;

 float d2Px[natoms],d2Py[natoms],d2Pz[natoms];

 int jt;

 int k12, mm, b_mm, ll, ju;

 float netar[9]={0.005,0.015,0.0230,0.038,0.060,0.090,0.150,0.260,0.480};

 float netaa[5]={0.0028,0.0040,0.0110,0.0280,0.059};

 

 float temp5[9];

 

 cuDoubleComplex p_ch_p;

 cuDoubleComplex tot_cnlm_sum_p[147][5][10][21];

 cuDoubleComplex xt2,xt6;

 cuDoubleComplex useY1;

 cuDoubleComplex ex1;

 cuDoubleComplex ex2,ex3;

 cuDoubleComplex q;

 cuDoubleComplex comp_1_w;

 cuDoubleComplex mult1,mult4;

 //cuDoubleComplex temp4,temp6;

 

 q = make_cuDoubleComplex(0,1);

 

 //////////////////////////////////////////////////////////////////////////////////////////////

 ////////////////////////////variable declaration for NN////////////////////////////////////////////////

 int Lensym=59,LenHD2=30,LenHD1=30;

 int nn,kk,i,z;

 int b1;

 float checkcmx[147][59],checkcmy[147][59],checkcmz[147][59];

 float GiS_p[147][59];

 float result,termm,ress,ter;

 float AG_p,bkAG_p,alOj_x_p,y_p,Oj_y_p;

 float Oj_x_p[147][30];

 float GradHL6[147][30];

 float GradHL5[147][30];

 float GradHL24,SumJJx,SumJJy,SumJJz,tempx1,tempy1,tempz1,GradHL1,GradHL12,GradHL13,GradHL21,GradHL23,GradFx,GradFz,GradFy;

 float gradientE[147][59];

 float tt1,tt2,tt3,tt4;

  

float cons11[100];

FILE * fp1;

fp1 = fopen ("cons11.txt", "r");

 int index = 0;

  while(fscanf(fp1, "%f",&cons11[index])==1)

      index++;

      fclose(fp1);

float meangtot_f[59];  

FILE * fp2;

fp2 = fopen ("meangtot.txt", "r");

  index = 0;

  while(fscanf(fp2, "%f",&meangtot_f[index])==1)

      index++;

      fclose(fp2);

float gtotmin_f[59];

FILE * fp3;

fp3 = fopen ("gtotmin.txt", "r");

  index = 0;

  while(fscanf(fp3, "%f",&gtotmin_f[index])==1)

      index++;

      fclose(fp3);

float gtotmax_f[59];

FILE * fp4;

fp4 = fopen ("gtotmax.txt", "r");

index = 0;

  while(fscanf(fp4, "%f",&gtotmax_f[index])==1)

      index++;

      fclose(fp4);

      

float a01[1800];  

FILE * fp5;

fp5 = fopen ("a01.txt", "r");

index = 0;

  while(fscanf(fp5, "%f",&a01[index])==1)

      index++;

      fclose(fp5);

float a12[930];

FILE * fp6;

fp6 = fopen ("a12.txt", "r");

index = 0;

  while(fscanf(fp6, "%f",&a12[index])==1)

      index++;

      fclose(fp6);

float a23[30];

FILE * fp7;

fp7 = fopen ("a23.txt", "r");

index = 0;

  while(fscanf(fp7, "%f",&a23[index])==1)

      index++;

      fclose(fp7);

for (l=0;l<10;l++)

{

 for(m=0;m<21;m++)

 {

  if(m<2*l+1)

  {

   cons1[l][m] = cons11[(l*l)+m];

   //printf("%d %d %.32f\n", l, m, cons1[l][m]);

  }

  else

  {

   cons1[l][m] = 0.0;

  }

 }

}

for(h=0; h<60; h++)

{

 for(j=0; j<30; j++)

 {

  a011[h][j] = a01[h*30+j];

  //printf("%f\n",a011[h][j]);

 }

}

for(h=0; h<31; h++)

{

 for(j=0; j<30; j++)

 {

  a121[h][j] = a12[h*30+j];

  //printf("%f\n",a12[h*30+j]);

 }

}

for(h=0; h<30; h++)

{

 for(j=0; j<1; j++)

 {

  a231[h][j] = a23[h+j];

  //printf("%f\n",a23[h+j]);

 }

}

///start of main loop /////////////////////////////

 //////////////Pre-calculation of tot_cnlm_sum_p for every atom/////////////////

 for(h=0; h<natoms; h++)

 {

  for(jt=0; jt<natoms;jt++)

  {

   if(h != jt)

   {

  dx=(x11[h]-x11[jt]);

  dy=(y11[h]-y11[jt]);

  dz=(z11[h]-z11[jt]);

  r2 =dx*dx + dy*dy + dz*dz;

  r=sqrtf(r2);

  pcolhea1 = acosf(dz/r);

  for (l=0;l<lmax;l++)

  {

   if (l==0)

   {

    Lmn[0]=1.0;

    Lmn_all[jt][0][0] = 1.0;

   }

   else

   {

    legendre(lmax,l,pcolhea1,Lmn);

   }

     for(b_m=0;b_m<19;b_m++)

   {

    if(b_m<2*l+1)

    {

     m= -l + b_m;

     if (m >= 0)

     {

      Lmn_all[jt][l][m]= Lmn[m];

     }

    }

   }

  }

   }

  }

  cuDoubleComplex temp4,temp6;

  //float temp5[9];

  k2=0;

  for(n=0;n<etaang;n++) // loop 15.2

  {

   for(ll=0;ll<lmax;ll++)

   {

    p_ch_p= make_cuDoubleComplex(0,0);

    for(b_mm=0;b_mm<19;b_mm++)

    {

     if(b_mm<2*ll+1)

     {

      xt2  = make_cuDoubleComplex(0,0);

      for(ju=0; ju<natoms;ju++)

      {

       if(h != ju)

       {

        dx=(x11[h]-x11[ju]);

        dy=(y11[h]-y11[ju]);

        dz=(z11[h]-z11[ju]);

        r2 =dx*dx + dy*dy + dz*dz;

        r=sqrtf(r2);

        fophi1= atan2f(dy,dx);

        if (r <= rcut)

        {

         rcutfun = 0.5*(cosf(pi*r*rcut_inv)+1);

        }

        else

        {

         rcutfun = 0.0;

        }

        func1   = expf(-netaa[n]*r2);

        term1   = func1*rcutfun;

        G_p    = term1;

        mm = -ll+b_mm;

        m_1 = abs(mm);

        if (mm<0)

        {

       cuDoubleComplex temp1 = cuCmul(q, make_cuDoubleComplex(-m_1,0));

       cuDoubleComplex ex1 = cuCmul(temp1, make_cuDoubleComplex(fophi1,0));

       cuDoubleComplex comp_1_w = my_complex_exp(ex1);

       /* __device__ cuDoubleComplex my_complex_exp(cuDoubleComplex ex1){

          cuDoubleComplex comp_1_w;

          double s, c;

          double e = exp(ex1.x);

          sincos(ex1.y, &s, &c);

          comp_1_w.x = c * e;

          comp_1_w.y = s * e;

          return comp_1_w;

        }*/

       //comp_1_w = cexpf(ex1);

        }

        else

        {

       cuDoubleComplex temp2 = cuCmul(q, make_cuDoubleComplex(m_1,0));

       cuDoubleComplex ex2 = cuCmul(temp2, make_cuDoubleComplex(fophi1,0));

       cuDoubleComplex ex3 = my_complex_exp (ex2);

       /* __device__ cuDoubleComplex my_complex_exp (cuDoubleComplex ex2){

            cuDoubleComplex ex3;

            double s, c;

            double e = exp(ex2.x);

            sincos(ex2.y, &s, &c);

            ex3.x = c * e;

            ex3.y = s * e;

            return ex3;

          }*/

       //ex3 = cexpf(ex2);

       float result = 1.0;

       int zz;

       for(zz=9; zz>=1; zz--)

       {

        if(zz<=mm)

        {

          result *= (-1.0);

        }

       }

         cuDoubleComplex comp_1_w = cuCmul(ex3, make_cuDoubleComplex(result, 0));

        }

         cuDoubleComplex mult1 = cuCmul(comp_1_w, make_cuDoubleComplex(cons1[11][b_mm], 0));

        Lmmnn = Lmn_all[ju][ll][m_1];

         cuDoubleComplex useY1 = cuCmul(mult1, make_cuDoubleComplex(Lmmnn, 0));

         cuDoubleComplex mult4 = cuCmul(useY1, make_cuDoubleComplex(G_p, 0));

        cuDoubleComplex xt2 = cuCadd(xt2,mult4);

        //xt2 = xt2 +  mult4;

       }

      }

      xt6 = cuConj(xt2);

      cuDoubleComplex temp3 = cuCmul(xt2,xt6);

      p_ch_p = cuCadd(p_ch_p,temp3);

      //p_ch_p = p_ch_p + (xt2 * xt6);

      //tot_cnlm_sum_r[(n*30870)+(ll*3087)+(b_mm*147)+h] = crealf(xt2);

      //tot_cnlm_sum_i[(n*30870)+(ll*3087)+(b_mm*147)+h] = cimagf(xt2);

      //memcpy((float*)(inpt+(n*30870)+(ll*3087)+(b_mm*147)+h), tot_cnlm_sum_r, 1*sizeof(float));

      //memcpy((float*)(inpt+154350+(n*30870)+(ll*3087)+(b_mm*147)+h), tot_cnlm_sum_i, 1*sizeof(float));

      tot_cnlm_sum_p[h][n][ll][b_mm] = xt2;

     }

    }

    

  sub1 = 1/(gtotmax_f[k2]-gtotmin_f[k2]);

    cons2 = ((4*pi)/(float)(2*ll+1));

     temp4 = cuCmul(p_ch_p,make_cuDoubleComplex(cons2,0));

    temp5[k2] = meangtot_f[k2] + gtotmin_f[k2];

    temp6 = cuCsub(temp4, make_cuDoubleComplex(temp5[k2],0));

    cuDoubleComplex GiS_p[h][k2] = cuCmul(temp6, make_cuDoubleComplex(sub1,0));

    //GiS_p[h][k2] = ((cons2 * p_ch_p)- meangtot_f[k2]-gtotmin_f[k2])*(sub1);

    k2=k2+1;

    count++;

    

   }

  }

 }

}

 cuDoubleComplex my_complex_exp(cuDoubleComplex ex1){

          cuDoubleComplex comp_1_w;

          double s, c;

          double e = exp(ex1.x);

          sincos(ex1.y, &s, &c);

          comp_1_w.x = c * e;

          comp_1_w.y = s * e;

          return comp_1_w;

}

          

void legendre(int lmax, int l, float theta_dummy, float Lmn[11])

{

 int u,i,size_x,jj,hh,ind,nind;

 int m_5,j;

 float c,tol;

 float htol=2.2251e-308;

 float inf = 1e+10;

 float x,rootn[21],twocot;

 float s,P[11],sn;

 float tmp1,tmp2,tmp3;

 float trm1, trm2, trm3, trm4, trm5;

 float t,t11;

 for (i=0;i<11;i++)

 {

   Lmn[i]=0.0;

 }

 x=cosf(theta_dummy);

 float vec_u[21] = {0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0,20.0};

 for (u = 0; u<21; u++)

 {

    rootn[u] = 0.0;

 }

 for (u = 0; u<19; u++)

 {

   if (u<2*l+1){

     rootn[u] = sqrtf(vec_u[u]);}

 }

 s = sqrtf(1-(x*x));

 for (i = 0; i<9; i++)

 {

   if (i<l){

     P[i] = 0.0;}

 }

 twocot = x;

 if(x == -1)

 {

  twocot = inf;

 }

 else if(x == 1)

 {

  twocot = -inf;

 }

 else if(x !=  0)

 {

  twocot = -2*x/s;

 }

 sn = 0.0;

 float xyz=1.0;

 int sss;

 for(sss=1; sss<=9; sss++)

 {

   if(sss<=l){

     xyz *= (-s);}

 }

 sn = xyz;

 tol = sqrtf(htol);

 hh=0;

 nind=0;

 if ((x != 1) && ((float)(fabsf(sn)) >= tol))

 {

  nind = hh+1;

  hh = hh + 1;

 }

 if (hh > 0)

 {

    float d[15] = {0.5,0.25,0.166666667,0.125,0.1,0.083333333,0.071428571,0.0625,0.055555556,0.05,0.045454545,0.041666667,0.038461538,0.035714286,0.033333333};

    float dd;

  c = 1;

  for (i =0; i<9; i++)

  {

    if (i<l){

      //dd = (1-1/d[i]);

      dd = (1-d[i]);

      c *= dd;

    }

  }

  P[l] = sqrtf(c)*sn;  //10

  P[l-1] = P[l]*twocot*l/rootn[2*l]; //9

  tmp1 = P[l];

  tmp2 = P[l-1];

  for (m_5 = 7; m_5>=0; m_5+=-1)

  {

    if (m_5<=l-2)

    {

      trm1 = tmp2*twocot*(m_5+1);

      trm2 = tmp1*rootn[l+m_5+2];

      trm3 = rootn[l-m_5-1];

      trm4 = rootn[l+m_5+1];

      trm5 = rootn[l-m_5];

      tmp3 = (trm1-trm2*trm3)/(trm4*trm5);

      P[m_5] = tmp3;

      tmp1 = tmp2;

      tmp2 = tmp3;

      }

  }

 }

  for (i=0; i<10; i++)

  {

    if (i<l+1)

    {

      t = P[i];

      Lmn[i] = t;

    }

  }

  float abc=1.0;

  int uuu;

  for(uuu=1; uuu<=9; uuu++){

  if(uuu<=l){

    abc *= x;}

  }

   if (s == 0)

   {

      Lmn[0] = abc;

   }

  for (m_5=1; m_5<=8; m_5++)

  {

   if (m_5<=(l-1))

   {

       c = 1;

       for (j = 1; j<=16; j++)

       {

           if (j<=2*m_5)

           {

              c*=rootn[l-m_5+2+j-2];

           }

       }

       t = Lmn[m_5];

       t11 = c*t;

       Lmn[m_5] = t11;

   }

  }

  c = 1;

  for (i = 0; i<18; i++)

  {

      if (i<2*l)

      {

      c *= rootn[i+1];

      }

  }

  Lmn[l] *= c;

}

ERROR is error: initialization with “{…}” expected for aggregate object . Corresponding line in which error is coming is:
temp6 = cuCsub(temp4, make_cuDoubleComplex(temp5[k2],0));

This code cannot be used as is. For example, dxdx is not defined. You should use the code tags to post code to avoid unwanted formatting. (preformatted text (ctrl+e) in the editor).
What’s the command you use to compile the program?

Your GiS_p variable is defined here:

float GiS_p[147][59];

The actual error is arising from this line:

cuDoubleComplex GiS_p[h][k2] = cuCmul(temp6, make_cuDoubleComplex(sub1,0));

which from my perspective doesn’t make any sense at all, probably for several reasons. ( 1. You are attempting to set a variable; you shouldn’t add a variable type/definition there. 2. you are (attempting to) set a variable that isn’t used anywhere else in your posted code. However I digress. )

It appears you can “fix” this compilation error by changing the two lines as follows:

cuDoubleComplex GiS_p[147][59];

and

GiS_p[h][k2] = cuCmul(temp6, make_cuDoubleComplex(sub1,0));

I have no idea if this makes sense for your application. This is just based on what I would consider “sensible” usage of C or C++.

@striker159 Actually I am running this program on google collab. and now it is working perfectly fine without any errors. This code is written in CUDA without parallelisation. Now, I want some part of my code(involving for loops) to run on GPU. Can you please give your valuable suggestions for how to proceed further?
Thank you so much for replying.

@Robert_Crovella Yes, I also tried doing this and it worked this way and now code is running without any errors. As this code is not involving any utilization of GPU, I want to run some part of this code(involving for loops ) on GPU. How shall I proceed further?
Thanks for much help.
Really Appreciate that.

You may wish to learn CUDA programming. Here is an online course that is a good starting point.

@Robert_Crovella Thanks for sharing this link, But I have already parallelised some Codes using CUDA C such as matrix multiplication, distance calculation etc. As this code is big so I was confused from where should I start parallelize it.
Can you please give just a basic idea in context with this code.

When considering the porting process for a CPU code that you are unfamiliar with, a recommended first step is to profile the code (perhaps using a CPU/host profiler, although you can use nsight systems for this also) and find the “hotspots” and generally characterize the code for its performance behavior. Many profilers can give you a pareto list of function calls and the amount of time spent in each.

That is a possible first step. Chapter 2 of the tutorial here gives the general idea, but you could do something similar with various profilers.