Not able to fill in the array of floats allocated on the GPU

Please help me. I have the following code:


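#include <iostream>     // std::cout
#include <utility>      // std::swap
#include <sys/types.h>  // u_quad_t
#include <accelmath.h>
#include <openacc.h>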

const long int G=100000;
const unsigned int GL=100000;
const long int K=G;
const int LE=1.0f;

struct Particle
{
    float x;
    float rs;
};
Particle particles[GL];
int sort[GL];
int ind01[GL];
long int MAX_ELEMENT=1;
int POSITION1;
int POSITION0;
int LIFE=0;

bool start=true;

int mini;
int count0;
int count1;
int GL1;
int js;

#pragma acc declare device_resident(K,particles,sort,ind01,POSITION0,POSITION1,mini,GL1,js,MAX_ELEMENT,count0,count1)
#pragma acc declare create(LIFE,particles,sort)

unsigned int Rand32(unsigned int xn)
{
    u_quad_t a=0x5DEECE66D;
    u_quad_t c=0xB;
    return (unsigned int)((a*xn+c) & 0xFFFFFFFF);
}
double rndv(unsigned int xn)
{
    return (double) xn / (double) 0x100000000LL;
}

void function_device()
{
    // ... (part of the function not shown in the post)
    #pragma acc update host(LIFE,particles) async
    std::cout<<"LIFE after="<<LIFE<<std::endl;
    for(int i=0; i<LIFE; ++i) std::cout<<" particles: "<<particles[i].x<<std::endl;
    #pragma acc loop vector reduction(+:count0,count1)
    for(int i=0; i<LIFE; ++i)
    {
        if(particles[i].x>=LE)
        {
            sort[i]=1;
            count1=count1+1;
        }
        else
        {
            sort[i]=0;
            count0=count0+1;
        }
    }
    #pragma acc update host(LIFE,sort) async
    std::cout<<"LIFE after 1="<<LIFE<<" c0="<<count0<<" c1="<<count1<<std::endl;
    for(int i=0; i<LIFE; ++i) std::cout<<"sort: "<<sort[i]<<std::endl;
    #pragma acc parallel num_gangs(1) vector_length(1)
    {
        GL1=LIFE-1;
        count0=GL1;
        count1=0;
    }
    #pragma acc loop seq
    for(int i=0; i<LIFE; ++i)
    {
        if(sort[i]==1)
        {
            ind01[count1++]=i;
        }
        else
        {
            ind01[count0--]=i;
        }
    }

    #pragma acc parallel num_gangs(1) vector_length(1)
    {
        mini=GL1-count0;
        if(count1<mini) mini=count1;
        js=0;
    }
    #pragma acc loop reduction(+:js)
    for(int j=0; j<mini; ++j) js+=int(ind01[count1-1-j]>ind01[GL1-j]);
    #pragma acc loop
    for(int j=0; j<js; ++j) std::swap(particles[ind01[count1-1-j]],particles[ind01[GL1-j]]);

    #pragma acc update device(LIFE) async
    #pragma acc parallel num_gangs(1) vector_length(1)
    {
        POSITION0=GL1-count0;
        POSITION1=count1;
        count0=0;
        count1=0;
        LIFE-=POSITION0;
        LIFE+=POSITION1;
    }
    #pragma acc parallel loop
    for(int i=0; i<LIFE; ++i)
    {
        particles[i+POSITION1].x=particles[i].x;
        particles[i+POSITION1].rs=MAX_ELEMENT+i;
    }
    MAX_ELEMENT+=POSITION1;
    #pragma acc loop
    for(int i=0; i<POSITION1; ++i)
    {
        particles[i].rs=Rand32(particles[i].rs);
        float part=rndv(particles[i].rs);
        float x=particles[i].x;
        float x1=x*part;
        float x2=x*(1.0f-part);
        particles[i].x=x1;
        particles[i+POSITION1].x=x2;
    }
    #pragma acc update host(LIFE) async
}

int main(int argc, char **argv)
{
    acc_init(acc_device_nvidia);
    int step=1;
    while(start==true || LIFE>0)
    {
        std::cout<<" LIFE="<<LIFE<<std::endl;
        start=false;
        function_device();
        std::cout<<"MAIN LOOP # "<<step<<std::endl;
        ++step;
    }
}

Here I fill in the array “particles” allocated on the GPU:
particles[LIFE].x=5.0;
But the prints show that particles[0].x=0, although LIFE=1 after this operation.

  1. How do I fill in the array properly?
  2. Why does #pragma acc update(LIFE) work only with async?
  3. I try to launch serial code on the GPU like this:
    #pragma acc parallel num_gangs(1) vector_length(1)
    {
        POSITION0=GL1-count0;
        POSITION1=count1;
        count0=0;
        count1=0;
        LIFE-=POSITION0;
        LIFE+=POSITION1;
    }
    Is it correct?
  4. My task is to run all the operations except the output prints on the GPU (I put all the variables and arrays in the device_resident clause). If you have any advice on how to do this better, please tell me.

Hi @and,

  1. How do I fill in the array properly?

The code’s formatting is a bit off and the code is incomplete, so I can’t run it, but one problem I see is that you’re missing “parallel” on the “loop” directive for the POSITION1 loop. A “loop” directive must either be contained within a compute construct (such as “parallel” or “kernels”) or be combined with one:

#pragma acc parallel loop
for(int i=0; i<POSITION1; ++i)
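As an illustration of the combined form, here is the count0/count1 loop from the question written as a single “parallel loop” with a reduction. This is a minimal sketch, assuming the arrays are ordinary host arrays passed in by pointer: the function name classify, the local reduction variables, and the copyin/copyout clauses are hypothetical choices, and with globals already placed on the device by a declare directive you would use present instead.

```cpp
struct Particle { float x; float rs; };

// Marks each particle as above (1) or below (0) the threshold le and counts both.
// "parallel loop" combines the compute construct with the loop directive,
// so the loop is not orphaned outside a parallel/kernels region.
void classify(const Particle *particles, int *sort, int life, float le,
              int &count0, int &count1)
{
    int c0 = 0;  // particles with x < le
    int c1 = 0;  // particles with x >= le
    #pragma acc parallel loop reduction(+:c0,c1) \
        copyin(particles[0:life]) copyout(sort[0:life])
    for (int i = 0; i < life; ++i) {
        if (particles[i].x >= le) {
            sort[i] = 1;
            c1 += 1;
        } else {
            sort[i] = 0;
            c0 += 1;
        }
    }
    count0 = c0;
    count1 = c1;
}
```

Compiled without OpenACC support, the pragma is ignored and the loop simply runs on the host, which is a convenient way to check the logic before moving it to the GPU.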

Also, I don’t see where you’ve updated the host copy of “particles”. Without an update, the values you print will be whatever the host copy happens to contain.
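A minimal sketch of filling the array on the device and then refreshing the host copy before reading it. It assumes the globals are declared with “declare create” rather than “device_resident”: the OpenACC specification describes device_resident as allocating the variable in device memory only, so there is no host copy for “update host” to write into. The small GL, the helper names init_device_copies and add_particle_on_device, and the bounds check are hypothetical.

```cpp
struct Particle { float x; float rs; };

const int GL = 8;          // hypothetical small capacity for the sketch
Particle particles[GL];
int LIFE = 0;

// "create" allocates device copies and keeps the host copies as well.
#pragma acc declare create(particles, LIFE)

// Device copies made by "declare create" are uninitialized, so push the
// host values to the device once before the first device access.
void init_device_copies()
{
    LIFE = 0;
    for (int i = 0; i < GL; ++i) {
        particles[i].x = 0.0f;
        particles[i].rs = 0.0f;
    }
    #pragma acc update device(particles, LIFE)
}

// Appends one particle on the device (one gang, one vector lane),
// then copies the device values back so the host can print them.
void add_particle_on_device(float x)
{
    #pragma acc parallel num_gangs(1) vector_length(1) present(particles, LIFE)
    {
        if (LIFE < GL) {
            particles[LIFE].x = x;
            particles[LIFE].rs = 0.0f;
            LIFE += 1;
        }
    }
    // Without this update the host arrays keep their old values.
    #pragma acc update host(particles, LIFE)
}
```

After init_device_copies() and add_particle_on_device(5.0f), the host should see LIFE == 1 and particles[0].x == 5.0f.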


  2. Why does #pragma acc update(LIFE) work only with async?

Using async shouldn’t matter here. Something else is causing your error.
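Separately from the error itself: an update with async only queues the transfer, so the host has to wait on that queue before reading the value; otherwise a print right after the update can show stale data. A minimal sketch of the pattern, assuming LIFE is a global declared with “declare create”; the function name bump_and_read and queue number 1 are arbitrary, hypothetical choices.

```cpp
int LIFE = 0;
#pragma acc declare create(LIFE)

// Sets LIFE on the host, increments it on the device, and reads it back
// through async queue 1, waiting before the host touches LIFE again.
int bump_and_read(int start)
{
    LIFE = start;
    #pragma acc update device(LIFE)            // synchronous host -> device copy

    #pragma acc parallel num_gangs(1) vector_length(1) present(LIFE) async(1)
    {
        LIFE += 1;
    }
    #pragma acc update host(LIFE) async(1)     // runs after the region on queue 1

    // Independent host work could overlap with queue 1 here.

    #pragma acc wait(1)                        // block until queue 1 is done
    return LIFE;                               // now safe to read on the host
}
```

Because both the compute region and the update go to the same queue, they execute in order; the wait is what makes the host read safe.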

  3. I try to launch serial code on the GPU like this

Yes, this is the current method to create a serial compute region. The OpenACC 2.6 standard, which was just ratified, will introduce a “serial” construct. PGI will add this sometime in the near future.
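For comparison, a sketch of the same bookkeeping run both ways: with the one-gang, one-vector-lane idiom from the question, and with the OpenACC 2.6 “serial” construct for compilers that already support it. The Counters struct, the bookkeeping routine, and the copy clauses are hypothetical scaffolding to keep the example self-contained; in the question these values are global variables on the device.

```cpp
// Hypothetical bundle of the scalars updated in the question's serial block.
struct Counters { int GL1, count0, count1, LIFE, POSITION0, POSITION1; };

// The scalar bookkeeping from the question, callable from device code.
#pragma acc routine seq
static void bookkeeping(Counters *c)
{
    c->POSITION0 = c->GL1 - c->count0;
    c->POSITION1 = c->count1;
    c->count0 = 0;
    c->count1 = 0;
    c->LIFE -= c->POSITION0;
    c->LIFE += c->POSITION1;
}

// Pre-2.6 idiom: a parallel region limited to one gang and one vector lane.
void run_one_gang(Counters *c)
{
    #pragma acc parallel num_gangs(1) vector_length(1) copy(c[0:1])
    {
        bookkeeping(c);
    }
}

// OpenACC 2.6 "serial" construct, which states the intent directly.
void run_serial(Counters *c)
{
    #pragma acc serial copy(c[0:1])
    {
        bookkeeping(c);
    }
}
```

Both functions should leave the same values in *c; the serial form simply avoids having to restrict the launch configuration by hand.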

  4. My task is to run all the operations except the output prints on the GPU (I put all the variables and arrays in the device_resident clause). If you have any advice on how to do this better, please tell me.

Can you post or send the complete code to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me? Other than some minor errors, I think you’re on the right track, but without a complete example it’s difficult for me to tell whether there are other issues. Having a reproducing example will help.

Thanks,
Mat

FYI, I just saw your post on StackOverflow with the complete code. I have posted a fix over there. https://stackoverflow.com/questions/47395226/not-able-to-fill-the-array-allocated-on-the-gpu