Hey all, I am trying to integrate thermal fluctuations into my code.
For that I need random numbers, lots of them. I do not want to compute them beforehand because it would be a gigantic array.
So the problem is that I need the numbers generated within a #pragma acc kernels region. As far as I have learned, there is no straightforward method for doing so, so I took $PGI/linux86-64/2017/examples/CUDA-Libraries/cuRAND/test_rand_oacc_ftn/trand8.cpp as a starting point.
What I do is something like:
#pragma acc routine vector
void please_give_me_rands(float *restrict a, int n);
...
int main(int argc, char** argv)
{...
    curandState_t state;
    /* Create arrays on the GPU */
    #pragma acc data copy(...)\
                     copyin(..., state)\
                     create(..., yahoo[0:6])
    {...
        #pragma acc kernels device_type(nvidia)
        {
            #pragma acc loop independent \
                             device_type(nvidia)
            for(int k=0; k < 9; k++)
            {
                #pragma acc loop independent \
                                 device_type(nvidia)
                for(int i=0; i < Lx; i++)
                {
                    #pragma acc loop independent \
                                     device_type(nvidia)
                    for(int j=0; j < Ly; j++)
                    {
                        please_give_me_rands(yahoo, 5);
                        #pragma acc loop \
                                         reduction(+:stress_ghost) device_type(nvidia)
                        for(int m=3; m < 9; m++)
                        {
                            stress_ghost += w[k] * cc[m*9 + k] / sqrt(ww[m]) * yahoo[m-3];
                        }
                        zeta[ind(k,i,j)] = thermal_factor * sqrtf(rho[idx(i,j)]) * stress_ghost;
                    }
                }
            }
        }
and the subroutine, taken from trand8.cpp, is:
void please_give_me_rands(float *restrict a, int n) /*, float *restrict b*/
{
    unsigned long long seed;
    unsigned long long seq;
    unsigned long long offset;
    curandState_t state;
    #pragma acc parallel num_gangs(1) pcopy(a[0:n]) private(state)
    {
        seed = 12345ULL; //4294967296ULL^time(NULL)
        seq = 0ULL;
        offset = 0ULL;
        curand_init(seed, seq, offset, &state);
        #pragma acc loop seq
        for (int i = 0; i < n; i++) {
            a[i] = curand_uniform(&state);
            //b[i] = curand_normal(&state);
        }
    }
}
and I compile this with something like:
...
CXXFLAGS = -fast -ta=tesla,nollvm -Mcuda=cuda8.0 -Minfo=accel -Minline
CC = pgc++
WHERE = test
WHAT = Readme.dat
.PHONY: all
all: Swalbe.bin

Swalbe.bin: Swalbe.cpp initial.cpp fields.cpp
	$(CC) $(CXXFLAGS) -o file.bin file.cpp

.PHONY: run srun clean
otherun: file.bin
	ACC_DEVICE_NUM=1 ./file.bin $(WHAT) $(WHERE)

srun: file.bin
	srun ./file.bin

clean:
	rm -f file.bin *.o
According to the -Minfo output, everything else seems to be parallelized nicely. However, there is one error I do not understand:
PGCC-S-0155-Unsupported nested compute construct in compute construct or acc routine (file.cpp: 887)
PGCC-S-0155-Accelerator region ignored; see -Minfo messages (file.cpp)
please_give_me_rands(float *, int):
0, Accelerator region ignored
887, Accelerator restriction: invalid loop
PGCC/x86 Linux 17.5-0: compilation completed with severe errors
Makefile:23: recipe for target 'file.bin' failed
Sadly, -Minfo does not offer more information than this.
Is there a workaround for generating random numbers inside OpenACC kernels regions?
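To make the question concrete, here is a minimal sketch of the structure I think I would need. The error message suggests that the #pragma acc parallel inside please_give_me_rands is the nested compute construct, so in this sketch the generator is an acc routine seq with no compute construct inside it, and every loop iteration owns a private state seeded with its own sequence number. Everything here is an assumption made for illustration: RngState, rng_init, rng_uniform and fill_uniform are hypothetical names, and the xorshift32 generator is only a portable stand-in for curand_init/curand_uniform, not the method used in trand8.cpp. I do not know whether the cuRAND device calls can be dropped in the same way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for curandState_t: one xorshift32 state per loop iteration.
struct RngState { std::uint32_t s; };

// Seed a private state from a global seed and a per-cell sequence number
// (same role as the seed/sequence arguments of curand_init).
#pragma acc routine seq
inline void rng_init(std::uint32_t seed, std::uint32_t seq, RngState *st)
{
    // Integer hash so that neighbouring sequence numbers give unrelated states.
    std::uint32_t x = seed ^ (seq * 0x9E3779B9u);
    x ^= x >> 16; x *= 0x85EBCA6Bu;
    x ^= x >> 13; x *= 0xC2B2AE35u;
    x ^= x >> 16;
    st->s = x ? x : 1u;   // xorshift32 must not start from 0
}

// Advance the state and return a float in [0, 1).
// Note: curand_uniform returns values in (0, 1], so the range differs slightly.
#pragma acc routine seq
inline float rng_uniform(RngState *st)
{
    std::uint32_t x = st->s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return (x >> 8) * (1.0f / 16777216.0f);   // keep the top 24 bits
}

// Fill per_cell random numbers for each of `cells` lattice cells.
// The only compute construct is here; the routines above contain none.
void fill_uniform(float *out, int cells, int per_cell, std::uint32_t seed)
{
    assert(cells > 0 && per_cell > 0);
    #pragma acc parallel loop copyout(out[0:cells*per_cell])
    for (int c = 0; c < cells; ++c) {
        RngState st;                               // private to this iteration
        rng_init(seed, (std::uint32_t)c, &st);
        for (int m = 0; m < per_cell; ++m)
            out[c * per_cell + m] = rng_uniform(&st);
    }
}
```

In my code the inner m loop would consume the numbers directly instead of storing them, so the shared yahoo array would no longer be needed. A compiler without OpenACC support ignores the pragmas and runs the same loops on the host, which at least lets me check the logic.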
Best and thanks for the help,
Stefan