What is the best way to port code that generates random numbers on the fly with random_number() to the GPU using OpenACC? PGI fails with:
PGF90-S-0155-Call to PGI runtime function not supported - pghpf_rnum
So I am guessing the OpenACC compiler does not provide something like the cuRAND library. I am using PGI 17.7.
The code actually needs random numbers drawn from a Poisson distribution with a given mean ($\lambda$). I cannot generate the random numbers on the CPU and copy them to the GPU for later use, because the mean values are known only at run time. I am not sure whether there is a normalized version of the Poisson distribution that would avoid needing the mean value to generate a random number.
You’ll need to call cuRAND directly from device code. An example of how to do this is in the compiler installation directory: “$PGI/linux86-64/2017/examples/CUDA-Libraries/cuRAND/test_rand_oacc_ftn/trand5.f90”
There is a performance cost in creating each random number generator, so I’d recommend having multiple threads share a generator (such as one per gang). However, having too many threads share it can cause contention. It may take some experimentation to determine the right mix for your program.
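A minimal sketch of the one-generator-per-gang layout, written in Fortran with OpenACC directives. It is an illustration under stated assumptions, not the trand5.f90 code: the generator is a toy xorshift64 used as a stand-in for cuRAND, and the names gang_rng_sketch, next_uniform and fill_uniform are hypothetical. Each gang owns one private state and walks its own contiguous block of the array serially, so no two threads touch the same state.

```fortran
module gang_rng_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Toy xorshift64 generator (Marsaglia's 13/7/17 triple). Hypothetical
  ! stand-in for the cuRAND state used in trand5.f90, NOT the cuRAND API.
  subroutine next_uniform(s, u)
    !$acc routine seq
    integer(int64), intent(inout) :: s   ! generator state, must be nonzero
    real(real64), intent(out) :: u       ! uniform deviate in [0,1)
    s = ieor(s, ishft(s, 13))
    s = ieor(s, ishft(s, -7))
    s = ieor(s, ishft(s, 17))
    u = real(ishft(s, -11), real64) * 2.0_real64**(-53)   ! top 53 bits
  end subroutine next_uniform

  ! Fill a(1:n) with one generator per gang: gang b owns a private state
  ! and fills block b of the array serially.
  subroutine fill_uniform(a, n, nblk, seed)
    integer, intent(in) :: n, nblk
    real(real64), intent(out) :: a(n)
    integer(int64), intent(in) :: seed   ! assumed nonnegative
    integer :: b, i, lo, hi, chunk
    integer(int64) :: s
    real(real64) :: u
    chunk = (n + nblk - 1) / nblk
    !$acc parallel loop gang private(s, u, lo, hi) copyout(a)
    do b = 0, nblk - 1
      s = seed + 1_int64 + 7919_int64 * b   ! distinct nonzero seed per gang (toy scheme)
      !$acc loop seq
      do i = 1, 20                          ! discard weak early outputs of the toy generator
        call next_uniform(s, u)
      end do
      lo = b * chunk + 1
      hi = min(n, (b + 1) * chunk)
      !$acc loop seq
      do i = lo, hi
        call next_uniform(s, u)
        a(i) = u
      end do
    end do
  end subroutine fill_uniform
end module gang_rng_sketch
```

In real code you would replace next_uniform and the seeding line with the cuRAND state and initialization calls from trand5.f90. Offsetting the seed per gang is only adequate for a toy; the CUDA C device API's curand_init takes a separate subsequence argument intended to give each generator its own independent stream. One serial walker per gang avoids contention entirely but gives up vector parallelism inside the gang, which is the trade-off to tune.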
Note that you will need to compile with “-ta=tesla:nollvm”, since the code must be compiled with the CUDA C back end rather than the LLVM back end in order to include the required CUDA header files. See trand5’s Makefile for an example.
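On the question's concern that $\lambda$ is only known at run time: uniform deviates in [0,1) do not depend on $\lambda$, so they can be produced before $\lambda$ is known (with random_number on the host and copied over, or with cuRAND on the device) and mapped to Poisson deviates once $\lambda$ is available. Below is a minimal sketch of that mapping using the inversion method (sequential search of the Poisson CDF), not anything from cuRAND; the module and function names are hypothetical. It assumes double precision and a $\lambda$ of at most a few hundred, since exp(-$\lambda$) leaves the normal double-precision range once $\lambda$ exceeds about 708, and the cost per sample grows linearly with $\lambda$.

```fortran
module poisson_inversion
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Map one uniform deviate u in [0,1) to a Poisson(lambda) deviate by
  ! walking the CDF until it reaches u (inversion method).
  ! Assumes 0 < lambda <= a few hundred (see the underflow note above).
  pure function poisson_from_uniform(lambda, u) result(k)
    !$acc routine seq
    real(real64), intent(in) :: lambda, u
    integer :: k
    real(real64) :: p, cdf
    integer, parameter :: kmax = 100000   ! guard against round-off when u is close to 1
    k = 0
    p = exp(-lambda)                      ! P(X = 0)
    cdf = p
    do while (u > cdf .and. k < kmax)
      k = k + 1
      p = p * lambda / real(k, real64)    ! P(X = k) = P(X = k-1) * lambda / k
      cdf = cdf + p
    end do
  end function poisson_from_uniform
end module poisson_inversion
```

For example, with $\lambda$ = 1 the inputs u = 0.3, 0.5 and 0.8 map to 0, 1 and 2. For large $\lambda$ a different algorithm would be needed; the CUDA C cuRAND device API also offers curand_poisson, though whether it is reachable through the Fortran interface used in trand5.f90 is worth checking.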