Random Numbers in OpenACC

I am trying to generate uniform random numbers inside an OpenACC parallel loop. Using rand() gives an error, and I saw several posts in the forum that suggest either using cuRAND or generating the numbers on the host and then passing them to the GPU.

The issue with generating the numbers on the host is that it is hard to determine how many random numbers I will need, since my code is fairly complex. All I need is a function that can generate uniform random numbers in a range [A, B].

How do I use CUDA code like cuRAND with OpenACC C++ code? If that works, can I use the random functionality in the Thrust library with OpenACC code? Is there any resource that explains how to link and include these libraries with OpenACC code (preferably with CMake)?

Parallelizing an RNG is problematic because the generator keeps shared state, which is why “rand” can’t be used. This isn’t just an OpenACC issue; the same is true for OpenMP and other parallel models on the host.

If you are using relatively few random numbers, you may consider pre-computing a set on the host, which can then be looked up in the parallel region.
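As a rough illustration of the pre-computed lookup approach, here is a minimal sketch, assuming one random number per loop iteration is enough. The function names (make_random_table, sum_scaled) and the kernel itself are hypothetical and not taken from any library; the host side uses the standard C++ <random> header.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fill a table with `count` uniform doubles in [a, b) on the host.
std::vector<double> make_random_table(int count, double a, double b,
                                      unsigned seed) {
    assert(count >= 0 && a < b);
    std::mt19937 engine(seed);
    std::uniform_real_distribution<double> dist(a, b);
    std::vector<double> table(count);
    for (double &x : table) x = dist(engine);
    return table;
}

// Hypothetical kernel: iteration i looks up its own pre-computed number,
// so no generator state is touched inside the parallel region.
double sum_scaled(const std::vector<double> &values,
                  const std::vector<double> &table) {
    const int n = static_cast<int>(values.size());
    assert(static_cast<int>(table.size()) >= n);
    const double *v = values.data();
    const double *r = table.data();
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(v[0:n], r[0:n])
    for (int i = 0; i < n; ++i) {
        sum += v[i] * r[i];  // the loop index doubles as the table index
    }
    return sum;
}
```

The table is indexed by the loop index, so no thread index is needed. If an iteration needs more than one number, the same idea works with a fixed-size slice per iteration, provided an upper bound on the count is known.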

For larger sets, you can do this with the device-side cuRAND (examples are included with the compilers under the “examples/CUDA-Libraries/cuRAND” directory); however, the cost of maintaining state for each iteration is high. Plus, you need to pass in a set of randomly generated seeds so that each instance of the RNG is unique.
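To make the idea of per-iteration state and unique seeds concrete, here is a minimal sketch. It is neither cuRAND nor DES PRNG: it swaps in a small 32-bit xorshift generator purely for illustration, and all names (next_uniform, make_seeds, mean_per_iteration) are hypothetical. The seeds are generated on the host and each loop iteration advances only its own state.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Advance a 32-bit xorshift state and map it to a uniform double in [a, b).
// The state must never be zero.
#pragma acc routine seq
inline double next_uniform(std::uint32_t &state, double a, double b) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return a + (b - a) * (state * (1.0 / 4294967296.0));  // scale by 2^-32
}

// Host side: one nonzero seed per loop iteration.
std::vector<std::uint32_t> make_seeds(int n, unsigned master_seed) {
    std::mt19937 engine(master_seed);
    std::vector<std::uint32_t> seeds(n);
    for (std::uint32_t &s : seeds) {
        do {
            s = static_cast<std::uint32_t>(engine());
        } while (s == 0);  // xorshift would get stuck at zero
    }
    return seeds;
}

// Hypothetical kernel: each iteration draws a varying number of samples
// from its own generator and stores their mean.
std::vector<double> mean_per_iteration(std::vector<std::uint32_t> &states,
                                       double a, double b) {
    const int n = static_cast<int>(states.size());
    std::vector<double> out(n);
    std::uint32_t *st = states.data();
    double *o = out.data();
    #pragma acc parallel loop copy(st[0:n]) copyout(o[0:n])
    for (int i = 0; i < n; ++i) {
        std::uint32_t state = st[i];   // load this iteration's private state
        const int draws = 1 + i % 8;   // count is not fixed ahead of time
        double sum = 0.0;
        for (int k = 0; k < draws; ++k) sum += next_uniform(state, a, b);
        o[i] = sum / draws;
        st[i] = state;                 // store back so a later kernel continues the stream
    }
    return out;
}
```

The state array is the memory cost mentioned above: one entry per iteration, copied to and from the device. A plain xorshift generator is not a substitute for a vetted library when statistical quality matters.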

I had the opportunity to work with Johan Carlsson on a pure OpenACC device-side RNG implementation. Like cuRAND, you’d want to use it if you’re generating many random numbers per loop iteration. Unlike cuRAND, though, Johan’s DES PRNG implementation is much lighter weight and so has less overhead. For full details on DES PRNG see: Pseudo Random Number Generation by Lightweight Threads | OpenACC

can I use the random functionality in the Thrust library with OpenACC code?

I’ve never tried this before, so I’m not sure. Conceptually it should be OK, though I don’t know the details or what the overhead would be versus DES PRNG.

-Mat

@MatColgrove Thank you for your reply.

Can you point me to more resources that use OpenACC with CUDA code, such as OpenACC + cuRAND or OpenACC with Thrust? I am new to the HPC SDK and still trying to learn how to include and link the libraries with OpenACC directives.

Also, I am considering computing a list of random numbers on the host and just passing that to the device, but the issue is that the code is a bit complex and it is not clear how many random numbers I will need. There is also the issue of indexing/accessing the random numbers. Is there a way to get the threadIdx, like in CUDA, in OpenACC parallel loops?

For cuRAND, see the examples in the “examples/CUDA-Libraries/cuRAND” directory.

I don’t think there’s any example using Thrust in OpenACC code.

I’d encourage you to look at DES PRNG. It’s fairly straightforward to use, gives a good random number distribution, and has a very small overhead in terms of the memory needed to maintain each thread’s state.

Though, pre-computing the set is the easiest method, provided the algorithm allows it.

There is also the issue of indexing/accessing the random numbers. Is there a way to get the threadIdx, like in CUDA, in OpenACC parallel loops?

OpenACC itself doesn’t provide this, given that it would be system specific, but we do have some undocumented API extensions you can use, declared in the “openacc.h” header:

extern int __pgi_gangidx(void);
extern int __pgi_workeridx(void);
extern int __pgi_vectoridx(void);
extern int __pgi_blockidx(int);
extern int __pgi_threadidx(int);

We generally discourage folks from using them given they are extensions, but they are available.
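A portable alternative to a thread index is to key the random numbers on the loop index itself. The sketch below is an assumption-laden illustration, not part of OpenACC, cuRAND, or DES PRNG: it hashes (seed, iteration, draw number) with the well-known SplitMix64 mixing function, so no state is stored and the number of draws per iteration does not need to be known in advance. The names mix64, uniform_at, and count_below_midpoint are hypothetical.

```cpp
#include <cstdint>

// SplitMix64 step: scrambles a 64-bit integer.
#pragma acc routine seq
inline std::uint64_t mix64(std::uint64_t z) {
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Uniform double in [a, b) for draw number k of loop iteration i.
// The result depends only on (seed, i, k), never on which thread runs it.
#pragma acc routine seq
inline double uniform_at(std::uint64_t seed, std::uint64_t i, std::uint64_t k,
                         double a, double b) {
    const std::uint64_t bits = mix64(mix64(seed ^ mix64(i)) + k);
    const double u = (bits >> 11) * (1.0 / 9007199254740992.0);  // top 53 bits -> [0, 1)
    return a + (b - a) * u;
}

// Hypothetical kernel: count how many samples fall below the midpoint of [a, b).
long count_below_midpoint(int n, int draws, std::uint64_t seed,
                          double a, double b) {
    const double mid = 0.5 * (a + b);
    long hits = 0;
    #pragma acc parallel loop reduction(+:hits)
    for (int i = 0; i < n; ++i) {
        long local = 0;
        for (int k = 0; k < draws; ++k) {
            if (uniform_at(seed, i, k, a, b) < mid) ++local;
        }
        hits += local;
    }
    return hits;
}
```

Because every value is a pure function of its indices, results are reproducible regardless of how the loop is scheduled. The statistical quality of this simple hash has not been vetted the way a dedicated library would be, so treat it as a starting point only.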

Can you share the link to where “examples/CUDA-Libraries/cuRAND” is?

Are there examples where normal CUDA code is used with OpenACC? My main objective is to learn how to use OpenACC with CUDA and how to build the project and link the appropriate libraries.

Is there any example code that uses DES PRNG? I am reading the readme, and it says to use make to create libdesprng.a, but it would be helpful if there were sample code that shows how to use it.

Can you share the link to where “examples/CUDA-Libraries/cuRAND” is?

Oh, sorry. This is the directory in the NVHPC SDK installation, for example:

/opt/nvhpc/Linux_x86_64/23.11/examples/CUDA-Libraries/cuRAND/

Adjust the base path to your root install location, architecture, and compiler version accordingly.

Are there examples where normal CUDA code is used with OpenACC? My main objective is to learn how to use OpenACC with CUDA and how to build the project and link the appropriate libraries.

Most examples and tutorials cover CUDA/OpenACC interoperability rather than CUDA code inside OpenACC regions, but they might still be helpful.

A web search for “CUDA OpenACC interoperability” turns up some links.

Keep in mind these are all about mixing CUDA, Thrust, and OpenACC in the same code and how to share device data between them. They’re not about including CUDA code within OpenACC compute regions.

nvc++ does support a limited amount of CUDA, primarily enough to build Thrust, since we use Thrust as the basis for our standard language parallelism on GPUs. This is why I think conceptually you might be able to use Thrust inside an OpenACC compute construct, but it has not been tested and is not supported.

Is there any example code that uses DES PRNG? I am reading the readme, and it says to use make to create libdesprng.a, but it would be helpful if there were sample code that shows how to use it.

Have you looked at toypicmcc.c?