I’m trying to generate normally distributed random numbers in an OpenACC parallel loop, but since the C++ STL <random> library doesn’t work with parallel code, I had a look at the cuRAND library. However, after looking around at various websites and guides, there seem to be many ways to use cuRAND, all of which look quite complicated (thread IDs, block IDs, mallocs, deviceptr directives, etc.). As such, I thought I’d give a simple example of using the <random> library, and ask if there’s an equivalently simple way of doing this with cuRAND. The following program generates a vector of random numbers using the <random> library:
#include <random>
#include <vector>
#include <iostream>
#include <chrono>

class RandomVectorGenerator {
public:
    RandomVectorGenerator(const size_t num_elements, const float initial_value);
    float add_random_noise_to_num(const float x);
    void fill_vector_with_random_nums();

private:
    size_t num_elements_;
    std::vector<float> random_nums_;
    std::random_device seed_;
    std::ranlux24_base random_engine_;
};

RandomVectorGenerator::RandomVectorGenerator(const size_t num_elements, const float initial_value) {
    num_elements_ = num_elements;
    random_nums_ = std::vector<float>(num_elements_, initial_value);
    random_engine_ = std::ranlux24_base(seed_());
    #pragma acc enter data copyin(this)
}

// #pragma acc routine seq
float RandomVectorGenerator::add_random_noise_to_num(const float x) {
    float mean = 0.0f;
    float std_dev = 10.0f;
    std::normal_distribution<float> random_num_sampler(mean, std_dev);
    return x + random_num_sampler(random_engine_);
}

void RandomVectorGenerator::fill_vector_with_random_nums() {
    float *random_nums_ptr = random_nums_.data();
    // #pragma acc parallel loop copy(random_nums_ptr)
    for(size_t i = 0; i < num_elements_; ++i) {
        random_nums_ptr[i] = add_random_noise_to_num(random_nums_ptr[i]);
    }
}

int main() {
    size_t num_elements = 10000000;
    float initial_value = 10.0f;
    RandomVectorGenerator random_vector_generator(num_elements, initial_value);

    auto start = std::chrono::system_clock::now();
    random_vector_generator.fill_vector_with_random_nums();
    auto end = std::chrono::system_clock::now();

    std::chrono::duration<float> diff = end - start;
    std::cout << "Random vector generation time: " << diff.count() << " seconds\n";
}
Here, the seed source and the random number engine are declared as members of the RandomVectorGenerator class (the engine is then seeded in the constructor):
std::random_device seed_;
std::ranlux24_base random_engine_;
Furthermore, the add_random_noise_to_num() function produces random numbers using a normal distribution as follows:
std::normal_distribution<float> random_num_sampler(mean, std_dev);
return x + random_num_sampler(random_engine_);
Finally, the random numbers are generated and the generation is timed in the main() function. Compiling and running the program produces:
$ nvc++ -O3 -acc -Minfo=accel random_gen_test.cpp
$ ./a.out
Random vector generation time: 1.75203 seconds
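For context, my understanding of why the loop above can't simply be parallelized as written is that every call to add_random_noise_to_num() reads and updates the single shared random_engine_ member, so the iterations are not independent. Below is a minimal CPU-only sketch of the kind of stateless, per-element pattern I have in mind. The names add_noise_stateless, fill_stateless and base_seed are hypothetical, and building a fresh engine per element from a std::seed_seq is my own assumption about how to remove the shared state, not something taken from OpenACC or cuRAND. As far as I can tell, this resembles the (seed, sequence) pair that cuRAND's device API accepts in curand_init(), but I haven't confirmed that the two are equivalent.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns x plus normally distributed noise. The engine is
// built from (base_seed, index) on every call, so no state is shared between
// elements and the result depends only on the arguments.
inline float add_noise_stateless(float x, std::uint32_t base_seed, std::size_t index,
                                 float mean = 0.0f, float std_dev = 10.0f) {
    // Split the index into low and high 32-bit words for the seed sequence.
    const std::uint64_t wide_index = static_cast<std::uint64_t>(index);
    std::seed_seq seq{base_seed,
                      static_cast<std::uint32_t>(wide_index),
                      static_cast<std::uint32_t>(wide_index >> 32)};
    std::ranlux24_base engine(seq);  // same engine type as in the program above
    std::normal_distribution<float> sampler(mean, std_dev);
    return x + sampler(engine);
}

// Hypothetical driver: each iteration is independent of every other iteration.
inline std::vector<float> fill_stateless(std::size_t n, float initial_value,
                                         std::uint32_t base_seed) {
    std::vector<float> values(n, initial_value);
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = add_noise_stateless(values[i], base_seed, i);
    }
    return values;
}
```

Because each call depends only on its arguments, the loop body no longer touches a shared engine, which is the property I'd want from a cuRAND-based version. Constructing an engine per element costs extra time, and I don't know whether <random> can be compiled for the device at all, so this only illustrates the pattern rather than solving the problem.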
As such, I was wondering if there’s an equally simple way of generating normally distributed random numbers in a parallel loop using the cuRAND library instead of the STL <random> library. Any help would be appreciated.