NVC++-S-1061-Procedures called in a compute region must have acc routine information

Hello y’all, I am trying to parallelize this for loop with OpenACC kernels, but I can’t get it to work. Below is the error I am getting. I think it is all caused by the “default_random_engine” class, but I am still struggling. This is the for loop (it is inside a function):


#pragma acc kernels
for(int j1=0; j1 < n_ions; j1++) {
	un[j1] = uc[j1] * (1.0 - fac1) + fx[j1] * fac1 + msvnd * normrand(generator);
	double delx = fac4 * (uc[j1] + un[j1] - 2.0 * fx[j1]) + fx[j1] * dt + msdnd * normrand(generator);
	x[j1] += delx;
	x[j1] -= boxf * round(x[j1] / boxf);

	vn[j1] = vc[j1] * (1.0 - fac1) + fy[j1] * fac1 + msvnd * normrand(generator);
	double dely = fac4 * (vc[j1] + vn[j1] - 2.0 * fy[j1]) + fy[j1] * dt + msdnd * normrand(generator);
	y[j1] += dely;
	y[j1] -= boxf * round(y[j1] / boxf);

	wn[j1] = wc[j1] * (1.0 - fac1) + fz[j1] * fac1 + msvnd * normrand(generator);
	double delz = fac4 * (wc[j1] + wn[j1] - 2.0 * fz[j1]) + fz[j1] * dt + msdnd * normrand(generator);
	z[j1] += delz;
	z[j1] -= boxf * round(z[j1] / boxf);
}

The error I am getting is: NVC++-S-1061-Procedures called in a compute region must have acc routine information - normrand(std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647> *)

Can you help me with this, please? Thank you so much!

Hi j_menendezsigaran,

In order to call a routine on the device, there needs to be a device version of that routine. Here the compiler is telling you that no device routine exists for “normrand”.

Normally, you’d solve this by decorating the prototype and definition of the routine with an OpenACC “routine” directive, which has the compiler create a device-callable version. Here’s a blog post on how to use the routine directive. It’s from 2013, so it’s a bit old and some of the material is dated, but hopefully it’s still useful for the basic concept.
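Here is a minimal sketch of that pattern. The names `wrap_periodic` and `wrap_all` are hypothetical (they just factor the periodic wrap out of the loop above); what matters is where the `#pragma acc routine seq` lines go. When compiled without OpenACC enabled, the pragmas are ignored and the code runs serially on the host.

```cpp
#include <cmath>

// Prototype, as it would appear in a header such as util.h. The routine
// directive asks the compiler to also generate a device-callable version
// that each GPU thread executes sequentially.
#pragma acc routine seq
double wrap_periodic(double x, double box);

// Definition, decorated with the same directive.
#pragma acc routine seq
double wrap_periodic(double x, double box) {
    // Periodic wrap: the same expression the loop applies to x, y, and z.
    return x - box * std::round(x / box);
}

// Host function containing the compute region. Because wrap_periodic has
// routine information, calling it inside the loop is allowed.
void wrap_all(double* x, int n, double box) {
    #pragma acc parallel loop copy(x[0:n])
    for (int i = 0; i < n; i++) {
        x[i] = wrap_periodic(x[i], box);
    }
}
```

Note that this only helps if everything `wrap_periodic` calls can also be compiled for the device; a routine that internally uses unsupported types or shared state will still fail.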

Is “normrand” part of your application or from a library? If in a library, this may be difficult to add.

Also, RNGs often aren’t parallelizable because they carry a shared static state variable. I don’t know how this one is implemented, but you may need to look into precomputing the random numbers or using something like cuRAND. I can also recommend the following pRNG implementation, which I advised on:

Pseudo Random Number Generation by Lightweight Threads | OpenACC.
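For the first option, precomputing the random numbers, a minimal sketch looks like this. It assumes “normrand” returns standard normal variates (its body isn’t shown in this thread), and `fill_normals` and `langevin_x` are hypothetical names. The host draws the numbers with <random>, which is fine on the CPU, and the compute region only reads them, so no generator is called on the device. Only the x update is shown; the full loop draws six numbers per iteration, so it would need a 6*n array refilled every time step.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Host side: draw all the normal variates one time step needs before
// entering the compute region. Runs on the CPU, so <random> is fine here.
std::vector<double> fill_normals(std::size_t count, std::default_random_engine& gen) {
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> r(count);
    for (auto& v : r) v = normal(gen);
    return r;
}

// Device side: iteration j reads its own two pre-drawn numbers
// (rnd[2*j] and rnd[2*j+1]) instead of calling a generator, so there is
// no shared RNG state inside the loop. Only the x update is shown.
void langevin_x(double* x, double* un, const double* uc, const double* fx,
                const double* rnd, int n, double fac1, double fac4, double dt,
                double msvnd, double msdnd, double boxf) {
    #pragma acc parallel loop copyin(uc[0:n], fx[0:n], rnd[0:2*n]) copy(x[0:n]) copyout(un[0:n])
    for (int j = 0; j < n; j++) {
        un[j] = uc[j] * (1.0 - fac1) + fx[j] * fac1 + msvnd * rnd[2 * j];
        double delx = fac4 * (uc[j] + un[j] - 2.0 * fx[j]) + fx[j] * dt + msdnd * rnd[2 * j + 1];
        x[j] += delx;
        x[j] -= boxf * std::round(x[j] / boxf);
    }
}
```

The trade-off is extra memory and a host-to-device copy of the random array each step; generating the numbers on the device (for example with cuRAND) avoids that copy.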

Hope this helps,
Mat

Hello Mat, thank you so much for your help. “normrand” is a function in my application, but it uses the “default_random_engine” class. I will take a look at the information, because I have been trying to apply the “routine” directive and I am still getting the error.

Hello Mat, now I am getting the error below. I have tried different flags but it is still not working. Could you help me, please?
These are the flags I am using; I don’t know whether the problem is related to their order:
CC := pgc++
CCFLAGS := -Wall --std=c++11 -lm -acc -gpu=cuda11.6 -Mcudalib=curand -Minfo=accel
DBGFLAGS := -g
CCOBJFLAGS := $(CCFLAGS) -c

compilation:
CLEAN bin/out debug/out obj/hic_aerosol_field_charging.o obj/main.o obj/util.o debug/hic_aerosol_field_charging.o debug/main.o debug/util.o
pgc++ -Wall --std=c++11 -lm -acc -gpu=cuda11.6 -Mcudalib=curand -Minline -Minfo=accel -c -o obj/hic_aerosol_field_charging.o src/hic_aerosol_field_charging.cpp
“src/hic_aerosol_field_charging.cpp”, line 74: warning: variable “ro” was set but never used
double ro;
^

get_r(double, double, double):
6, include “util.h”
25, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
normrand2(std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647> *):
1974, Accelerator restriction: unsupported operation: X87TODP
Accelerator restriction: datatype not supported: _in_10926
langevin_ions(double *, double *, double *, double *, double *, double *, double *, double *, double *, int, double, double, double, std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647> *, double *, double *, double *, double, double, double):
16, Accelerator restriction: unsupported operation: X87TODP
adjust_pb(double *, double *, double *, double):
301, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
f_coulomb(double *, double *, double *, double, double, double, double, double *, double, int, double *, double *, double *):
352, Generating implicit copy(r) [if not already present]
Generating implicit copyin(y[:n_ions-1],z[:n_ions-1],x[:n_ions-1],r_min) [if not already present]
Generating implicit copy(fz[:n_ions-1],fx[:n_ions-1],fy[:n_ions-1]) [if not already present]
364, Generating NVIDIA GPU code
364, #pragma acc loop seq
366, #pragma acc loop seq
366, Accelerator restriction: size of the GPU copy of fx,fy,fz,z,y,x is unknown
std::uniform_real_distribution::uniform_real_distribution(double, double):
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
1916, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
std::uniform_real_distribution::uniform_real_distribution(double, double) [subobject]:
0, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
NVC++-W-0277-Cannot inline function _ZNSt25uniform_real_distributionIdEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEdRT_RKNS0_10param_typeE - data type mismatch (src/hic_aerosol_field_charging.cpp: 1974)
double std::uniform_real_distribution::operator ()<std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>>(T1 &):
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
1974, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
NVC++-W-0277-Cannot inline function _ZNKSt25uniform_real_distributionIdE10param_type1bEv - data type mismatch (src/hic_aerosol_field_charging.cpp: 1983)
NVC++-W-0277-Cannot inline function _ZNKSt25uniform_real_distributionIdE10param_type1aEv - data type mismatch (src/hic_aerosol_field_charging.cpp: 1983)
NVC++-W-0277-Cannot inline function _ZNKSt25uniform_real_distributionIdE10param_type1aEv - data type mismatch (src/hic_aerosol_field_charging.cpp: 1983)
double std::uniform_real_distribution::operator ()<std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>>(T1 &, const std::uniform_real_distribution::param_type &):
188, Accelerator restriction: unsupported operation: X87TODP
Accelerator restriction: datatype not supported: _in_13424
std::uniform_real_distribution::param_type::param_type(double, double):
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
1884, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
NVC++-W-0277-Cannot inline function _ZNSt25uniform_real_distributionIdE10param_typeC1Edd - data type mismatch (src/hic_aerosol_field_charging.cpp: 1886)
std::uniform_real_distribution::param_type::a() const:
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
1890, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
std::uniform_real_distribution::param_type::b() const:
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
1894, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>::operator ()():
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
336, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>, double>::_Adaptor(std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647> &):
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
170, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>, double>::operator ()():
3467, Accelerator restriction: datatype not supported: _in_13977
3477, Accelerator restriction: unsupported operation: X87TODP
std::__detail::_Mod<unsigned long, (unsigned long)2147483647, (unsigned long)16807, (unsigned long)0, (bool)1, (bool)1>::__calc(unsigned long):
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
139, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
T1 std::__detail::__mod<unsigned long, (unsigned long)2147483647, (unsigned long)16807, (unsigned long)0>(T1):
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
50, include “random.h”
150, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
std::log(long double):
1, include “cmath”
15, include “cmath”
362, Accelerator restriction: unsupported statement type: opcode=X87RETURN
6, include “util.h”
std::basic_ostream<T1, T2> & std::endl<char, std::char_traits>(std::basic_ostream<T1, T2> &):
6, include “util.h”
const T1 & std::min(const T1 &, const T1 &):
2, include “algorithm”
10, include “algorithm”
61, include “stl_algobase.h”
194, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
T1 std::generate_canonical<double, (unsigned long)53, std::linear_congruential_engine<unsigned long, (unsigned long)16807, (unsigned long)0, (unsigned long)2147483647>>(T3 &):
2, include “algorithm”
10, include “algorithm”
62, include “stl_algo.h”
65, include “random”
52, include “random.tcc”
3469, Accelerator restriction: datatype not supported: __r
3478, Accelerator restriction: unsupported operation: X87TODP
const T1 & std::min(const T1 &, const T1 &):
2, include “algorithm”
10, include “algorithm”
61, include “stl_algobase.h”
194, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
const T1 & std::max(const T1 &, const T1 &):
2, include “algorithm”
10, include “algorithm”
61, include “stl_algobase.h”
217, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccCk-Tuhc38vgA.gpu (848, 23): parse use of undefined value ‘@_ZNSt25uniform_real_distributionIdEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEdRT_RKNS0_10param_typeE’
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (src/hic_aerosol_field_charging.cpp: 1)
NVC++/x86-64 Linux 22.3-0: compilation completed with warnings
pgc++ -Wall --std=c++11 -lm -acc -gpu=cuda11.6 -Mcudalib=curand -Minline -Minfo=accel -c -o obj/main.o src/main.cpp
“src/main.cpp”, line 113: warning: variable “flim” was declared but never referenced
unsigned int flim = 1000000; // Limit the file to certain size in kB
^

“src/main.cpp”, line 114: warning: variable “flim_col” was declared but never referenced
unsigned int flim_col = 100000;
^

It looks like the compiler is unable to create several of the device routines because they use the “long double” (x87 80-bit) data type; the X87TODP operation converts “long double” to “double”. “long double” isn’t supported on NVIDIA GPUs.

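I don’t have your code, but I’m guessing you’re using “std::random”, i.e. the C++ <random> library? Its implementation uses “long double” (see the “datatype not supported: __r” message for std::generate_canonical above), so it isn’t supported on the device. Also, unless each thread has its own generator, you’d end up with race conditions.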
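One way to give every iteration its own generator without any “long double” math is a counter-based approach: hash (seed, counter) into 64 random bits, convert them to doubles using only integer and double arithmetic, and apply the Box-Muller transform to get a normal variate. This is a swapped-in technique, not the implementation from the article linked earlier and not cuRAND. The SplitMix64 mixing constants are the published ones, but the names `splitmix64`, `normal_at`, and `add_noise` are hypothetical, and the statistical quality has not been validated for this simulation.

```cpp
#include <cmath>
#include <cstdint>

// SplitMix64 finalizer: turns a 64-bit input into 64 well-mixed bits.
// Pure integer math, so it can run on the GPU.
#pragma acc routine seq
std::uint64_t splitmix64(std::uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Standard normal variate determined entirely by (seed, counter).
// Nothing is stored between calls, so parallel iterations cannot race.
#pragma acc routine seq
double normal_at(std::uint64_t seed, std::uint64_t counter) {
    constexpr double kInv2Pow53 = 1.0 / 9007199254740992.0;  // 2^-53
    constexpr double kTwoPi = 6.283185307179586;
    std::uint64_t a = splitmix64(seed ^ splitmix64(2 * counter));
    std::uint64_t b = splitmix64(seed ^ splitmix64(2 * counter + 1));
    // Top 53 bits -> double: u1 in (0, 1] so log(u1) is finite, u2 in [0, 1).
    // Only double arithmetic is used, never long double.
    double u1 = static_cast<double>((a >> 11) + 1) * kInv2Pow53;
    double u2 = static_cast<double>(b >> 11) * kInv2Pow53;
    // Box-Muller transform (cosine branch).
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(kTwoPi * u2);
}

// Example compute region: each iteration j derives its own counter from
// (step, j), so no generator object is shared between threads.
void add_noise(double* v, int n, double sigma, std::uint64_t seed, std::uint64_t step) {
    #pragma acc parallel loop copy(v[0:n])
    for (int j = 0; j < n; j++) {
        std::uint64_t counter = step * static_cast<std::uint64_t>(n) + static_cast<std::uint64_t>(j);
        v[j] += sigma * normal_at(seed, counter);
    }
}
```

For the loop in the question, which draws six numbers per iteration, each draw would need its own counter, for example 6 * (step * n_ions + j1) + k with k = 0..5, and `step` must advance every time step so the noise changes between steps.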

Again, I highly recommend looking into other solutions for generating random numbers on the device.