I have C++ code that uses many 2D and 3D std::vectors. Is converting them to C-style arrays the only way to get this to work with OpenACC? Or could I use C++11 smart/shared pointers?
Performance-wise, you may be better off converting to C-like arrays, since they can be made contiguous in memory, but you can use std::vectors in OpenACC compute regions. However, vectors are not thread-safe, so you should avoid push_back, pop_back, resize, insert, and any other routine that changes the vector's size. Basically, build the vector on the host and use only the access operator, “[i]”, on the device.
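A minimal sketch of that pattern, assuming the program is built with CUDA Unified Memory (pgc++ -ta=tesla:managed or nvc++ -gpu=managed) so the nested heap buffers are visible on the device. The function name first_elements is hypothetical. All sizing, including the output vector, happens on the host; inside the compute region only operator[] is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies vec[i][j][0] into a flat output vector.
// ASSUMPTION: built with managed memory so the inner vectors' buffers are
// accessible from the GPU. A compiler without OpenACC ignores the pragma
// and runs the loop serially.
std::vector<double> first_elements(
    const std::vector<std::vector<std::vector<double>>>& vec, int ni, int nj) {
    // Check shapes on the host: the device code must never grow or resize anything.
    assert(static_cast<int>(vec.size()) >= ni);
    for (int i = 0; i < ni; ++i) {
        assert(static_cast<int>(vec[i].size()) >= nj);
        for (int j = 0; j < nj; ++j) assert(!vec[i][j].empty());
    }

    // Allocate the result on the host, before the compute region.
    std::vector<double> out(static_cast<std::size_t>(ni) * nj);

    // Only the access operator is used inside the compute region.
    #pragma acc parallel loop collapse(2)
    for (int i = 0; i < ni; ++i) {
        for (int j = 0; j < nj; ++j) {
            out[i * nj + j] = vec[i][j][0];
        }
    }
    return out;
}
```

Called with the 3D vector from this thread and ni=3, nj=2, it should return 1, 3, 9, 11, 13, 15, the same values the printf version further down prints.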
Also, I would suggest using CUDA Unified Memory (pgc++: “-ta=tesla:managed”, nvc++: “-gpu=managed”), since vectors, especially vectors of pointers, can be difficult to manage manually.
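If you would rather manage the data yourself, one option is to keep each array in a single flat std::vector and map the raw pointer from data() with explicit data clauses. Because that pointer describes one contiguous block, no deep copy of nested vectors is needed. This is a sketch under that assumption; scale is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: out[k] = factor * in[k], computed on the device
// without Unified Memory. The array sections a[0:n] and b[0:n] tell
// OpenACC exactly which contiguous block to copy in and out.
void scale(const std::vector<double>& in, std::vector<double>& out, double factor) {
    assert(out.size() == in.size());   // size the output on the host, never on the device

    const double* a = in.data();       // raw pointers to the contiguous storage
    double* b = out.data();
    const std::size_t n = in.size();

    #pragma acc parallel loop copyin(a[0:n]) copyout(b[0:n])
    for (std::size_t k = 0; k < n; ++k) {
        b[k] = factor * a[k];
    }
}
```

This only works when the data really is contiguous; a vector of vectors has a separate buffer per row, which is exactly the case where managed memory saves you from mapping each buffer by hand.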
> Performance-wise, you may be better off converting to C-like arrays since they can be made to be contiguous in memory
You mean that if I convert them to C-like arrays, the code will run faster than with vectors?
I tried the code below:
#include <iostream>
#include <vector>

int main() {
    std::vector< std::vector< std::vector<double> > > vec { {{1,2},{3,4}, {5,6},{7,8}},
                                                            {{9,10}, {11,12}},
                                                            {{13,14}, {15,16}, {17,18}} };
#pragma acc parallel loop
    for (int i=0; i<3; i++) {
        for (int j=0; j<2; j++) {
            std::cout<<vec[i][j][0];  // first element of each innermost vector
        }
    }
    return 0;
}
It works serially, but when I compile it with OpenACC using
pgc++ -fast -ta=tesla:cuda9.2,managed -Minfo=accel -o runEx runEx.cpp && ./runEx
I get:
PGCC-S-0155-Procedures called in a compute region must have acc routine information: std::basic_ostream<char, std::char_traits<char>>::operator <<(int)
main:
PGCC-S-0155-Accelerator region ignored
7, accelerator region ignored
780, Accelerator restriction: call to ‘std::basic_ostream<char, std::char_traits<char>>::operator <<(int)’ with no acc routine information
PGCC/x86-64 Linux 19.10-0: compilation completed with severe errors
> You mean if I convert them to C-like arrays, the code will run faster than if the code worked with vectors?
Accessing contiguous data across an OpenACC “vector” loop gives better performance than accessing non-contiguous data. What I mean is that in C you have greater control over the data layout, and are therefore better able to get the data into a contiguous format. That's not to say you can't do this in C++, nor that you can't write a non-contiguous data structure in C, only that you have greater control.
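To illustrate the layout point, here is a sketch of a contiguous replacement for vector<vector<vector<double>>>. Flat3D, flatten and add_one are hypothetical names. Element (i, j, k) is stored at offset (i*ny + j)*nz + k, so k is the stride-1 index, and the OpenACC vector loop is placed on k so neighbouring vector lanes touch neighbouring doubles. It assumes rectangular data; a jagged structure like the one in this thread (rows of 4, 2 and 3 elements) would need padding or a separate offsets array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical contiguous 3D array: one allocation, row-major order.
struct Flat3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;

    Flat3D(std::size_t x, std::size_t y, std::size_t z)
        : nx(x), ny(y), nz(z), data(x * y * z, 0.0) {}

    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * ny + j) * nz + k];
    }
    double operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * ny + j) * nz + k];
    }
};

// Host-side conversion from a rectangular nested vector.
Flat3D flatten(const std::vector<std::vector<std::vector<double>>>& v) {
    assert(!v.empty() && !v[0].empty());
    Flat3D f(v.size(), v[0].size(), v[0][0].size());
    for (std::size_t i = 0; i < f.nx; ++i) {
        assert(v[i].size() == f.ny);          // jagged data does not fit this layout
        for (std::size_t j = 0; j < f.ny; ++j) {
            assert(v[i][j].size() == f.nz);
            for (std::size_t k = 0; k < f.nz; ++k) f(i, j, k) = v[i][j][k];
        }
    }
    return f;
}

// Adds 1.0 to every element: gangs over (i, j), vector lanes over the
// stride-1 index k. The pragmas are ignored by non-OpenACC compilers.
void add_one(Flat3D& f) {
    double* a = f.data.data();
    const std::size_t nx = f.nx, ny = f.ny, nz = f.nz;
    #pragma acc parallel loop gang collapse(2) copy(a[0:nx*ny*nz])
    for (std::size_t i = 0; i < nx; ++i)
        for (std::size_t j = 0; j < ny; ++j) {
            #pragma acc loop vector
            for (std::size_t k = 0; k < nz; ++k)
                a[(i * ny + j) * nz + k] += 1.0;
        }
}
```

The device loop works on the raw pointer rather than calling Flat3D::operator(), which keeps the compute region free of member-function calls; if you do call it, the compiler has to generate an acc routine for it, as it did for std::vector::operator[] in the log below.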
The error in your program is due to using “cout”. Only limited I/O is available from the device, so switching to printf works around the problem.
% cat tst.cpp
#include <stdio.h>
#include <vector>
#include <iostream>
int main() {
std::vector< std::vector< std::vector<double> > > vec { {{1,2},{3,4}, {5,6},{7,8}},
{{9,10}, {11,12}},
{{13,14}, {15,16}, {17,18}} };
#pragma acc parallel loop
for (int i=0; i<3; i++) {
for (int j=0; j<2; j++) {
// std::cout<<vec[i][j][0]<<std::endl;
printf("%d %d %f\n",i,j,vec[i][j][0]);
}
}
return 0;
}
% pgc++ -Minfo=accel tst.cpp -ta=tesla:managed ; a.out
main:
6, Generating Tesla code
11, #pragma acc loop gang /* blockIdx.x */
12, #pragma acc loop vector(128) /* threadIdx.x */
6, Generating implicit copy(vec) [if not already present]
12, Loop is parallelizable
std::vector<std::vector<std::vector<double, std::allocator<double>>, std::allocator<std::vector<double, std::allocator<double>>>>, std::allocator<std::vector<std::vector<double, std::allocator<double>>, std::allocator<std::vector<double, std::allocator<double>>>>>>::operator [](unsigned long):
2, include "vector"
57, include "vector"
10, include "stl_vector.h"
1041, Generating implicit acc routine seq
Generating acc routine seq
Generating Tesla code
std::vector<std::vector<double, std::allocator<double>>, std::allocator<std::vector<double, std::allocator<double>>>>::operator [](unsigned long):
2, include "vector"
57, include "vector"
10, include "stl_vector.h"
1041, Generating implicit acc routine seq
Generating acc routine seq
Generating Tesla code
std::vector<double, std::allocator<double>>::operator [](unsigned long):
2, include "vector"
57, include "vector"
10, include "stl_vector.h"
1041, Generating implicit acc routine seq
Generating acc routine seq
Generating Tesla code
0 0 1.000000
0 1 3.000000
1 0 9.000000
1 1 11.000000
2 0 13.000000
2 1 15.000000