Easiest way to replace multi-dimensional std::vectors?

I have C++ code that uses many 2D and 3D std::vectors. Is converting them to C-style arrays the only way to get this working in OpenACC? Or could I use C++11 smart/shared pointers instead?

Performance-wise, you may be better off converting to C-like arrays since they can be made contiguous in memory, but you can use std::vectors in OpenACC compute regions. However, vectors are not thread-safe, so inside a compute region you should avoid push or pop routines, resize, insert, and anything else that modifies the vector itself. Basically, you need to build the vector on the host and only use the access operator, “[i]”, on the device.
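To illustrate the "build on the host, only index on the device" pattern, here is a minimal sketch. The function name `sum_all` is hypothetical, and the code assumes a rectangular grid (every row has the same length) so that both extents can be read on the host before the compute region. When built with OpenACC, it also assumes the managed-memory option so the nested vector data is reachable from the device.

```cpp
#include <vector>

// Hypothetical helper: sums all elements of a 2D vector.
// Assumption: the grid is rectangular (all rows have the same length).
double sum_all(const std::vector<std::vector<double>>& grid) {
    // Query the sizes on the host, outside the compute region.
    const int rows = static_cast<int>(grid.size());
    const int cols = rows > 0 ? static_cast<int>(grid[0].size()) : 0;
    double total = 0.0;

    // Inside the region the vector is only read through operator[];
    // nothing here resizes or otherwise modifies the vector itself.
    #pragma acc parallel loop reduction(+:total)
    for (int i = 0; i < rows; ++i) {
        #pragma acc loop reduction(+:total)
        for (int j = 0; j < cols; ++j) {
            total += grid[i][j];
        }
    }
    return total;
}
```

Without OpenACC enabled the pragmas are ignored and the function runs serially with the same result, which makes it easy to compare host and device output.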

Also, I would suggest using CUDA Unified Memory (pgc++: “-ta=tesla:managed” or nvc++: “-gpu=managed”) since vectors, especially vectors of pointers, can be difficult to manage manually.

> Performance-wise, you may be better off converting to C-like arrays since they can be made to be contiguous in memory

Do you mean that if I convert them to C-like arrays, the code will run faster than it does with vectors?

I tried the code below:

#include <iostream>
#include <vector>

int main() {
    std::vector< std::vector< std::vector<double> > > vec { {{1,2},{3,4}, {5,6},{7,8}}, 
    {{9,10}, {11,12}}, 
    {{13,14}, {15,16}, {17,18}} };

    #pragma acc parallel loop
    for (int i=0; i<3; i++) {
        for (int j=0; j<2; j++) {
            std::cout<<vec[i][j][0];
        }
    }
    return 0;
}

It works serially, but when I compile it for OpenACC with

pgc++ -fast -ta=tesla:cuda9.2,managed -Minfo=accel -o runEx runEx.cpp && ./runEx

I get

> PGCC-S-0155-Procedures called in a compute region must have acc routine information: std::basic_ostream<char, std::char_traits<char>>::operator <<(int) 

main:
PGCC-S-0155-Accelerator region ignored
7, accelerator region ignored
780, Accelerator restriction: call to ‘std::basic_ostream<char, std::char_traits<char>>::operator <<(int)’ with no acc routine information
PGCC/x86-64 Linux 19.10-0: compilation completed with severe errors

> you mean if I convert them to C-like arrays, the code will run faster than if the code worked with vectors?

Accessing contiguous data across an OpenACC “vector” loop will lead to better performance than accessing non-contiguous data. What I mean is that in C you have greater control over the data layout and are therefore better able to get the data into a contiguous format. It's not that you can’t do this in C++, nor that you can’t write a non-contiguous data structure in C, only that you have greater control.
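As a rough sketch of what "contiguous" could look like in practice, the nested vector can be copied once on the host into a single flat buffer that is indexed by hand. Everything named here (`Flat3D`, `offset`, `flatten`, `scale`) is hypothetical, and the code assumes the nested vector is rectangular, which a jagged vector of vectors does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: a rectangular 3D array in one contiguous block.
struct Flat3D {
    std::size_t nx, ny, nz;
    std::vector<double> data;  // nx*ny*nz values, last index varies fastest
};

// Row-major offset of element (i, j, k).
inline std::size_t offset(const Flat3D& a, std::size_t i, std::size_t j, std::size_t k) {
    return (i * a.ny + j) * a.nz + k;
}

// Copy a nested vector into contiguous storage on the host.
// Assumption: every inner vector has the same length (checked with assert).
Flat3D flatten(const std::vector<std::vector<std::vector<double>>>& v) {
    const std::size_t nx = v.size();
    const std::size_t ny = nx ? v[0].size() : 0;
    const std::size_t nz = ny ? v[0][0].size() : 0;
    Flat3D out{nx, ny, nz, std::vector<double>(nx * ny * nz)};
    for (std::size_t i = 0; i < nx; ++i) {
        assert(v[i].size() == ny);
        for (std::size_t j = 0; j < ny; ++j) {
            assert(v[i][j].size() == nz);
            for (std::size_t k = 0; k < nz; ++k) {
                out.data[offset(out, i, j, k)] = v[i][j][k];
            }
        }
    }
    return out;
}

// Scale every element in place. Neighbouring iterations touch neighbouring
// addresses, which is the access pattern a "vector" loop benefits from.
void scale(Flat3D& a, double factor) {
    double* p = a.data.data();  // raw pointer into the contiguous block
    const long n = static_cast<long>(a.data.size());
    #pragma acc parallel loop copy(p[0:n])
    for (long idx = 0; idx < n; ++idx) {
        p[idx] *= factor;
    }
}
```

Because the buffer is one allocation, a single data clause such as `copy(p[0:n])` describes it completely, whereas a vector of vectors has a separate allocation per inner vector.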

The error in the program is due to using “cout”. Only limited I/O is available from the device, so switching to printf will work around the problem.

% cat tst.cpp
#include <stdio.h>
#include <vector>
#include <iostream>

int main() {
    std::vector< std::vector< std::vector<double> > > vec { {{1,2},{3,4}, {5,6},{7,8}},
    {{9,10}, {11,12}},
    {{13,14}, {15,16}, {17,18}} };

    #pragma acc parallel loop
    for (int i=0; i<3; i++) {
        for (int j=0; j<2; j++) {
            // std::cout<<vec[i][j][0]<<std::endl;
            printf("%d %d %f\n",i,j,vec[i][j][0]);
        }
    }
    return 0;
}


% pgc++ -Minfo=accel tst.cpp -ta=tesla:managed ; a.out
main:
      6, Generating Tesla code
         11, #pragma acc loop gang /* blockIdx.x */
         12, #pragma acc loop vector(128) /* threadIdx.x */
      6, Generating implicit copy(vec) [if not already present]
     12, Loop is parallelizable
std::vector<std::vector<std::vector<double, std::allocator<double>>, std::allocator<std::vector<double, std::allocator<double>>>>, std::allocator<std::vector<std::vector<double, std::allocator<double>>, std::allocator<std::vector<double, std::allocator<double>>>>>>::operator [](unsigned long):
      2, include "vector"
          57, include "vector"
               10, include "stl_vector.h"
                  1041, Generating implicit acc routine seq
                        Generating acc routine seq
                        Generating Tesla code
std::vector<std::vector<double, std::allocator<double>>, std::allocator<std::vector<double, std::allocator<double>>>>::operator [](unsigned long):
      2, include "vector"
          57, include "vector"
               10, include "stl_vector.h"
                  1041, Generating implicit acc routine seq
                        Generating acc routine seq
                        Generating Tesla code
std::vector<double, std::allocator<double>>::operator [](unsigned long):
      2, include "vector"
          57, include "vector"
               10, include "stl_vector.h"
                  1041, Generating implicit acc routine seq
                        Generating acc routine seq
                        Generating Tesla code
0 0 1.000000
0 1 3.000000
1 0 9.000000
1 1 11.000000
2 0 13.000000
2 1 15.000000