When I compile the following OpenACC code with nvc++, I get the compiler messages shown after it:

#include <opencv2/opencv.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>

using namespace std;
using namespace cv;

int main(){

 cv::Mat srcImg=cv::imread("/home/testSpace/images/blue-mountains.jpg");

 if(srcImg.empty()){
  cout<<"The file is not loaded or does not exist"<<endl;
  return -1;
 }

 cout<<"Matrix"<<srcImg.rows<<" "<<srcImg.cols<<endl;

Mat duplicate(srcImg.rows,srcImg.cols, CV_8UC1,Scalar::all(255) );

int i,j=0;

#pragma acc enter data create(srcImg[0:srcImg.rows][0:srcImg.cols])

 #pragma acc update device(srcImg[:srcImg.rows][:srcImg.cols])
 #pragma acc parallel loop 
   #pragma acc loop



#pragma acc data copyout(srcImg[:srcImg.rows][:srcImg.cols])



// return 0;


40, Generating enter data create(srcImg)
45, Generating update device(srcImg)
Generating NVIDIA GPU code
45, #pragma acc loop gang /* blockIdx.x */
47, #pragma acc loop vector(128) /* threadIdx.x */
45, Generating implicit copyin(srcImg.step.p[:1],srcImg) [if not already present]
47, Loop is parallelizable
59, Generating copyout(srcImg) [if not already present]
T1 & cv::Mat::at(int, int):
1, include "opencv.hpp"
52, include "core.hpp"
59, include "mat.hpp"
3724, include "mat.inl.hpp"
1140, Generating implicit acc routine seq
Generating acc routine seq
Generating NVIDIA GPU code
cvflann::anyimpl::big_any_policy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>::print(std::basic_ostream<char, std::char_traits<char>>&, void *const *):
1, include "opencv.hpp"
std::basic_ostream<T1, T2> & std::endl<char, std::char_traits<char>>(std::basic_ostream<T1, T2> &):
1, include "opencv.hpp"
52, include "core.hpp"

My questions are: 1) why do I get the message 'Generating copyout(srcImg) [if not already present]', and 2) why is the program output "Matrix810 1440" followed by "libgomp: TODO"?

Hi glaciya2018,

2) the result is Matrix810 1440 libgomp: TODO.

I’m assuming you’re the same person who posted this question on Stack Overflow.

My answer is the same as the one I gave there: you’ve most likely linked against the GNU OpenACC runtime (libgomp). My best guess is that it’s being brought in when you use ‘pkg-config’ to list the opencv2 libraries. I’d recommend you avoid doing this and link only with the libraries you need.

I also noticed that you added the “-nomp” flag. Presumably, since we implicitly include our OpenMP runtime library, you were getting multiply defined references to the OpenMP runtime. If so, this is another sign that you’ve linked with libgomp. (GNU uses libgomp for both OpenMP and OpenACC.)

1) why I got error information 'Generating copyout(srcImg) [if not already present]'

That’s an informational message from the compiler feedback rather than an error. Since you’re using unstructured data regions, the compiler needs to add the implicit copy clauses. Given OpenACC’s “present_or” semantics, at runtime the variable will already be present and no data movement will occur.

However, you can remove the implicit copies by adding a “present” clause on the parallel loop, which tells the compiler to expect it to be present on the device. For example:

#pragma acc parallel loop present(srcImg)
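
As an illustration of the "present" suggestion, here is a minimal sketch on a plain array rather than a cv::Mat. The helper name double_on_device and the use of std::vector are assumptions made for this example; they are not from the original code. The unstructured "enter data" / "exit data" pair manages the device copy, and the "present" clause on the loop tells the compiler not to generate an implicit copy for the array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): doubles every element
// of `v` using an unstructured data region and an explicit present clause.
void double_on_device(std::vector<float>& v) {
    float* p = v.data();               // data clauses operate on the raw array
    const std::size_t n = v.size();
    assert(p != nullptr || n == 0);

    // Create the device copy and initialize it from the host.
    #pragma acc enter data copyin(p[0:n])

    // present(...) states that p is already on the device, so no implicit
    // copy clause is needed for it on this loop.
    #pragma acc parallel loop present(p[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= 2.0f;
    }

    // Copy the result back to the host and release the device copy.
    #pragma acc exit data copyout(p[0:n])
}
```

Built without OpenACC enabled, the pragmas are ignored and the loop simply runs on the host, which makes the sketch easy to check before moving to the GPU.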

Now this message may indicate another issue that you’ll encounter once you get your linking correct:

45, Generating implicit copyin(srcImg.step.p[:1],srcImg) [if not already present]

I don’t know the underlying structure of the “cv::Mat” type, but the compiler is seeing it as a class or struct. Hence my guess is that "srcImg[ ][ ]" may actually be an operator, and srcImg isn’t a 2D array. Hence how you’re using “create” and “update” may be problematic. Instead, you’d need to do a manual deep copy of the class in order to get the device copy to have the correct data layout.
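
To show what a manual deep copy looks like in general, here is a hedged sketch using a made-up struct. The Image type below is a hypothetical stand-in with a single pointer member; it is not the real cv::Mat layout, which has more members (the compiler feedback mentions srcImg.step.p, for instance). The idea is to copy the struct first and then the buffer its pointer refers to, so the device copy of the pointer ends up referring to device memory.

```cpp
#include <cassert>

// Hypothetical stand-in for a class that owns a pixel buffer.
// This is NOT the actual cv::Mat layout.
struct Image {
    int rows;
    int cols;
    unsigned char* data;   // rows * cols bytes, row-major
};

// Hypothetical helper: inverts every pixel on the device.
void invert_on_device(Image& img) {
    assert(img.data != nullptr && img.rows > 0 && img.cols > 0);
    const int n = img.rows * img.cols;

    // Step 1: shallow copy of the struct (rows, cols, and the pointer value).
    #pragma acc enter data copyin(img)
    // Step 2: copy the buffer; the device-side img.data is then attached
    // to the device copy of the buffer.
    #pragma acc enter data copyin(img.data[0:n])

    #pragma acc parallel loop present(img)
    for (int i = 0; i < n; ++i) {
        img.data[i] = static_cast<unsigned char>(255 - img.data[i]);
    }

    // Tear down in reverse order: buffer first, then the struct itself.
    #pragma acc exit data copyout(img.data[0:n])
    #pragma acc exit data delete(img)
}
```

Every dynamically allocated member needs its own step like this, which is why the approach gets awkward for a class whose internals you don't control.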

Now given you probably don’t know the underlying data layout of a “cv::Mat”, it would be challenging to perform a deep copy to the device. I’d recommend you use CUDA Unified Memory by adding the “-gpu=managed” flag to your compilation and link flags. UM will have the CUDA driver implicitly manage data movement for all dynamically allocated data. Static objects still need to be managed via data directives, so you may still need to include “srcImg” in a data directive (without the brackets), but you won’t need to manage the class’ underlying dynamic data.
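
For comparison, a minimal sketch of what the loop can look like when relying on managed memory. It assumes the file is compiled and linked with something like "nvc++ -acc -gpu=managed"; the helper name threshold and the std::vector buffer are illustrative assumptions. Because the buffer is heap-allocated, no data clauses are written for it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: binarizes an 8-bit image held in a heap buffer.
// Assumption: built with -gpu=managed, so the heap allocation behind
// `pixels` is reachable from the GPU without explicit data directives.
void threshold(std::vector<unsigned char>& pixels, unsigned char cutoff) {
    unsigned char* p = pixels.data();
    const std::size_t n = pixels.size();
    assert(p != nullptr || n == 0);

    // No copyin/copyout here: data movement is left to the CUDA driver.
    #pragma acc parallel loop
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = (p[i] >= cutoff) ? 255 : 0;
    }
}
```

Without the managed flag, this loop would need explicit data clauses giving the extent of p.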


Hello Mat,

Many thanks for your help. 1) Following your suggestion, I reviewed the code and found that the image should be converted to a 2D matrix instead of a 3D matrix.
2) When I removed -nomp, the error still occurred. I still wonder whether there is a problem with the OpenACC directives. I will run more tests.
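
One way to read the 3D-to-2D point above is converting an interleaved 3-channel image into a single-channel one before the OpenACC loop. The sketch below works on raw buffers; the function name bgr_to_gray, the interleaved layout, and the plain channel average are all assumptions for illustration (this is not OpenCV's own conversion formula).

```cpp
#include <cassert>

// Hypothetical helper: averages the three channels of an interleaved
// rows x cols x 3 buffer into a rows x cols single-channel buffer.
void bgr_to_gray(const unsigned char* bgr, unsigned char* gray,
                 int rows, int cols) {
    assert(bgr != nullptr && gray != nullptr && rows > 0 && cols > 0);
    const int n = rows * cols;

    // Plain arrays with explicit extents, so the data clauses are unambiguous.
    #pragma acc parallel loop collapse(2) copyin(bgr[0:3 * n]) copyout(gray[0:n])
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const int px = r * cols + c;
            const int sum = bgr[3 * px] + bgr[3 * px + 1] + bgr[3 * px + 2];
            gray[px] = static_cast<unsigned char>(sum / 3);
        }
    }
}
```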

Hello Mat,

After several tests, I've had no luck. The final output is the same with or without -nomp. I also tested building with and without "pkg-config opencv4 --cflags --libs". Without it, the program prints the values of the variables.

Based on the test above, should I use nvc++ or pgc++ to compile OpenCV itself during installation? I looked at the libraries OpenCV references, and all of them point to GNU, so I guess OpenCV was built with gcc or g++.

I would appreciate any suggestions.

This is the test I used, with and without the OpenCV libraries:

#define N 5

using namespace std;

int main(){

int a[N];
int i,j=0;

#pragma acc data copyin(a[:N],i,j) //present(i,j)


    #pragma acc kernels present(a[:N],j,i)
for(int i=0;i<N;i++){
      // #pragma acc loop //present(i,j)
	for(int j=0;j<N;j++){


#pragma acc data copyout(a[:N],j,i)



Thanks in advance.

You can, but it shouldn’t be necessary. We’re object compatible with GNU, so you should be able to just link in the OpenCV libraries.

Again, what I’m guessing is happening is that when you run the pkg-config command, it’s returning more than just the OpenCV libraries, including libgomp. Since this is put before our OpenACC runtime on the link line, the linker is resolving the OpenACC symbols from the GNU OpenACC runtime.

What I suggest you try is running “pkg-config opencv4 --cflags --libs” by itself, then explicitly adding just the needed libraries to your link line.

Can you please post the output from “pkg-config opencv4 --cflags --libs”? Again, I’m just guessing here, but seeing the output would help confirm my theory.


Hello Mat,

Thanks for your reply.

Please take a look at the result by using “pkg-config opencv4 --cflags --libs” :

$ pkg-config opencv4 --cflags --libs


Thanks. I’m not seeing that the GOMP library is being added directly, but there may be a dependency in there someplace.

Does the output from the command “ldd <binary_name>” show any references to libgomp?

I’m not sure what it would entail, but you can try building OpenCV with nvc++.

Hello Mat,

Please take a look at the information below :

From the information: => /lib/x86_64-linux-gnu/ (0x00007fba7d91e000

Lots of dependent libraries there. Not a big deal, but libgomp is definitely there. I have no idea as to why it’s getting in there, but presume there’s some dependency with one or more of the OpenCV libs.
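
As a complement to reading the ldd output, a program can check at runtime which shared objects were actually loaded. The sketch below uses glibc's dl_iterate_phdr, so it is Linux/glibc-specific; the helper names looks_like_libgomp and loaded_libgomp_paths are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <link.h>     // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <string>
#include <vector>

// Hypothetical helper: true if a shared-object path mentions libgomp.
bool looks_like_libgomp(const char* path) {
    assert(path != nullptr);
    return std::strstr(path, "libgomp") != nullptr;
}

// Callback invoked once per loaded object; collects matching paths.
int collect_gomp(struct dl_phdr_info* info, std::size_t /*size*/, void* out) {
    auto* hits = static_cast<std::vector<std::string>*>(out);
    if (info->dlpi_name != nullptr && looks_like_libgomp(info->dlpi_name)) {
        hits->push_back(info->dlpi_name);
    }
    return 0;  // returning 0 continues the iteration
}

// Returns the paths of all currently loaded objects that look like libgomp.
std::vector<std::string> loaded_libgomp_paths() {
    std::vector<std::string> hits;
    dl_iterate_phdr(collect_gomp, &hits);
    return hits;
}
```

Calling loaded_libgomp_paths() early in main and printing the result would show whether libgomp made it into the process, whatever the link line looked like.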

This is a long shot, but I’m wondering if you can use LD_PRELOAD to force the loader to load our runtime before libgomp. Something like:

LD_PRELOAD=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/ <your_exe_name> <args>

Otherwise, you may need to rebuild OpenCV with the NVHPC compilers, or possibly still with GNU if you can disable OpenMP. I’ve not built OpenCV myself, though, so I won’t be much help here.

Hello Mat,

Thanks for your reply.

I tried setting LD_PRELOAD before the executable, but unfortunately the issue still exists.

 $ LD_PRELOAD=/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/ ./testi_opencv

libgomp: TODO

Then, following your hints, I tried loading the NVHPC environment.
Loading the nvhpc module solved the issue:

export MODULEPATH=$MODULEPATH:/opt/nvidia/hpc_sdk/modulefiles
module load nvhpc

$ export MODULEPATH=$MODULEPATH:/opt/nvidia/hpc_sdk/modulefiles
$ module load nvhpc
$ sh
     16, Generating copyin(a[:][:],temp) [if not already present]
     20, Generating present(a[:][:],temp)
     23, Loop is parallelizable
     25, Loop is parallelizable
         Generating NVIDIA GPU code
         23, #pragma acc loop gang, vector(128) collapse(2) /* blockIdx.x threadIdx.x */
         25,   /* blockIdx.x threadIdx.x auto-collapsed */
     43, Generating update self(temp,a[:])
$ ./testi_opencv

It works in the end.