FATAL ERROR: data in PRESENT clause was not found on device 1

Hello,

When I run the executable, it fails with this error: FATAL ERROR: variable in data clause is partially present on the device: name=cols

Detail:
cols lives at 0x7ffd48eeff74 size 8 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 8.6, threadid=1
host:0x7ffd48eefea0 device:0x7fb79d6fa000 size:16 presentcount:0+1 line:5 name:antimat
deleted block   device:0x7fac3e000000 size:9695744 threadid=1 
FATAL ERROR: variable in data clause is partially present on the device: name=cols
Code part:

Array2D &WatershedAlg::antiInverseImage(Array2D&antimat,int &rows,int &cols) {

 #pragma acc data copyin(antimat,antimat.matImg[:rows][:cols])
 #pragma acc parallel loop collapse(2) 
    for(int i = 0; i < rows; i++) {
        for(int j = 0; j < cols; j++) {
            if( antimat(i,j) == ZERO ) {
                antimat(i,j) = ZERO;
            } else {
                antimat(i,j) = ONE;
            }
        }
    }
  #pragma acc update self(antimat,antimat.matImg[:rows][:cols])
  #pragma acc exit data delete(antimat,antimat.matImg[:rows][:cols])
  return antimat;
}
I don't understand why antimat is not present on the device, since I used data copyin().
Could someone provide a hint? Thanks in advance.

Hi glaciya2018,

A “partially present” error typically means that another variable of a different size already occupies part of the same host-to-device address mapping. It often occurs when a variable is included in an enter data directive but is missing from the matching exit data directive.

There's not enough information here to determine specifically why you’re getting this error, but given that it’s a stack variable, the same stack address could have been used to map a variable in another function that was never deleted from the device.

Though, do you need rows and cols passed by reference? If you can, pass these variables by value (i.e. change “int &” to “int”). Passing them by reference forces the compiler to implicitly copy them to the device (since they are now pointers), while passing them by value makes them scalars that can then be “firstprivate”.
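
As a minimal sketch of the by-value suggestion (not the code from the original post): `binarize` and the flat `img` buffer below are hypothetical stand-ins for `antiInverseImage` and `Array2D`. Without an OpenACC compiler the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>

// Hypothetical stand-in for antiInverseImage: img is a flat, row-major
// buffer instead of the poster's Array2D class.
// rows and cols are passed by value, so inside the compute region they are
// plain scalars that the compiler can treat as firstprivate. Nothing about
// them has to be mapped through the present table.
void binarize(int *img, int rows, int cols) {
    assert(img != nullptr && rows >= 0 && cols >= 0);

    // The copy clause scopes the device copy of img to this one construct,
    // so no separate enter data / exit data pair is needed here.
    #pragma acc parallel loop collapse(2) copy(img[0:rows*cols])
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            int idx = i * cols + j;
            img[idx] = (img[idx] == 0) ? 0 : 1;
        }
    }
}
```

A caller would use it as `binarize(buffer, rows, cols);` with ordinary `int` arguments.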

-Mat

I see that you posted the same question over on StackOverflow but included a link to your source. What’s possibly happening is in “WatershedAlg::thresholdMeasure” you have the following code:

auto *startImg=image.data;
int imgrows=image.rows;
int imgcols=image.cols;
#pragma acc enter data copyin(startImg,startImg[:imgrows*imgcols], threshmat,threshmat.matImg[:imgrows][:imgcols])
... cut ...
#pragma acc exit data delete (startImg[:imgrows*imgcols])

In the enter data directive you have “startImg”, “threshmat” and “threshmat.matImg”, but you don’t include them in the corresponding exit data directive, so they remain in the present table. Given these variables are on the stack, when you call “antiInverseImage” later and the compiler needs to implicitly copy “cols”, the same stack address is being reused and the sizes are different. Variables included in an enter data directive should also be in a corresponding exit data directive.

The correct directives would look like this:

#pragma acc enter data copyin(startImg[:imgrows*imgcols], threshmat,threshmat.matImg[:imgrows][:imgcols])
 ...
#pragma acc exit data delete(startImg,threshmat.matImg,threshmat)

Note that while you need the deep copy for threshmat since it’s a class, startImg is a simple array, so it can use a shallow copy.
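
To illustrate the pairing rule with plain arrays, here is a small sketch under stated assumptions: `thresholdSketch`, `src`, `dst` and `cutoff` are hypothetical names, not taken from the poster's repository. Every array named in the enter data directive also appears in the exit data directive, so nothing is left behind in the present table when the function returns.

```cpp
#include <cassert>

// Hypothetical helper: writes 1 to dst[i] where src[i] exceeds cutoff,
// otherwise 0. Both arrays are mapped on entry and unmapped on exit.
void thresholdSketch(const unsigned char *src, int *dst, int n, int cutoff) {
    assert(src != nullptr && dst != nullptr && n >= 0);

    // Map both arrays: src is copied to the device, dst is only allocated.
    #pragma acc enter data copyin(src[0:n]) create(dst[0:n])

    #pragma acc parallel loop present(src[0:n], dst[0:n])
    for (int i = 0; i < n; i++) {
        dst[i] = (src[i] > cutoff) ? 1 : 0;
    }

    // Unmap both arrays: dst is copied back first, src is just deleted.
    // After this line the present table holds nothing from this function.
    #pragma acc exit data copyout(dst[0:n]) delete(src[0:n])
}
```

If `src` were left out of the exit data directive, its mapping would outlive the function, which is the situation described above.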

-Mat

Hello Mat,

Thanks for your suggestion. I've updated the parallel region by adding threshmat to the exit data directive.

But I still have one question about the parallel part in thresholdMeasure. After I changed the code and ran the binary, I got this warning: Complex loop carried dependence of threshmat->,startImg-> prevents parallelization

I used __restrict when declaring the pointer startImg.

Link: GitHub - Lancelof2019/wlgOpenACC01

Detail:

WatershedAlg::thresholdMeasure(cv::Mat &, Array2D<int> &):
     16, Generating enter data copyin(startImg[:imgcols*imgrows],threshmat->matImg[:threshmat->arows][:threshmat->acols],threshmat[:1])
         Generating NVIDIA GPU code
         19, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
         20, #pragma acc loop seq
     16, Generating default present(startImg[:],threshmat)
     20, Complex loop carried dependence of threshmat->,startImg-> prevents parallelization
     23, Generating update self(threshmat->matImg[:threshmat->arows][:threshmat->acols])
         Generating exit data delete(startImg[:1],threshmat[:1],threshmat->matImg[:threshmat->arows][:threshmat->acols])
Array2D<int>::operator ()(int, int):
      1, include "WatershedAlg.h"
           3, include "buffer.cpp"
               23, Generating implicit acc routine seq
                   Generating acc routine seq
                   Generating NVIDIA GPU code

Could you please provide any suggestion or hint?

The message is for the loop at line 20 and means that the compiler can’t implicitly auto-parallelize the loop due to the potential dependency.

If you want to parallelize this loop, you’ll need to either explicitly add an “acc loop” directive to the inner loop, or add a “collapse(2)” or “tile” clause to the outer parallel loop directive.

If you don’t want it parallelized, you can ignore the message.
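
As a rough sketch of the first option (an explicit directive on the inner loop), assuming flat row-major arrays: `invertSketch`, `src` and `dst` are hypothetical names and this is not the poster's thresholdMeasure. Inside a parallel region, a loop directive asserts that the iterations are independent, so the compiler no longer has to prove it.

```cpp
#include <cassert>

// Hypothetical example: dst = 255 - src over a rows x cols image.
void invertSketch(const int *src, int *dst, int rows, int cols) {
    assert(src != nullptr && dst != nullptr && rows >= 0 && cols >= 0);

    // Outer loop is spread across gangs.
    #pragma acc parallel loop gang copyin(src[0:rows*cols]) copyout(dst[0:rows*cols])
    for (int i = 0; i < rows; i++) {
        // Explicit directive on the inner loop: it is now vectorized
        // instead of being left sequential ("loop seq" in the -Minfo output).
        #pragma acc loop vector
        for (int j = 0; j < cols; j++) {
            dst[i * cols + j] = 255 - src[i * cols + j];
        }
    }
}
```

The alternative is to drop the inner directive and put `collapse(2)` on the outer `parallel loop`, which merges both loops into one iteration space.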

-Mat
