OpenACC (Fortran): Descriptor partially present?

I have dealt with “partially present” arrays in the past, but I don’t quite understand this run-time error about a “descriptor” being partially present:

FATAL ERROR: variable in data clause is partially present on the device: name=descriptor

It happens in

!$acc data present_or_copyin( … )

It’s very difficult to see what is happening, since the array name is not given. Does anybody know how to get more information that could help fix the problem?

Interestingly, this run-time error happens on one system (Quadro K2200) but not on the other (Quadro M4000) that I have access to. Also, I didn’t get this error with PGI 15, but I do get it with PGI 16.

I appreciate any comments.

Hi OmidMehdizadeh,

Does anybody know how to get more information that could help fix the problem?

There are a couple of things you can do. One is to set the environment variable “PGI_ACC_DEBUG=1”, which makes the OpenACC runtime emit debugging information about every call into the runtime.

You can also add calls to “acc_present_dump” to have the runtime print the contents of the present table. The present table keeps track of the host-to-device address mapping as well as the size of each variable.

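A “partially present” error occurs when a variable’s host address range overlaps an entry in the present table but is not fully contained in it, for example when the size recorded in the present table is smaller than the size used in the data clause.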
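To make the condition concrete, here is a simplified model of the check as a free-form Fortran module (compile it as, e.g., a .f90 file). It is an assumption for illustration, not the PGI runtime’s actual code: a request counts as present if its host byte range lies entirely inside an existing present-table entry, as not present if it does not overlap the entry at all, and as partially present otherwise. The module name present_check, the function classify, and the three state constants are hypothetical.

```fortran
! Simplified model of an OpenACC present-table lookup.
! ASSUMPTION: illustration only, NOT the PGI runtime's implementation.
module present_check
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: classify, NOT_PRESENT, FULLY_PRESENT, PARTIALLY_PRESENT

  integer, parameter :: NOT_PRESENT = 0
  integer, parameter :: FULLY_PRESENT = 1
  integer, parameter :: PARTIALLY_PRESENT = 2

contains

  ! Compare the host byte range [addr, addr+nbytes) requested by a data
  ! clause with one present-table entry [ent_addr, ent_addr+ent_bytes).
  pure function classify(ent_addr, ent_bytes, addr, nbytes) result(state)
    integer(int64), intent(in) :: ent_addr, ent_bytes, addr, nbytes
    integer :: state
    integer(int64) :: ent_end, req_end

    ent_end = ent_addr + ent_bytes
    req_end = addr + nbytes

    if (req_end <= ent_addr .or. addr >= ent_end) then
      ! No overlap: the runtime is free to create a new device copy.
      state = NOT_PRESENT
    else if (addr >= ent_addr .and. req_end <= ent_end) then
      ! Fully contained: the existing device copy can be reused.
      state = FULLY_PRESENT
    else
      ! Overlaps but extends past the entry: "partially present" error.
      state = PARTIALLY_PRESENT
    end if
  end function classify

end module present_check
```

With the addresses from the dump posted later in this thread, classify(int(z'7FFFA2C1AFD8', int64), 128_int64, int(z'7FFFA2C1B048', int64), 224_int64) returns PARTIALLY_PRESENT: the 224-byte descriptor starts inside the leftover 128-byte entry but ends past it. A table entry that is smaller than the data clause at the same host address is a special case of the same overlap.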

Why it works on one device but not the other, and with PGI 15 but not 16, is unclear. You might try installing PGI 17.10 in case it’s a compiler issue that has since been fixed.

Hope this helps,
Mat

Thanks, Mat!

Indeed, acc_present_dump and PGI_ACC_DEBUG helped me find a workaround for what seems to be a compiler-related problem.

It seems that when several arrays that point to the same location in host memory are created in a single data region, some of their descriptors remain in the present table even after the data region ends. These “zombie” descriptors later cause the “partially present” error elsewhere in the code when a new data region is created.

I have not yet tried with PGI 17.10; that would be the next step.

Best,

Omid

Hi Omid,

If it is a compiler issue, please either post a reproducing example here or send one to PGI Customer Service (trs@pgroup.com), and we’ll investigate.

Thanks,
Mat

Hi Mat,

Here is a simple way to reproduce the problem, and also to test the workaround (splitting the data region).

After the first subroutine (TEST1) is called, the present table is empty, as expected (this is the workaround). However, after the second subroutine (TEST2) is called, the present table still contains a descriptor, which seems to cause the “partially present” error when the third subroutine (TEST3) is called.


      REAL LARGE_ARRAY(1000)

      CALL TEST1(LARGE_ARRAY(200),LARGE_ARRAY(200),LARGE_ARRAY(200))
      CALL acc_present_dump()

      CALL TEST2(LARGE_ARRAY(200),LARGE_ARRAY(200),LARGE_ARRAY(200))
      CALL acc_present_dump()

      CALL TEST3(LARGE_ARRAY(100),LARGE_ARRAY(100),LARGE_ARRAY(100))
      CALL acc_present_dump()
      END

! TEST1: the three aliased dummies are mapped in separate nested
! data regions (the workaround)
      SUBROUTINE TEST1(ARRAY1,ARRAY2,ARRAY3)
      REAL ARRAY1(100),ARRAY2(100),ARRAY3(100)
!$acc data
!$acc& present_or_create(ARRAY1)
!$acc data
!$acc& present_or_create(ARRAY2)
!$acc data
!$acc& present_or_create(ARRAY3)
      CALL acc_present_dump()
!$acc end data
!$acc end data
!$acc end data
      END

! TEST2: the three aliased dummies are mapped in one data clause
      SUBROUTINE TEST2(ARRAY1,ARRAY2,ARRAY3)
      REAL ARRAY1(100),ARRAY2(100),ARRAY3(100)
!$acc data
!$acc& present_or_create(ARRAY1,ARRAY2,ARRAY3)
      CALL acc_present_dump()
!$acc end data
      END

! TEST3: a later data region that hits the leftover descriptor
      SUBROUTINE TEST3(ARRAY1,ARRAY2,ARRAY3)
      REAL ARRAY1(5,5,5),ARRAY2(5,5,5),ARRAY3(5,5,5)
!$acc data
!$acc& present_or_create(ARRAY1,ARRAY2,ARRAY3)
      CALL acc_present_dump()
!$acc end data
      END

Output

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x5fdd3cc device:0x2300200000 size:400 presentcount:3+0 line:2275 name:array1
host:0x7fffa2c1b0e8 device:0x2300200200 size:128 presentcount:1+0 line:2275 name:descriptor
allocated block device:0x2300200000 size:512 thread:1
allocated block device:0x2300200200 size:512 thread:1

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
…empty…
deleted block device:0x2300200000 size:512 thread 1
deleted block device:0x2300200200 size:512 thread 1

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x5fdd3cc device:0x2300200000 size:400 presentcount:3+0 line:2289 name:array1
host:0x7fffa2c1afd8 device:0x2300200200 size:128 presentcount:1+0 line:2289 name:descriptor
allocated block device:0x2300200000 size:512 thread:1
allocated block device:0x2300200200 size:512 thread:1

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x7fffa2c1afd8 device:0x2300200200 size:128 presentcount:1+0 line:2289 name:descriptor
allocated block device:0x2300200200 size:512 thread:1
deleted block device:0x2300200000 size:512 thread 1

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x5fdd23c device:0x2300200000 size:500 presentcount:3+0 line:2297 name:array1
host:0x7fffa2c1ae78 device:0x2300200400 size:224 presentcount:1+0 line:2297 name:descriptor
host:0x7fffa2c1afd8 device:0x2300200200 size:128 presentcount:1+0 line:2289 name:descriptor
allocated block device:0x2300200000 size:512 thread:1
allocated block device:0x2300200200 size:512 thread:1
allocated block device:0x2300200400 size:512 thread:1

descriptor lives at 0x7fffa2c1b048 size 224 partially present

Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x7fffa2c1ae78 device:0x2300200400 size:224 presentcount:1+0 line:2297 name:descriptor
host:0x7fffa2c1afd8 device:0x2300200200 size:128 presentcount:1+0 line:2289 name:descriptor
allocated block device:0x2300200200 size:512 thread:1
allocated block device:0x2300200400 size:512 thread:1
deleted block device:0x2300200000 size:512 thread 1

FATAL ERROR: variable in data clause is partially present on the device: name=descriptor
file:main.F test3 line:2300

Hi Omid,

I tried a variety of systems and compiler versions but was unable to recreate your issue. All cases succeeded for me.

Here’s an example. Is there anything I need to change to get this to fail in the same way as your run?

-Mat


% cat test.f
        REAL LARGE_ARRAY(1000)

        CALL TEST1(LARGE_ARRAY(200),LARGE_ARRAY(200),LARGE_ARRAY(200))
        CALL acc_present_dump()

        CALL TEST2(LARGE_ARRAY(200),LARGE_ARRAY(200),LARGE_ARRAY(200))
        CALL acc_present_dump()

        CALL TEST3(LARGE_ARRAY(100),LARGE_ARRAY(100),LARGE_ARRAY(100))
        CALL acc_present_dump()
        end
         SUBROUTINE TEST1(ARRAY1,ARRAY2,ARRAY3)
         REAL ARRAY1(100),ARRAY2(100),ARRAY3(100)
!$acc data present_or_create(ARRAY1)
!$acc data present_or_create(ARRAY2)
!$acc data present_or_create(ARRAY3)
         CALL acc_present_dump()
!$acc end data
!$acc end data
!$acc end data
         END
         SUBROUTINE TEST2(ARRAY1,ARRAY2,ARRAY3)
         REAL ARRAY1(100),ARRAY2(100),ARRAY3(100)
!$acc data present_or_create(ARRAY1,ARRAY2,ARRAY3)
         CALL acc_present_dump()
!$acc end data
         END
         SUBROUTINE TEST3(ARRAY1,ARRAY2,ARRAY3)
         REAL ARRAY1(5,5,5),ARRAY2(5,5,5),ARRAY3(5,5,5)
!$acc data present_or_create(ARRAY1,ARRAY2,ARRAY3)
         CALL acc_present_dump()
!$acc end data
         END
% pgf90 -V16.10 -ta=tesla:cc20 test.f -fast -Minfo=accel ; a.out
test1:
     14, Generating create(array1(:))
     15, Generating create(array2(:))
     16, Generating create(array3(:))
test2:
     24, Generating create(array1(:),array2(:),array3(:))
test3:
     30, Generating create(array1(:,:,:),array2(:,:,:),array3(:,:,:))
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x603b3c device:0x1303440000 size:400 presentcount:3+0 line:14 name:array1
host:0x7fffffffe018 device:0x1303440200 size:72 presentcount:1+0 line:14 name:descriptor
allocated block device:0x1303440000 size:512 thread:1
allocated block device:0x1303440200 size:512 thread:1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
...empty...
deleted block   device:0x1303440000 size:512 thread 1
deleted block   device:0x1303440200 size:512 thread 1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x603b3c device:0x1303440000 size:400 presentcount:3+0 line:24 name:array1
host:0x7fffffffe018 device:0x1303440200 size:72 presentcount:1+0 line:24 name:descriptor
allocated block device:0x1303440000 size:512 thread:1
allocated block device:0x1303440200 size:512 thread:1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
...empty...
deleted block   device:0x1303440000 size:512 thread 1
deleted block   device:0x1303440200 size:512 thread 1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
host:0x6039ac device:0x1303440000 size:500 presentcount:3+0 line:30 name:array1
host:0x7fffffffdfa8 device:0x1303440200 size:120 presentcount:1+0 line:30 name:descriptor
allocated block device:0x1303440000 size:512 thread:1
allocated block device:0x1303440200 size:512 thread:1
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.0, threadid=1
...empty...
deleted block   device:0x1303440000 size:512 thread 1
deleted block   device:0x1303440200 size:512 thread 1

Hi Mat,

There are slight differences in the compiler version and options; could you try these in case it is related?

pgfortran 16.9-0 64-bit target on x86-64 Linux -tp nehalem

-acc -Minfo=acc -ta=nvidia:cuda8.0

On my side, I’ve tried 4 different machines: it fails on 2 and works on the other 2. As mentioned, it works on all machines with version 15.9. I plan to test with 17.10 next week and will let you know the result.

Best,

Omid

Hi Omid,

With 16.9, I do see the behavior. However, it appears to be an issue specific to 16.9, since the problem does not occur in 16.7, 16.10, or later versions of the compiler. Moving to 17.10 or 18.1 should work as well.

-Mat

Hi Mat,

Thanks a lot for your support and the investigation. We’ll upgrade the compiler version.

Best,

Omid