Accelerator compiler bug with sequential rewriting matrices.

Ankhazam · December 20, 2010, 8:19am

Hi,
The code below, run on a GTX480 with CC30, produces unpredictable values when copying copiedInArray (uploaded from host memory) into a local temporary array that exists only on the GPU.
Both arrays are real*4 and have the same dimensions, x=90, y=90, z=1500 (the z dimension is probably the problem here).

!$acc region
      do k=2,z
        do j=1,y
          do i=1,x
            localGPUArray(i,j,k) = copiedInArray(i,j,k)
          enddo
        end do
      end do
!$acc end region

It appears that the compiler divides the work between the GPU's compute units in an odd manner (90x90x1499).

A quick fix, so that both arrays hold the same values at the same indices, was to make any one of these loops sequential. However, neither the compiler nor the profiler gave any hint that, without the !$acc do seq, these calculations might produce wrong results.

!$acc region
      do k=2,z
        do j=1,y
!$acc do seq
          do i=1,x
            localGPUArray(i,j,k) = copiedInArray(i,j,k)
          enddo
        end do
      end do
!$acc end region

If you know a better way to fill a local GPU array with host-uploaded data, please let me know. I hope you will be able to reproduce this problem and address it with a fix :)

Regards,
Nicolas Dobski

MatColgrove · December 21, 2010, 4:23pm

Hi Nicolas,

> It appears that the compiler divides the job in a weird matter between computation units on GPU (90x90x1499).

This makes sense given that your k loop starts at 2. The compiler allocates only the minimum amount of device memory needed, which in this case is 1499 elements along k (k=2 to 1500). You can override this behavior using the copy and local clauses.
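A minimal sketch of that override, assuming the PGI Accelerator directive syntax of this era (data clauses with explicit subarray bounds on !$acc region). The module and subroutine names are hypothetical. src is copied in over its full extent and dst uses copy, so the device copy of dst spans k=1..z like the host array, and plane k=1, which the loop never writes, keeps its host values. For a GPU-only temporary such as localGPUArray, a local(...) clause with the same full bounds would be the analogous choice. Compiled without an accelerator target, the !$acc lines are ordinary comments and the loop runs serially on the host.

```fortran
module copy_region_mod
  implicit none
contains
  ! Hypothetical helper (not from the thread): copies src(:,:,2:z) into
  ! dst(:,:,2:z) inside an accelerator region whose data clauses give the
  ! full array bounds, so the device copies cover k=1..z rather than the
  ! k=2..z extent the compiler would infer from the loop bounds.
  subroutine copy_planes(src, dst, x, y, z)
    integer, intent(in) :: x, y, z
    real, intent(in) :: src(x, y, z)
    real, intent(inout) :: dst(x, y, z)
    integer :: i, j, k

    ! copy (in and out) keeps dst(:,:,1) intact: the loop never writes
    ! plane k=1, and its original host values travel to the device and back.
!$acc region copyin(src(1:x,1:y,1:z)) copy(dst(1:x,1:y,1:z))
    do k = 2, z
      do j = 1, y
        do i = 1, x
          dst(i, j, k) = src(i, j, k)
        end do
      end do
    end do
!$acc end region
  end subroutine copy_planes
end module copy_region_mod
```

Using copy rather than copyout for dst is the safer choice here: a copyout over the full 1:z range would overwrite the host's plane k=1 with whatever the device memory held.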

Can you post an example that reproduces the problem? Here’s my attempt to recreate the issue; this simple example works correctly.

% cat copy3d.f90


program copy3d

real, allocatable, dimension(:,:,:) :: A,B
integer :: i,j,k
integer :: x,y,z

x=90
y=90
z=1500

allocate(A(x,y,z), B(x,y,z))

do i=1,x
  do j = 1,y
    do k=1, z
       A(i,j,k)=real(i*j)/real(k)
    enddo
  enddo
enddo


!$acc region
do k=2, z
  do j = 1,y
    do i=1,x
       B(i,j,k) = A(i,j,k)
    enddo
  enddo
enddo
!$acc end region

print *, A(1,1,2), A(1,1,1500)
print *, B(1,1,2), B(1,1,1500)

end program copy3d

% pgf90 copy3d.f90 -ta=nvidia -Minfo=accel -V10.9 ; a.out
copy3d:
     24, Generating copyin(a(1:90,1:90,2:1500))
         Generating copyout(b(1:90,1:90,2:1500))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     25, Loop is parallelizable
     26, Loop is parallelizable
     27, Loop is parallelizable
         Accelerator kernel generated
         25, !$acc do parallel, vector(4)
         26, !$acc do parallel, vector(4)
         27, !$acc do vector(16)
             CC 1.0 : 8 registers; 24 shared, 52 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 8 registers; 24 shared, 52 constant, 0 local memory bytes; 100 occupancy
   0.5000000       6.6666666E-04
   0.5000000       6.6666666E-04
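Printing two elements may not reveal corruption that affects only some indices, which is what the original report describes. A stricter check is sketched here as a hypothetical helper, count_mismatches (not part of the posted reproducer). It counts every element of the copied range b(:,:,kstart:z) that differs from a(:,:,kstart:z), and would be called on the host after !$acc end region, for example print *, count_mismatches(A, B, x, y, z, 2), where a result of 0 means every copied element matches.

```fortran
module copy_check_mod
  implicit none
contains
  ! Hypothetical helper (not from the thread): returns how many elements of
  ! b(:,:,kstart:z) differ from a(:,:,kstart:z). It runs on the host, after
  ! the accelerator region has copied b back. Exact comparison is
  ! appropriate because b should be a bit-for-bit copy of a.
  integer function count_mismatches(a, b, x, y, z, kstart) result(nbad)
    integer, intent(in) :: x, y, z, kstart
    real, intent(in) :: a(x, y, z), b(x, y, z)

    nbad = count(a(:, :, kstart:z) /= b(:, :, kstart:z))
  end function count_mismatches
end module copy_check_mod
```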
