compiles but crashes at run-time

Hi,

Here’s a short program that compiles without complaint, yet it crashes at run-time with no error message at all.

program main
	integer :: nums(3,3), vals(3,3)
	
	do j = 1,3
		do jj = 1,3
			nums(i,ii) = (i - 1) * 3 + ii
		enddo
	enddo
	
	!$acc region
	do i = 1,3
		do ii = 1,3
			vals(i,ii) = nums(i,ii) * 3
		enddo
		do ii = 1,3
			nums(i,ii) = vals(i,ii)
		enddo
	enddo
	!$acc end region
	
	do j = 1,3
		do jj = 1,3
			print *, nums(j,jj)
		enddo
	enddo
end program

All it does is fill nums with the values 1…9, store 3*nums in vals, and copy vals back into nums. I understand that changing the accelerator region to the following would fix the problem,

	!$acc region
	do i = 1,3
		do ii = 1,3
			nums(i,ii) = nums(i,ii) * 3
		enddo
	enddo
	!$acc end region

but I am mostly curious as to why the first program does not work.

Hi WmBruce,

You have an error in your source. In the first loop nest you use “i” and “ii” for the array indices, but “j” and “jj” for the loop indices. Changing “i” and “ii” to “j” and “jj” fixes the issue.
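
For illustration, here is a minimal sketch of the corrected program. Under implicit typing, “i” and “ii” are integers that have not been assigned a value when the first loop nest runs, so nums(i,ii) is indexed with undefined values and can land outside the array. The “implicit none” statement and the explicit declarations are additions that were not in the original post. They would not have caught this particular slip, because “i” and “ii” are legitimately used later, but they do catch misspelled names. Building a host-only version with array bounds checking (for example PGI's -Mbounds) is one way such an index error can be surfaced.

```fortran
program main
   implicit none                     ! addition: every name must be declared
   integer :: nums(3,3), vals(3,3)
   integer :: i, ii, j, jj           ! addition: explicit loop counters

   ! Fill nums with 1..9, indexing with the loop variables themselves
   do j = 1,3
      do jj = 1,3
         nums(j,jj) = (j - 1) * 3 + jj
      enddo
   enddo

   ! Triple every element via the temporary array vals
   !$acc region
   do i = 1,3
      do ii = 1,3
         vals(i,ii) = nums(i,ii) * 3
      enddo
      do ii = 1,3
         nums(i,ii) = vals(i,ii)
      enddo
   enddo
   !$acc end region

   do j = 1,3
      do jj = 1,3
         print *, nums(j,jj)
      enddo
   enddo
end program
```

Built without the accelerator flag, the directives are treated as ordinary comments and the program prints the multiples of 3 from 3 to 27.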

  • Mat

Wow, I apologize for the simple mistake. I was trying to simplify the problem by mimicking the code that crashed, and in my haste made that mistake.

So here is the portion of my actual code that caused the crash.

program main
	real :: particles(20,3), garbage(20,3)
	real, parameter :: PI = 4 * atan(1.0)
	
		do j = 1, 20
			particles(j,1) = 4000
			particles(j,2) = 0
			particles(j,3) = 0
		end do
	
	!$acc region
		do j = 1, 10
			do jj = 1, 20
				garbage(jj,1) = particles(jj,1) + 0.002
				garbage(jj,2) = particles(jj,2) + 2 * PI
				garbage(jj,3) = (particles(jj,1) + 1) * sin(particles(jj,2) * 1000)
			end do
			do jj = 1, 20
				particles(jj,1) = garbage(jj,1)
				particles(jj,2) = garbage(jj,2)
				particles(jj,3) = garbage(jj,3)
			end do
		enddo
	!$acc end region 	
end program

Also, I'm running PGI Workstation 10.6 (64-bit) on Windows 7 x64, and the device is a Tesla C2050.

Wow, I apologize for the simple mistake. I was trying to simplify the problem by mimicking the code that crashed, and in my haste made that mistake.

Not a problem.

The code is seg faulting because of a ‘copyin’ and ‘copyout’ mismatch for the particles array: the compiler copies in only particles(1:20,1:2) but copies out particles(1:20,1:3).

pgf90 -V10.6 test2.f90 -Minfo=accel -ta=nvidia
main:
     11, Generating copyin(particles(1:20,1:2))
         Generating copyout(particles(1:20,1:3))
         Generating copyout(garbage(1:20,1:3))

The easy fix is to change “!$acc region” to “!$acc region copy(particles)” so the compiler will copy in all of particles.
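
As a sketch of that change applied to the posted program (not verified against any particular compiler release), only the region directive differs. The “implicit none” line, the integer declarations, and the final print statement are additions for illustration and were not in the original post.

```fortran
program main
   implicit none                     ! addition, not in the original post
   integer :: j, jj                  ! addition: explicit loop counters
   real :: particles(20,3), garbage(20,3)
   real, parameter :: PI = 4 * atan(1.0)

   do j = 1, 20
      particles(j,1) = 4000
      particles(j,2) = 0
      particles(j,3) = 0
   end do

   ! copy(particles) asks for the whole array to be copied to the device
   ! on entry and back to the host on exit, so all three columns match.
!$acc region copy(particles)
   do j = 1, 10
      do jj = 1, 20
         garbage(jj,1) = particles(jj,1) + 0.002
         garbage(jj,2) = particles(jj,2) + 2 * PI
         garbage(jj,3) = (particles(jj,1) + 1) * sin(particles(jj,2) * 1000)
      end do
      do jj = 1, 20
         particles(jj,1) = garbage(jj,1)
         particles(jj,2) = garbage(jj,2)
         particles(jj,3) = garbage(jj,3)
      end do
   enddo
!$acc end region

   print *, particles(1,1)           ! addition: inspect one result on the host
end program
```

Compiling with -Minfo=accel again should show whether the copyin and copyout ranges for particles now agree.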


While it doesn’t matter much for this program, for larger codes you might want to consider using data regions to help cut down on the number of data transfers. For example:

program main
   real :: particles(20,3), garbage(20,3)
   real, parameter :: PI = 4 * atan(1.0)

!$acc data region local(garbage), copyout(particles)

!$acc region
      do j = 1, 20
         particles(j,1) = 4000
         particles(j,2) = 0
         particles(j,3) = 0
      end do

      do j = 1, 10
         do jj = 1, 20
            garbage(jj,1) = particles(jj,1) + 0.002
            garbage(jj,2) = particles(jj,2) + 2 * PI
            garbage(jj,3) = (particles(jj,1) + 1) * sin(particles(jj,2) * 1000)
         end do
         do jj = 1, 20
            particles(jj,1) = garbage(jj,1)
            particles(jj,2) = garbage(jj,2)
            particles(jj,3) = garbage(jj,3)
         end do
      enddo
!$acc end region
!$acc end data region

print *, particles(1,1)

end program

Hope this helps,
Mat

I copy-pasted your program and compiled it and it still crashed.

Thanks for the data directive advice. It is actually going to be something I am going to be using a lot.

Sorry about that. I was using our soon-to-be-released 10.8 version, not 10.6. The crash you’re seeing is due to a bug in 10.6. The workaround (below) is to move the accelerator region into the body of the j loop.

  • Mat
% cat particle.f90
program main
   real :: particles(20,3), garbage(20,3)
   real, parameter :: PI = 4 * atan(1.0)

!$acc data region local(garbage), copyout(particles)

!$acc region
      do j = 1, 20
         particles(j,1) = 4000
         particles(j,2) = 0
         particles(j,3) = 0
      end do
!$acc end region

      do j = 1, 10
!$acc region
         do jj = 1, 20
            garbage(jj,1) = particles(jj,1) + 0.002
            garbage(jj,2) = particles(jj,2) + 2 * PI
            garbage(jj,3) = (particles(jj,1) + 1) * sin(particles(jj,2) * 1000)
         end do
         do jj = 1, 20
            particles(jj,1) = garbage(jj,1)
            particles(jj,2) = garbage(jj,2)
            particles(jj,3) = garbage(jj,3)
         end do
!$acc end region
      enddo
!$acc end data region

print *, particles(1,1)

end program
% pgf90 -ta=nvidia,keepgpu -V10.6 -fast -Minfo=accel -o particle.out particle.f90 -Mkeepasm -Manno
main:
      6, Generating local(garbage(:,:))
         Generating copyout(particles(:,:))
      8, Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
      9, Loop is parallelizable
         Accelerator kernel generated
          9, !$acc do parallel, vector(20)
             CC 1.0 : 4 registers; 20 shared, 16 constant, 0 local memory bytes; 33 occupancy
             CC 1.3 : 4 registers; 20 shared, 16 constant, 0 local memory bytes; 25 occupancy
     18, Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     19, Loop is parallelizable
         Accelerator kernel generated
         19, !$acc do parallel, vector(20)
             Cached references to size [20x2] block of 'particles'
             CC 1.0 : 12 registers; 180 shared, 172 constant, 28 local memory bytes; 33 occupancy
             CC 1.3 : 12 registers; 180 shared, 172 constant, 28 local memory bytes; 25 occupancy
     24, Loop is parallelizable
         Accelerator kernel generated
         24, !$acc do parallel, vector(20)
             CC 1.0 : 5 registers; 20 shared, 72 constant, 0 local memory bytes; 33 occupancy
             CC 1.3 : 5 registers; 20 shared, 72 constant, 0 local memory bytes; 25 occupancy
% particle.out
    4000.020