Hi all,
I have run into a confusing problem when using OpenACC to accelerate a multi-block CFD case. I cannot attach the research CFD code here, so I wrote a simple code that mimics the problem as closely as I can. The simple code (attached below) contains one main program and one module subroutine. The main program has a triply nested loop, which is accelerated with a “!$acc loop collapse(3)” directive and enclosed in a “!$acc parallel” region. The loop size can be adjusted. Inside the loop, a routine is called to test printouts; it is defined as a module subroutine that runs on the device (“!$acc routine seq”). As you can see, the code is pretty simple.

I compiled the code using “pgfortran -acc -ta=tesla:cc60,cuda8.0 -Minfo=accel loop_size_test.f90 -o loop_size_test” and ran it using “./loop_size_test > printout_ijk.dat” to redirect the output to a file. Then I counted the printouts in the file. If length = [72, 48, 3], the total number of printouts should be 72 × 48 × 3 = 10368, but the output file has only 4096. I also imported the data file into Excel and sorted the data, and found that some printouts are missing. If length = [20, 20, 3] (commented out in the code), the number of printouts is correct (20 × 20 × 3 = 1200). I tested many times with different loop sizes and found that the maximum number of printouts is 4096. I am wondering why!
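The count check described above can also be scripted instead of done in a spreadsheet. The following is only a minimal sketch: the function name missing_triples is hypothetical, and it assumes a POSIX-like shell with awk, sort, comm and mktemp available, and an output file that holds one "i j k" triple per line (as list-directed "print *" output normally does).

```shell
# Hypothetical helper: list the (i, j, k) triples that should have been
# printed but are absent from the output file.
# Usage: missing_triples FILE NI NJ NK
missing_triples() {
    file=$1; ni=$2; nj=$3; nk=$4
    expected=$(mktemp)
    actual=$(mktemp)

    # Every triple the collapsed loop should visit, one per line.
    awk -v ni="$ni" -v nj="$nj" -v nk="$nk" 'BEGIN {
        for (k = 1; k <= nk; k++)
            for (j = 1; j <= nj; j++)
                for (i = 1; i <= ni; i++)
                    print i, j, k
    }' | LC_ALL=C sort > "$expected"

    # Normalize the variable spacing of the Fortran output and drop duplicates.
    awk 'NF == 3 { print $1 + 0, $2 + 0, $3 + 0 }' "$file" | LC_ALL=C sort -u > "$actual"

    # Lines found only in the expected set are the lost printouts.
    LC_ALL=C comm -23 "$expected" "$actual"

    rm -f "$expected" "$actual"
}
```

With this sketch, "wc -l < printout_ijk.dat" gives the raw number of printouts, and "missing_triples printout_ijk.dat 72 48 3" lists which index triples never appeared, which makes it easier to see whether the losses follow a pattern.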
I am using an NVIDIA P100 GPU and the PGI 17.5 compiler. The code is attached here:
module printout

  implicit none

contains

  subroutine printout_ijk(i, j, k)
    !$acc routine seq
    integer, intent(in) :: i, j, k

    continue

    ! Do nothing except print i, j and k
    print *, i, j, k

  end subroutine printout_ijk

end module printout

program loop_size_test

  use printout, only : printout_ijk

  implicit none

  integer, dimension(3) :: length
  integer :: i, j, k

  continue

  length = [72, 48, 3]
  !length = [20, 20, 3]

  !$acc data copyin(length(1:3))
  !$acc parallel
  !$acc loop independent collapse(3)
  do k = 1, length(3)
    do j = 1, length(2)
      do i = 1, length(1)
        call printout_ijk(i, j, k)
      end do
    end do
  end do
  !$acc end parallel
  !$acc end data

end program loop_size_test
Best Regards,
Weicheng Xue
PS: If I compile the simple code with “pgfortran -acc -Minfo=accel loop_size_test.f90 -o loop_size_test”, it works fine. However, my CFD code requires the option “-ta=tesla:cc60”; without it there are runtime errors. I therefore need to keep that option, because getting my CFD code to run correctly is my primary goal. I also tested the simple code on an older GPU with compute capability 2.0, and there it does not work correctly even without “-ta=tesla:cc60”.