Dear Mat,
I am trying to get host pinned memory to work on a sample program before implementing it in our actual application.
The program is:
[sindimo@superbeast100]$ cat pinnedMemory2.f
      module myModule
      contains
      subroutine MM (A,B,C)
      use accel_lib
      use cudafor
      integer dimm1, dimm2, dimm3
      parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)
      real start, finish
      real*8 :: A(:,:), B(:,:), C(:,:)
      call cpu_time(start)
!$acc region
      do j = 1, dimm3
         do i = 1, dimm1
            C(i, j) = 0
         enddo
         do k = 1, dimm2
            do i = 1, dimm1
               C(i, j) = C(i, j) + A(i, k)*B(k, j)
            enddo
         enddo
      enddo
!$acc end region
      call cpu_time(finish)
      write(*,*) 'Time ',finish - start,' s'
      end subroutine MM
      end module myModule
      program main
      use myModule
      use accel_lib
      use cudafor
      integer dimm1, dimm2, dimm3, seed
      parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)
      real*8, allocatable, pinned :: A(:,:), B(:,:), C(:,:)
!     real*8, allocatable :: A(:,:), B(:,:), C(:,:)
      allocate( A(dimm1,dimm2), B(dimm2,dimm3), C(dimm1,dimm3) )
      seed=7654321
!populate 2 random matrices
      do i = 1, dimm1
         do j = 1, dimm2
            A(i, j) = ran(seed)
         enddo
      enddo
      do i = 1, dimm2
         do j = 1, dimm3
            B(i, j) = ran(seed)
         enddo
      enddo
      do i = 1, 1
         call MM(A,B,C)
      enddo
      end program main
I compile it with the command below and get a link error:
[sindimo@superbeast100]$ pgfortran -fast -Mcuda -ta=nvidia,time -Minfo=accel -mcmodel=medium -Minline pinnedMemory2.f
mm:
17, Generating copyin(a(1:10000,1:10000))
Generating copyin(b(1:10000,1:10000))
Generating copyout(c(1:10000,1:10000))
Generating compute capability 1.3 binary
18, Loop is parallelizable
19, Loop is parallelizable
Accelerator kernel generated
18, !$acc do parallel, vector(16)
19, !$acc do parallel, vector(16)
CC 1.3 : 6 registers; 24 shared, 44 constant, 0 local memory bytes; 100 occupancy
22, Loop carried reuse of 'c' prevents parallelization
23, Loop is parallelizable
Accelerator kernel generated
18, !$acc do parallel, vector(16)
22, !$acc do seq
Cached references to size [16x16] block of 'a'
Cached references to size [16x16] block of 'b'
23, !$acc do parallel, vector(16)
Using register for 'c'
CC 1.3 : 23 registers; 4120 shared, 60 constant, 0 local memory bytes; 50 occupancy
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x8ef): In function `main':
./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x9c1):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0xa92):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
If I remove the “pinned” attribute from the declaration, it works fine.
What am I missing here? I am already using the -Mcuda flag and “use cudafor”.
Thank you for your help.