Host Pinned Memory Allocation

Dear Mat,

I am trying to get host pinned memory working in a sample program before implementing it in our actual application.

The program is:
[sindimo@superbeast100]$ cat pinnedMemory2.f

module myModule
         contains
         subroutine MM (A,B,C) 

         use accel_lib
         use cudafor


         integer dimm1, dimm2, dimm3
         parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)
         real start, finish
         real*8 :: A(:,:), B(:,:), C(:,:)

      call cpu_time(start)


!$acc region
        do j = 1, dimm3
        do i = 1, dimm1
          C(i, j) = 0
        enddo
        do k = 1, dimm2
          do i = 1, dimm1
            C(i, j) = C(i, j) + A(i, k)*B(k, j)
          enddo
        enddo
       enddo
!$acc end region


      call cpu_time(finish)

       write(*,*) 'Time ',finish - start,' s'
     
      end subroutine MM
      end module myModule
        



         program main
         use myModule
         use accel_lib
         use cudafor


         integer dimm1, dimm2, dimm3, seed
         parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)

         real*8, allocatable, pinned :: A(:,:), B(:,:), C(:,:)
!         real*8, allocatable :: A(:,:), B(:,:), C(:,:)

         allocate( A(dimm1,dimm2), B(dimm2,dimm3), C(dimm1,dimm3) ) 

          seed=7654321

              !populate 2 random matrices
                do i = 1, dimm1
                do j = 1, dimm2
                  A(i, j) = ran(seed)
               enddo
               enddo
               do i = 1, dimm2
               do j = 1, dimm3
               B(i, j) = ran(seed)
               enddo
               enddo

           do i = 1, 1
             call MM(A,B,C)
           enddo

         end program main

I compile it with the command below and get a link error:
[sindimo@superbeast100]$ pgfortran -fast -Mcuda -ta=nvidia,time -Minfo=accel -mcmodel=medium -Minline pinnedMemory2.f

mm:
     17, Generating copyin(a(1:10000,1:10000))
         Generating copyin(b(1:10000,1:10000))
         Generating copyout(c(1:10000,1:10000))
         Generating compute capability 1.3 binary
     18, Loop is parallelizable
     19, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc do parallel, vector(16)
         19, !$acc do parallel, vector(16)
             CC 1.3 : 6 registers; 24 shared, 44 constant, 0 local memory bytes; 100 occupancy
     22, Loop carried reuse of 'c' prevents parallelization
     23, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc do parallel, vector(16)
         22, !$acc do seq
             Cached references to size [16x16] block of 'a'
             Cached references to size [16x16] block of 'b'
         23, !$acc do parallel, vector(16)
             Using register for 'c'
             CC 1.3 : 23 registers; 4120 shared, 60 constant, 0 local memory bytes; 50 occupancy
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x8ef): In function `main':
./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x9c1):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0xa92):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'

If I remove the “pinned” attribute from the declaration, it works fine.

What am I missing here? I am already using the -Mcuda flag and “use cudafor”.

Thank you for your help.

Hi sindimo,

The problem is that CUDA Fortran doesn’t yet support the medium memory model. Removing the “-mcmodel=medium” flag works around the error, i.e. compiling with “pgfortran -fast -Mcuda -ta=nvidia,time -Minfo=accel -Minline pinnedMemory2.f”. This is a known limitation and has been logged as TPR#16947.

On a side note, mixing CUDA Fortran with the PGI Accelerator Model is not officially supported. Instead, you may wish to use the CUF kernel directive (“!$cuf kernel do”). See the second part of http://www.pgroup.com/lit/articles/insider/v2n3a1.htm.
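For illustration, here is a minimal sketch of how the MM loop nest might look as a CUF kernel instead of an !$acc region, assuming PGI CUDA Fortran compiled with -Mcuda (or a free-form file ending in .cuf). The module name cuf_mm_mod and the subroutine mm_cuf are hypothetical names chosen for this example, not part of any library. Like the “!$acc do seq” schedule in the -Minfo output above, each thread runs the k loop sequentially for one C(i,j) element.

```fortran
! Hypothetical module: a CUF-kernel version of the MM loop nest.
! Operates only on device arrays, so no !$acc directives are mixed in.
module cuf_mm_mod
  use cudafor
  implicit none
contains
  subroutine mm_cuf(dA, dB, dC, n)
    integer, intent(in) :: n
    real*8, device, intent(in)  :: dA(n,n), dB(n,n)
    real*8, device, intent(out) :: dC(n,n)
    real*8 :: tmp
    integer :: i, j, k

    ! The two outer loops (j, i) are mapped onto the GPU grid;
    ! <<< *, * >>> lets the compiler choose grid and block sizes.
    ! tmp is assigned in the loop body, so each thread gets its own copy.
    !$cuf kernel do(2) <<< *, * >>>
    do j = 1, n
       do i = 1, n
          tmp = 0.0d0
          ! Inner reduction over k runs sequentially within one thread
          do k = 1, n
             tmp = tmp + dA(i,k) * dB(k,j)
          end do
          dC(i,j) = tmp
       end do
    end do
  end subroutine mm_cuf
end module cuf_mm_mod
```

On the host side, A, B and C can stay declared as “real*8, allocatable, pinned” arrays as in the original main program. Copy the inputs to device arrays (declared with the “device” attribute) with plain assignments such as dA = A, call mm_cuf(dA, dB, dC, n), and copy the result back with C = dC. Using pinned host arrays is what should let those assignment-based transfers use page-locked memory.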

Hope this helps,
Mat