Host Pinned Memory Allocation

Dear Mat,

I am trying to get host pinned memory to work on a sample program before implementing it in our actual application.

The program is:
[sindimo@superbeast100]$ cat pinnedMemory2.f

      module myModule
      contains
      subroutine MM (A,B,C)

      use accel_lib
      use cudafor


      integer dimm1, dimm2, dimm3
      parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)
      real start, finish
      real*8 :: A(:,:), B(:,:), C(:,:)

      call cpu_time(start)


!$acc region
      do j = 1, dimm3
        do i = 1, dimm1
          C(i, j) = 0
        enddo
        do k = 1, dimm2
          do i = 1, dimm1
            C(i, j) = C(i, j) + A(i, k)*B(k, j)
          enddo
        enddo
      enddo
!$acc end region


      call cpu_time(finish)

      write(*,*) 'Time ',finish - start,' s'

      end subroutine MM
      end module myModule
        



      program main
      use myModule
      use accel_lib
      use cudafor


      integer dimm1, dimm2, dimm3, seed
      parameter (dimm1 = 10000, dimm2 = 10000, dimm3 = 10000)

      real*8, allocatable, pinned :: A(:,:), B(:,:), C(:,:)
!     real*8, allocatable :: A(:,:), B(:,:), C(:,:)

      allocate( A(dimm1,dimm2), B(dimm2,dimm3), C(dimm1,dimm3) )

      seed=7654321

      !populate 2 random matrices
      do i = 1, dimm1
        do j = 1, dimm2
          A(i, j) = ran(seed)
        enddo
      enddo
      do i = 1, dimm2
        do j = 1, dimm3
          B(i, j) = ran(seed)
        enddo
      enddo

      do i = 1, 1
        call MM(A,B,C)
      enddo

      end program main

I compile it with the command below and get a link error:
[sindimo@superbeast100]$ pgfortran -fast -Mcuda -ta=nvidia,time -Minfo=accel -mcmodel=medium -Minline pinnedMemory2.f

mm:
     17, Generating copyin(a(1:10000,1:10000))
         Generating copyin(b(1:10000,1:10000))
         Generating copyout(c(1:10000,1:10000))
         Generating compute capability 1.3 binary
     18, Loop is parallelizable
     19, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc do parallel, vector(16)
         19, !$acc do parallel, vector(16)
             CC 1.3 : 6 registers; 24 shared, 44 constant, 0 local memory bytes; 100 occupancy
     22, Loop carried reuse of 'c' prevents parallelization
     23, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc do parallel, vector(16)
         22, !$acc do seq
             Cached references to size [16x16] block of 'a'
             Cached references to size [16x16] block of 'b'
         23, !$acc do parallel, vector(16)
             Using register for 'c'
             CC 1.3 : 23 registers; 4120 shared, 60 constant, 0 local memory bytes; 50 occupancy
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x8ef): In function `main':
./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0x9c1):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'
/tmp/pgfortrani0sdyrzEsTaX.o(.text+0xa92):./pinnedMemory2.f:53: undefined reference to `pgf90_pinned_alloc03_i8'

If I remove the “pinned” attribute from the declaration, it works fine.

What am I missing here? I am already using the -Mcuda flag and “use cudafor”.

Thank you for your help.

Hi sindimo,

The problem here is that CUDA Fortran doesn’t yet support the medium memory model. Removing the “-mcmodel=medium” flag will work around the error. This is a known limitation and has been logged as TPR#16947.

On a side note, mixing CUDA Fortran with the PGI Accelerator Model is not officially supported. Instead, you may wish to use the “cuf” kernel directive. (See the second part of the linked PGI page.)
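
As a rough idea of what the CUF kernel approach could look like for your matrix multiply, here is a minimal sketch. It is untested here and makes several assumptions that are not in your code: free-form source, a much smaller size n so it fits on any card, explicit device arrays a_d, b_d and c_d, and random_number in place of ran. Treat it as a starting point rather than a drop-in replacement.

```fortran
program cuf_mm_sketch
  use cudafor
  implicit none

  ! Assumption: a small illustrative size, not the 10000 used in the original.
  integer, parameter :: n = 512

  ! Host arrays in pinned (page-locked) memory.
  real(8), allocatable, pinned :: a(:,:), b(:,:), c(:,:)
  ! Device copies; the names a_d, b_d, c_d are chosen for this sketch.
  real(8), allocatable, device :: a_d(:,:), b_d(:,:), c_d(:,:)

  integer :: i, j, k
  real(8) :: s

  allocate(a(n,n), b(n,n), c(n,n))
  allocate(a_d(n,n), b_d(n,n), c_d(n,n))

  call random_number(a)
  call random_number(b)

  ! Assignment between host and device arrays performs the transfer.
  a_d = a
  b_d = b

  ! CUF kernel directive: the compiler generates a kernel from the two
  ! tightly nested loops and picks the launch configuration itself.
  !$cuf kernel do(2) <<< *, * >>>
  do j = 1, n
    do i = 1, n
      s = 0.0d0
      do k = 1, n
        s = s + a_d(i,k) * b_d(k,j)
      end do
      c_d(i,j) = s
    end do
  end do

  ! Copy the result back to the pinned host array.
  c = c_d

  ! Compare against the host intrinsic as a sanity check.
  write(*,*) 'max abs difference vs matmul: ', maxval(abs(c - matmul(a, b)))

  deallocate(a, b, c, a_d, b_d, c_d)
end program cuf_mm_sketch
```

This uses only CUDA Fortran (no !$acc region), so it would be built with -Mcuda and without -mcmodel=medium. The data movement is explicit here, which is the main difference from the accelerator region in your version.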

Hope this helps,
Mat