Hello,
For various reasons, I want to register several arrays in my code as pinned (page-locked) memory. I'd like to mix CUDA Fortran and OpenACC.
In the simplest case, with the code
real (8), dimension(:,:), allocatable :: a, b, c
integer n,m
n = 2000
m = 2000
allocate( a(n,m) )
allocate( b(n,m) )
allocate( c(n,m) )
do i = 1,n
do j = 1,m
a(i,j) = 100*i + j
b(i,j) = dsin(i*j*3.1415d0)
enddo
enddo
!$acc data create(a,b,c)
!$acc update device(a,b) async
...
and the compiler options
pgfortran -acc -ta=nvidia,cc35,pin -Mcuda -Minfo=accel test_ord.F -o test_ord
everything works fine: I see that arrays 'a' and 'b' are transferred to the GPU in two data-movement operations, without being copied to an internal buffer.
If I modify the code as follows:
use cudafor
real (8), dimension(:,:), allocatable :: a, b, c
integer n,m,ierr,f

n = 2000
m = 2000

ierr = cudaSetDeviceFlags(cudaDeviceMapHost)
allocate( a(n,m) )
allocate( b(n,m) )
allocate( c(n,m) )

ierr = cudaHostRegister(C_LOC(a),n*m*8,cudaHostRegisterPortable)
ierr = cudaHostRegister(C_LOC(b),n*m*8,cudaHostRegisterPortable)
ierr = cudaHostRegister(C_LOC(c),n*m*8,cudaHostRegisterPortable)

do i = 1,n
do j = 1,m
a(i,j) = 100*i + j
b(i,j) = dsin(i*j*3.1415d0)
enddo
enddo

!$acc data create(a,b,c)
!$acc update device(a,b) async
and use the command line
pgfortran -acc -ta=nvidia,cc35 -Mcuda -Minfo=accel test_ord_cuda.F -o test_ord_cuda
I see each array transferred in two or three operations (depending on PGI_ACC_BUFFERSIZE).
Why doesn't cudaHostRegister() turn the array memory into pinned memory?
What am I doing wrong?