Dear NVIDIA team,
please consider the following test code, which calls cublasDgemm:
OpenACC:
subroutine test(A,B,C)
  use cublas
  use openacc
  implicit none
  real(kind=8) :: A(:,:), B(:,:), C(:,:)
  !$acc data copyin(A,B), copyout(C)
  !$acc host_data use_device(A,B,C)
  call cublasDgemm('N','N', size(A,1), size(A,1), size(A,1), 1.0d0, A, size(A,1), B, size(B,1), 0.0d0, C, size(C,1))
  !$acc end host_data
  !$acc end data
end subroutine test
OpenMP:
subroutine test(A,B,C)
  use cublas
  use openacc
  implicit none
  real(kind=8) :: A(:,:), B(:,:), C(:,:)
  !$omp target data map(to:A,B) map(from:C)
  !$omp target data use_device_addr(A,B,C)
  call cublasDgemm('N','N', size(A,1), size(A,1), size(A,1), 1.0d0, A, size(A,1), B, size(B,1), 0.0d0, C, size(C,1))
  !$omp end target data
  !$omp end target data
end subroutine test
When the code is compiled without the use_device* directives, the compiler reports:
14, Generating copyin(a(:,:)) [if not already present]
Generating copyout(c(:,:)) [if not already present]
Generating copyin(b(:,:)) [if not already present]
16, Possible copy in and copy out of c in call to cublasdgemmcu_hpm
Possible copy in and copy out of b in call to cublasdgemmcu_hpm
Possible copy in and copy out of a in call to cublasdgemmcu_hpm
When the host_data use_device directive is used in OpenACC, the "Possible copy in and copy out" message disappears. In the OpenMP case, however, the message remains even when the target data use_device_ptr or target data use_device_addr directive is used. What is the reason for this?
On the other hand, nsys shows that, regardless of whether use_device is used, only two copyin operations and one copyout operation are generated.
So I am wondering whether use_device makes sense in these cases at all.
Thanks a lot and regards,
Rene’