I am porting a large CPU simulation code to the GPU. The CPU code has many derived types that must be transferred to the GPU before the kernel call, so I use OpenACC for the data transfer. However, I get an error when transferring derived types that are single instances (not arrays). I have written a small example that reproduces the same error. The error is as follows.
> Failing in Thread:1
> call to cuCtxSynchronize returned error 700: Illegal address during kernel execution
The source files are given below.
Module 1
module header
  type :: t_sim_params
    real :: time
  end type t_sim_params
end module header
Module 2
module global
  use header
  type (t_sim_params), allocatable, target, save :: parameters
end module global
Module 3
module mathOps
  use global
  use header
contains
  attributes(global) subroutine test_derived(d_parameters)
    implicit none
    type (t_sim_params) :: d_parameters
    integer :: j,k
    integer :: i
    real :: seconds
    i = threadIdx%x
    write(*,*) 'I am thread ',i
    seconds = (d_parameters % time )*2
    write(*,*) seconds
  end subroutine test_derived
end module mathOps
Main program
program test_deep_copy
  use mathOps; use global; use cudafor; use openacc
  implicit none
  allocate(parameters)
  parameters % time = 0.1
  !$acc enter data copyin(parameters)
  !$acc data present(parameters)
  call test_derived<<<1, 32>>>(parameters)
  !$acc end data
  !$acc wait
  !$acc exit data copyout(parameters)
  write(*,*) parameters % time
end program test_deep_copy
For completeness, the compilation command is given below.
nvfortran -fast -acc -cuda -gpu=cc86,deepcopy -cpp -Minfo -o=test_derived header.f90 global.f90 deeptester.f90
The HPC SDK version is 22.5. I have the following questions:
1. What is the likely cause of the cuCtxSynchronize error?
2. For a derived type that contains simulation parameters, what is the best way to keep a single copy of the type on the device that is shared by all threads?
3. Is it correct to use the save attribute for a derived-type variable that needs to be transferred to the device?
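For reference on question 1, here is a variant of the launch that may be worth comparing against. It is only a sketch and rests on an assumption that is not confirmed here: that the CUDA Fortran kernel receives the host address of parameters, because an OpenACC present region does not by itself substitute the device address in a kernel call. Under that assumption, wrapping the launch in host_data use_device would pass the device copy instead. The program name test_host_data, the local declaration of parameters, the single-thread print, and the use of cudaDeviceSynchronize are illustrative choices, not part of the original reproducer.

```fortran
module header
  type :: t_sim_params
    real :: time
  end type t_sim_params
end module header

module mathOps
  use header
  use cudafor
contains
  attributes(global) subroutine test_derived(d_parameters)
    implicit none
    ! Dummy argument passed by reference: expected to be a device address
    type (t_sim_params) :: d_parameters
    real :: seconds
    seconds = (d_parameters % time) * 2
    ! Print from one thread only to keep the output short
    if (threadIdx%x == 1) print *, seconds
  end subroutine test_derived
end module mathOps

program test_host_data   ! hypothetical name for this sketch
  use header; use mathOps; use cudafor
  implicit none
  type (t_sim_params), allocatable, target :: parameters
  integer :: istat

  allocate(parameters)
  parameters % time = 0.1

  ! Create the device copy through OpenACC
  !$acc enter data copyin(parameters)

  ! Assumption: host_data use_device makes the kernel call see the
  ! device address of parameters instead of the host address
  !$acc host_data use_device(parameters)
  call test_derived<<<1, 32>>>(parameters)
  !$acc end host_data

  ! Wait for the CUDA kernel (and its device-side print) to finish
  istat = cudaDeviceSynchronize()

  ! The kernel only reads parameters, so no copy back is needed
  !$acc exit data delete(parameters)
  deallocate(parameters)
end program test_host_data
```

This would be built with the same -acc -cuda flags as the command above. If the assumption holds, the kernel reads the same single device copy that OpenACC manages, so no second copy of the parameters is created.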