CuCtxSynchronize error 700 for dervied type

I am attempting to convert a large simulation CPU code to GPU. In the CPU code, I have many derived types that need to be transferred to the GPU before the kernel call. Therefore, I used OPENACC for data transfer before the Kernel call. However, I come across an error when transferring derived types which are single instances (not arrays). I have developed an example that produces same error. The error is as follows.

> Failing in Thread:1
> call to cuCtxSynchronize returned error 700: Illegal address during kernel execution

The source files are given below.

Module 1

 module header
    type  :: t_sim_params
          real             ::    time       
   end type t_sim_params
 end module header

Module 2

 module global
 use header      
      type (t_sim_params), allocatable, target, save    :: parameters
 end module global

Module 3

 module mathOps
  use global
  use header   

 contains
  attributes(global) subroutine test_derived(d_parameters)
  
    implicit none
    type (t_sim_params)    :: d_parameters
    integer :: j,k
    integer :: i
    real    :: seconds

    i = threadIdx%x
    write(*,*) 'I am thread ',i
    seconds = (d_parameters % time )*2
    write(*,*) seconds

  end subroutine test_derived 
 end module mathOps



 program test_deep_copy
   use mathOps; use global; use cudafor; use openacc
   implicit none
   
   allocate(parameters)
   parameters % time = 0.1
 
   !$acc enter data copyin(parameters)
      !$acc data present(parameters)
          call test_derived<<<1, 32>>>(parameters)
      !$acc end data
      !$acc wait
   !$acc exit data copyout(parameters)
   write(*,*) parameters % time
 end program test_deep_copy

For completeness, I am writing the compilation command as below.

nvfortran -fast -acc -cuda -gpu=cc86,deepcopy -cpp  -Minfo -o=test_derived header.f90 global.f90 deeptester.f90

The HPC SDK version is 22.5. I have following questions:

1. What is the possible reason for the cuCtxSynchronize error?
2. For a derived type that contains simulation parameters, what is the best way to keep only one copy of the type on device that is shared by all the threads?
3. Is it correct to use the keyword save for a derived data type that needs to be transferred to the device?

Hi Khokhar,

What is the possible reason for the cuCtxSynchronize error?

You should use the “host_data” directive to pass an OpenACC device variable to a CUDA Fortran kernel.

% cat test.cuf
 module header
    type  :: t_sim_params
          real             ::    time
   end type t_sim_params
 end module header

 module global
 use header
      type (t_sim_params), allocatable, target, save    :: parameters
 end module global

 module mathOps
  use global
  use header

 contains
  attributes(global) subroutine test_derived(d_parameters)

    implicit none
    type (t_sim_params)    :: d_parameters
    integer :: j,k
    integer :: i
    real    :: seconds

    i = threadIdx%x
    write(*,*) 'I am thread ',i
    seconds = (d_parameters % time )*2
    write(*,*) seconds

  end subroutine test_derived
 end module mathOps

 program test_deep_copy
   use mathOps; use global; use cudafor; use openacc
   implicit none
   
   allocate(parameters)
   parameters % time = 0.1

   !$acc enter data copyin(parameters)
   !$acc host_data use_device(parameters)
          call test_derived<<<1, 32>>>(parameters)
   !$acc wait
   !$acc end host_data
   !$acc exit data copyout(parameters)
   write(*,*) parameters % time
 end program test_deep_copy
% nvfortran -acc -Minfo=accel test.cuf -V22.5; a.out
test_deep_copy:
     40, Generating enter data copyin(parameters)
     45, Generating exit data copyout(parameters)
 I am thread             1
 I am thread             2
 I am thread             3
 I am thread             4
 I am thread             5
 I am thread             6
 I am thread             7
 I am thread             8
 I am thread             9
 I am thread            10
 I am thread            11
 I am thread            12
 I am thread            13
 I am thread            14
 I am thread            15
 I am thread            16
 I am thread            17
 I am thread            18
 I am thread            19
 I am thread            20
 I am thread            21
 I am thread            22
 I am thread            23
 I am thread            24
 I am thread            25
 I am thread            26
 I am thread            27
 I am thread            28
 I am thread            29
 I am thread            30
 I am thread            31
 I am thread            32
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.2000000
   0.1000000

For a derived type that contains simulation parameters, what is the best way to keep only one copy of the type on device that is shared by all the threads?

How you have it is fine. “parameters” is global so all threads will be able to read from it and there’s only one copy.

Is it correct to use the keyword save for a derived data type that needs to be transferred to the device?

Doesn’t matter. Module variables have the save attribute by default so adding it just makes it explicit.

Hope this helps,
Mat

Thank you so much for elaborate answers. I was using host_data previously but for some reason (cannot recall) I omitted it. Thank you for pointing it out.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.