Hi,
I have code in which a reduction needs to be performed inside a kernel function. A similar reduction is needed in several other files, so I am trying to put the reduction subroutines in a common module that each file can USE and call wherever required.
The structure of the code is something like this.
Common module file containing the reduction subroutines:
MODULE OPS_FORTRAN_CUDA_REDUCTIONS
  USE ISO_C_BINDING
  USE CUDAFOR
  IMPLICIT NONE
CONTAINS
  ATTRIBUTES (DEVICE) SUBROUTINE ReductionReal8(sharedReal8, reduction_res, input_value, reduction_op)
    ! Reduction operation
  END SUBROUTINE ReductionReal8
END MODULE OPS_FORTRAN_CUDA_REDUCTIONS
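For context, a shared-memory tree reduction of the kind intended here could look roughly like the sketch below. It is illustrative only and not the actual routine: the module and subroutine names (REDUCTION_SKETCH, ReductionReal8Max) are hypothetical, it handles only the max operation instead of taking reduction_op, and it assumes a 1-D thread block whose size is a power of two, with one result slot per block.

```fortran
MODULE REDUCTION_SKETCH            ! hypothetical name, for illustration only
  USE CUDAFOR
  IMPLICIT NONE
CONTAINS
  ! Block-wide max reduction (sketch).
  ! Assumptions: 1-D thread block, blockDim%x is a power of two,
  ! sharedReal8 has at least blockDim%x elements,
  ! reduction_res has one element per thread block.
  ATTRIBUTES (DEVICE) SUBROUTINE ReductionReal8Max(sharedReal8, reduction_res, input_value)
    REAL(KIND=8), DIMENSION(0:*) :: sharedReal8            ! shared-memory scratch space
    REAL(KIND=8), DIMENSION(:), DEVICE :: reduction_res    ! per-block results
    REAL(KIND=8), VALUE :: input_value                     ! this thread's contribution
    INTEGER :: tid, stride

    tid = threadIdx%x - 1                  ! zero-based thread index within the block
    sharedReal8(tid) = input_value
    CALL syncthreads()

    ! Halve the number of active threads each step until one value remains
    stride = blockDim%x / 2
    DO WHILE (stride > 0)
      IF (tid < stride) THEN
        sharedReal8(tid) = MAX(sharedReal8(tid), sharedReal8(tid + stride))
      END IF
      CALL syncthreads()
      stride = stride / 2
    END DO

    ! Thread 0 folds the block's maximum into this block's result slot
    IF (tid == 0) THEN
      reduction_res(blockIdx%x) = MAX(reduction_res(blockIdx%x), sharedReal8(0))
    END IF
  END SUBROUTINE ReductionReal8Max
END MODULE REDUCTION_SKETCH
```

Because every thread in the block reaches the syncthreads() calls, the routine has to be called unconditionally by all threads of the block, with threads that have no data passing a neutral value.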
File that contains the call to the reduction subroutine:
MODULE APPLY_STENCIL_KERNEL_MODULE
  USE ISO_C_BINDING
  USE CUDAFOR
  USE OPS_FORTRAN_CUDA_REDUCTIONS
contains
  attributes (device) subroutine apply_stencil_kernel_gpu(A, Anew, error)
    ....................
    error = max( error, abs( Anew() - A() ))
  end subroutine
  attributes (global) subroutine apply_stencil_kernel_wrap(......)
    ..........................
    call apply_stencil_kernel_gpu(......)
    call ReductionFloat8(.....)
    ...............................
  end subroutine
  attributes (host) subroutine apply_stencil_kernel_host(....)
    .........................
    call apply_stencil_kernel_wrap <<<grid,tblock,nshared>>> (....)
    .........................
  end subroutine
END MODULE APPLY_STENCIL_KERNEL_MODULE
The documentation says that, for Fortran, “Relocatable device code is generated by default.”
However, I am getting this error:
NVFORTRAN-S-1253-Calls from device code to a host subroutine/function are not allowed - reductionfloat8
If I place the subroutine ReductionFloat8 in the same file where it is called, it works.
Please help me solve this.