Dear all,
I am not sure I understand the constraints on shared array declarations in CUDA Fortran. If the dimension is an input argument of the kernel, the program crashes with an out-of-bounds error. If the dimension is “hardcoded” (as an integer literal), it works. I also noticed that when the dimension comes from a module, it may or may not work, depending on where the module is located…
What are the rules for declaring a shared array properly? How does it work exactly? Why does it work this way?
A few examples to illustrate my point:
Does not work:
- with the dimension passed as an argument of the kernel:
attributes(global) subroutine mykernel(nxi)
  implicit none
  integer, value, intent(in) :: nxi
  real(dp), shared, dimension(nxi) :: shared_error
  (...)
end subroutine mykernel
- with the dimension taken from an external module:
attributes(global) subroutine mykernel(nxi)
  use external_commons, only: nx
  implicit none
  integer, value, intent(in) :: nxi
  real(dp), shared, dimension(nx) :: shared_error
  (...)
end subroutine mykernel
Works:
- with the dimension taken from a module defined in the same module file:
module commons
  integer, parameter :: nx=128
end module
[...]
attributes(global) subroutine mykernel(nxi)
  use commons, only: nx
  implicit none
  integer, value, intent(in) :: nxi
  real(dp), shared, dimension(nx) :: shared_error
  (...)
end subroutine mykernel
- with the dimension “hardcoded”:
attributes(global) subroutine mykernel(nxi)
  implicit none
  integer, value, intent(in) :: nxi
  real(dp), shared, dimension(128) :: shared_error
  (...)
end subroutine mykernel
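For comparison, here is a minimal sketch of one run-time-sized variant. It assumes that an assumed-size shared array takes its size, in bytes, from the optional third argument of the launch configuration. The names kernels, out, shared_buf and demo are illustrative only and do not come from my actual code, and the kernel body is a placeholder.

```fortran
module kernels
  use cudafor
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  attributes(global) subroutine mykernel(out, nxi)
    integer, value :: nxi
    real(dp), device :: out(nxi)
    ! Assumed-size shared array: no extent is given here; the number of
    ! bytes is assumed to come from the third launch-configuration argument.
    real(dp), shared :: shared_buf(*)
    integer :: i

    i = threadIdx%x
    if (i <= nxi) shared_buf(i) = real(i, dp)
    call syncthreads()
    ! Placeholder work: write the shared buffer back in reverse order.
    if (i <= nxi) out(i) = shared_buf(nxi - i + 1)
  end subroutine mykernel
end module kernels

program demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 128
  real(dp), device :: out_d(n)
  real(dp) :: out_h(n)
  integer :: istat

  ! One block of n threads; n*8 bytes of dynamic shared memory
  ! (8 bytes per real(dp) element).
  call mykernel<<<1, n, n*8>>>(out_d, n)
  istat = cudaDeviceSynchronize()
  out_h = out_d
  print *, out_h(1), out_h(n)
end program demo
```

If this is the intended mechanism, the failing cases above might simply be missing the shared-memory size at launch, but that is a guess on my side.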
Thank you for your answer!