I have been experimenting with changing devices in an MPI-parallelized CUDA Fortran code. For some time I kept running into a segfault when trying to transfer certain arrays to the GPU after changing the device.
It turns out that not only does all device memory have to be reallocated (which is logical, since we are clearing the GPU), but pinned host memory has to be reallocated as well. Is that the expected behavior?
The segfault happens when trying to access the pinned data in any way, whether copying it to the device or accessing it on the host side.
The array is still marked as allocated, though, and it keeps its shape.
I believe the correct behavior is either that the pinned array should be marked as unallocated or that the data should still be available.
I tested with version 10.8 of the compiler. My workaround is to select the device at the very beginning of the program, before any allocation, but it would be nice to have the data in a consistent state (i.e. either unaffected by the device reset or automatically deallocated).
For illustration, the following program works fine:
PROGRAM test_set_device
USE cudafor
real, pinned, allocatable, dimension(:) :: x
real, device, allocatable, dimension(:) :: gx
integer :: ierr
ierr = cudaThreadExit(); if (ierr > 0) print *,cudaGetErrorString(ierr)
ierr = cudaSetDevice(0); if (ierr > 0) print *,cudaGetErrorString(ierr)
allocate( x(10))
allocate(gx(10))
print *, allocated(x), shape(x)
gx = x
END
while this one segfaults at the "gx = x" line:
PROGRAM test_set_device
USE cudafor
real, pinned, allocatable, dimension(:) :: x
real, device, allocatable, dimension(:) :: gx
integer :: ierr
allocate( x(10))
ierr = cudaThreadExit(); if (ierr > 0) print *,cudaGetErrorString(ierr)
ierr = cudaSetDevice(0); if (ierr > 0) print *,cudaGetErrorString(ierr)
allocate(gx(10))
print *, allocated(x), shape(x)
gx = x
END
and this one segfaults at the "y = x(1)" line:
PROGRAM test_set_device
USE cudafor
real :: y
real, pinned, allocatable, dimension(:) :: x
real, device, allocatable, dimension(:) :: gx
integer :: ierr
allocate( x(10))
x(1) = 1
ierr = cudaThreadExit(); if (ierr > 0) print *,cudaGetErrorString(ierr)
ierr = cudaSetDevice(0); if (ierr > 0) print *,cudaGetErrorString(ierr)
allocate(gx(10))
print *, allocated(x), shape(x)
y = x(1)
END
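For completeness, here is a minimal sketch of the reallocation mentioned above, in which the pinned array is released and then allocated again around the device reset. The program name and the choice to deallocate before calling cudaThreadExit (rather than after it) are assumptions of this sketch, and I have only the behavior described above to go on, so treat it as an illustration rather than a verified fix:

```fortran
PROGRAM test_set_device_realloc
USE cudafor
implicit none
real, pinned, allocatable, dimension(:) :: x
real, device, allocatable, dimension(:) :: gx
integer :: ierr
allocate( x(10))
x = 1.0
! Assumption: release the pinned array while the original context still exists,
! so that no stale pinned allocation survives the reset.
deallocate(x)
ierr = cudaThreadExit(); if (ierr > 0) print *,cudaGetErrorString(ierr)
ierr = cudaSetDevice(0); if (ierr > 0) print *,cudaGetErrorString(ierr)
! Allocate both the pinned host array and the device array again after the reset.
allocate( x(10))
allocate(gx(10))
x = 1.0
print *, allocated(x), shape(x)
gx = x
END
```

The contents of x are lost by the deallocate, so any data that must survive the device change would have to be copied to an ordinary (non-pinned) array first.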