Using managed and device arrays on different streams

I tried the example code in but got an error: Bus error (core dumped).
The code is here:

module mCuda
  integer :: num_cm
  type tCM
    integer, allocatable, device :: fine(:)
    real, allocatable, device :: mat_matrix(:,:,:)
    real, allocatable, device :: src_matrix(:,:,:)
  end type
  type(tCM), allocatable, managed :: cm_list(:)
end module

program test
use mCuda
use cudafor
integer(kind=cuda_stream_kind) :: str_mng, str_dev
integer :: i, istat
istat = cudaStreamCreate(str_mng)
istat = cudaStreamCreate(str_dev)
istat = cudaforSetDefaultStream(cm_list, str_mng)
do i=1, num_cm
  istat = cudaforSetDefaultStream(cm_list(i)%fine, str_dev)
  ! ... (rest of the loop not shown)
end do
end program

Thanks for your attention!

What type of system are you running on? Usually when you use managed memory and get a bus error, it means the host and device are accessing managed memory at the same time.

The hardware is a Tesla K80, and the compiler environment is PrgEnv-pgi/16.1. Thanks!

Brent’s intuition is correct. On older GPUs such as the K80, you get a bus error if managed memory is accessed from the host and the device at the same time. The same issue does not occur on newer P100 or V100 devices, which support concurrent host and device access to managed memory.
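A minimal sketch of that rule, assuming a single GPU and a hypothetical managed array `a` that is not part of the original code: the device updates the array in a `!$cuf` kernel, and the host waits with `cudaDeviceSynchronize()` before reading it. It needs to be compiled as CUDA Fortran (for example as a `.cuf` file, or with `-Mcuda` on the PGI compilers).

```fortran
! Sketch: on a pre-Pascal GPU such as the K80, the host must wait for
! outstanding device work before it touches managed memory again.
program sync_before_host_access
  use cudafor
  implicit none
  integer, parameter :: n = 1024          ! hypothetical array size
  real, managed, allocatable :: a(:)
  integer :: j, istat

  allocate(a(n))
  a = 0.0                                 ! host write: no kernel is running yet

  ! Device writes to the same managed array (launch is asynchronous).
  !$cuf kernel do <<<*, *>>>
  do j = 1, n
    a(j) = a(j) + 1.0
  end do

  ! Without this call the host read below could overlap the kernel;
  ! on a K80 that overlap is what produces "Bus error (core dumped)".
  istat = cudaDeviceSynchronize()

  if (any(a /= 1.0)) stop 1               ! host read after synchronization
  print *, "OK"
  deallocate(a)
end program
```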

Here, the problem line is “cm_list(i)%fine=10”: you access managed memory on the host through “cm_list(i)”, while the assignment to the device array “fine” accesses it on the device at the same time.

The simplest solution is to make the “device” arrays “managed” as well. The assignment is then performed on the host, and the arrays remain accessible on the device.
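A minimal sketch of that change, reusing the module from the question with `device` replaced by `managed` on the components of `tCM`. The sizes `n_cm` and `n_fine` and the program name are hypothetical, and the stream calls are left out to keep the example short.

```fortran
! Sketch of the suggested fix: the components of tCM are "managed"
! instead of "device". n_cm and n_fine are hypothetical sizes.
module mCuda
  implicit none
  type tCM
    integer, allocatable, managed :: fine(:)
    real, allocatable, managed :: mat_matrix(:,:,:)
    real, allocatable, managed :: src_matrix(:,:,:)
  end type
  type(tCM), allocatable, managed :: cm_list(:)
end module

program test_managed
  use cudafor
  use mCuda
  implicit none
  integer, parameter :: n_cm = 2, n_fine = 8   ! hypothetical sizes
  integer :: i, istat

  allocate(cm_list(n_cm))
  do i = 1, n_cm
    allocate(cm_list(i)%fine(n_fine))
    ! With "fine" managed, this is an ordinary host-side assignment,
    ! so no device work touches cm_list while the host reads it.
    cm_list(i)%fine = 10
  end do

  ! If kernels are later launched on these arrays, synchronize like this
  ! before the host accesses managed data again (required on a K80).
  istat = cudaDeviceSynchronize()

  do i = 1, n_cm
    if (any(cm_list(i)%fine /= 10)) stop 1
    deallocate(cm_list(i)%fine)
  end do
  deallocate(cm_list)
  print *, "OK"
end program
```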


Thanks guys for the clarification!