CUDA Fortran host device=device assignment

Just to clarify: can a host program not copy one device array to another device array?
I will write a “copy kernel”, but am I misunderstanding this part of the “CUDA Fortran Programming Guide & Reference”, v.12, section 3.4.1, or is this a bug?

"An assignment statement with a device variable or device array or array section on both sides of the assignment statement will copy data between two device variables or arrays."

pgfortran 10.3-0 64-bit target on x86-64 Linux -tp nehalem-64

:~ pgfortran -c foo.CUF
PGF90-S-0155-more than one device-resident object in assignment  (foo.CUF: 5)
  0 inform,   0 warnings,   1 severes, 0 fatal for test
:~ cat foo.CUF

program test
  use cudafor
  real, device, dimension(100) :: ha, wha
  call foo(ha,wha)
  ha = wha
  call foo(ha,wha)
  stop
end program

Hi Sarah,

This is one of the cases where the implementation is behind the spec. We’re adding this support now and expect it to become available sometime later this year.

Sorry,
Mat
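
For anyone who hits the same PGF90-S-0155 error before device-to-device assignment is supported, here is a minimal sketch of the "copy kernel" workaround mentioned in the question. The names copy_mod, copy_kernel, n and h are hypothetical and chosen only for illustration. The 64-thread block size is arbitrary, and the cudaMemcpy call is shown as an alternative on the assumption that your compiler version provides it in the cudafor module.

```fortran
module copy_mod
  use cudafor
contains
  ! Hypothetical copy kernel: each thread copies one element.
  attributes(global) subroutine copy_kernel(dst, src, n)
    integer, value :: n
    real :: dst(n), src(n)   ! kernel array arguments reside on the device
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) dst(i) = src(i)   ! guard threads past the end of the array
  end subroutine copy_kernel
end module copy_mod

program test
  use cudafor
  use copy_mod
  implicit none
  integer, parameter :: n = 100
  real, device, dimension(n) :: ha, wha
  real :: h(n)
  integer :: istat

  h = 1.0
  wha = h                      ! host-to-device assignment

  ! Workaround 1: launch the copy kernel instead of writing "ha = wha".
  call copy_kernel<<<(n + 63) / 64, 64>>>(ha, wha, n)

  ! Workaround 2 (assumes cudaMemcpy is available in your release);
  ! the count is given in elements, not bytes.
  istat = cudaMemcpy(ha, wha, n, cudaMemcpyDeviceToDevice)

  h = 0.0
  h = ha                       ! device-to-host assignment
  print *, all(h == 1.0)       ! T when every element was copied
end program test
```

Either workaround alone is enough; both appear here only so they can be compared side by side. The file needs a .cuf or .CUF extension (or the -Mcuda flag) so that pgfortran treats it as CUDA Fortran.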