I’m playing with the matrix multiplication example given in the CUDA Fortran Programming Guide and Reference, pp. 43-44, and I can’t get the data back from the device. What methods are currently supported for copying data from the device to the host?
By the way, I had to add “device” to line 8 (p. 43)
real, device :: A(N,M), B(M,L), C(N,L)
to get the example to compile. Unfortunately, when I run it, it fails with
copyout Memcpy FAILED:4
I tried using cudaMemcpy2D to do an explicit data transfer, but I still get errors (stat > 0).