I have a fortran 90 code to try with CUDA Fortran (2.3). It has a lot of array aliasing, where they thought it would be nice to refer to double complex 5d arrays sometimes with double reduced-dimension arrays. This was done with argument aliasing, for example:
real array(18, N, 2, 4)
call subr( array )
…
subroutine subr( array )
complex array(3,3,N,2,4)
This doesn’t work with fortran modules, and that seems to be the best way to write sets of CUDA device arrays and routines that use the arrays. There is much whining about formal parameter mismatches, etc.
It doesn’t help that EQUIVALENCE is documented as unsupported for CUDA 2.3 … it does compile without error, but it doesn’t seem to work. Of course, I could have another error.
So, my long winded questions are
Is there a way to relax type conformance on array copies? ( host = device, etc ).
I have tried using the cudaMemcpy() routine, but it fails with errors like:
copyin Memcpy FAILED:4 Perhaps it is because these arrays are multi-dimensional.
The runtime routines are not real well documented. Is there a somewhere more details are provided on things like “cudaMemcpyToArray”? For instance, what are the dstx, dsty parameters?
Is device array equivalence really supposed to work? Section 3.2.1 of the v1.2 CUDA Fortran guide says no.
Is there a way to relax type conformance on array copies? ( host = device, etc ).
Sorry, no. Though, this is not specifically a CUDA Fortran issue. Rather, you would have the same problem with any module.
The work around is to write your CUDA Fortran module using complex, and have “subr” call the kernel. Something like:
module bar
contains
attributes(global) subroutine foo_kernel(x)
complex, device, dimension(*) :: x
i = threadidx%x
x(i) = cmplx(i*2.0-1.0,i*2.0)
return
end
end module bar
subroutine foo(x)
use bar
complex, device, dimension(*) :: x
call foo_kernel<<<1,50>>> (x)
end subroutine foo
program test
use cudafor
interface
subroutine foo(x)
real, device, dimension(*) :: x
end subroutine
end interface
real, allocatable, device, dimension(:) :: a
real ha(100)
allocate(a(100))
ha=-1
a = 0.0
call foo(a)
ha = a
print *,ha(1),ha(2),ha(99),ha(100)
end
It doesn’t help that EQUIVALENCE is documented as unsupported for CUDA 2.3 … it does compile without error, but it doesn’t seem to work. Of course, I could have another error.
The compiler should catch this and give an error if EQUIVALENCE is used. I’ve submitted a problem report (TPR#16726) to have this fixed.
Any pointers on documentation for the CUDA Fortran run time memory copy routines? The manual, even the newest one, is a little thin there. I’ll try matching them up with the CUDA C equivalents to decode the “dstx” and “dsty” arguments and so on.
Perhaps a runtime call to do copies would solve my problem.
Any pointers on documentation for the CUDA Fortran run time memory copy routines? The manual, even the newest one, is a little thin there. I’ll try matching them up with the CUDA C equivalents to decode the “dstx” and “dsty” arguments and so on.
Yes, the most frequent complaint we get from users is the lack of documentation and examples. It will get better over time.
Well then, at the risk of my questions getting dumber and dumber…
How does one go about specifying and calling CUDA C kernels from a CUDA Fortran program?
Since there is no linker … does one specify the source.cu for ‘pgfortran’ and it just works?
Yes, a lot of these questions will go away with a couple concrete examples being provided.
It would help more to support EQUIVALENCE. I have tested it, and for the uses I have, it does seem to work. I can understand why device COMMON is not supported, but given all the other limitations EQUIVALENCE would be very nice to have.
Well then, at the risk of my questions getting dumber and dumber…
How does one go about specifying and calling CUDA C kernels from a CUDA Fortran program?
Sarah
To answer my own question, here is an example. It was fairly easy, and yes, it was a dumb question.
nvcc -c csub.cu
pgfortran afort.CUF csub.o
afort.CUF
program afort
real, dimension(1000) :: this
real, device, dimension(1000) :: dthis
integer i
do i = 1, 1000
this(i)= i
enddo
dthis= this
call addone( dthis )
this = dthis
print *, this
return
end
To answer my own question, here is an example. It was fairly easy, and yes, it was a dumb question.
It’s actually one of our more common questions, so not dumb at all. Most users make it more difficult then it really is. As your example shows, it’s not much different then standard Fortran to C interoperability. For another example, I wrote an article (http://www.pgroup.com/lit/articles/insider/v2n1a4.htm) that has a CUDA Fortran program calling a CUDA C random number generator.
As for Equivalence, we do have a feature request in for it (TPR#16198). I’ll add a note that more users are asking for it and bump up it’s priority. It’s my understanding that it will be fairly difficult to implement since there isn’t a equivalent method to perform this in C.