It appears that PGI CUDA Fortran does not allow “print” in device code, so I tried compiling with emulation mode (-Mcuda=emu). However, when I ran my code, I got this error message:
Error in cudaMemcpy … 1
This is the subroutine that prints the error message:
      subroutine wrap_cudaMemcpyHostToDevice(dstPtr, srcPtr, n)
c     input parameters
      real, device, allocatable, dimension(:) :: dstPtr
      real, dimension(:) :: srcPtr
      integer :: n
c     temporary
      integer error
c     cudaMemcpy
      error = cudaMemcpy(dstPtr, srcPtr, n, cudaMemcpyHostToDevice)
c     error checking
      if (error.ne.0) then
         print *, "Error in cudaMemcpy … 1"
c        print *, "Error in cudaMemcpy …", cudaGetErrorString(error)
         stop
      endif
      end subroutine
I’m not sure why you’re getting this error, since we support cudaMemcpy in emulation mode. I’d need a reproducing example to tell. That said, I’d remove the “allocatable” attribute from the declaration of “dstPtr”, since it can cause problems (in general, not just in CUDA Fortran).
Also, I’m wondering why you’re using cudaMemcpy at all. You can simply use the assignment operator “dstPtr = srcPtr” to do the same thing.
FYI, we did add printing from device kernels in a later compiler version (though you need a device with compute capability 2.0, i.e. cc20, or newer).
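To illustrate the assignment suggestion above, here is a minimal free-form sketch, not your actual code. The module name `copy_wrappers`, the wrapper name `copy_host_to_device`, the array size, and the program itself are made up for the example. It assumes the file is compiled as CUDA Fortran (for example a `.cuf` file built with `-Mcuda`). Putting the wrapper in a module also gives it an explicit interface, which Fortran requires for assumed-shape dummy arguments.

```fortran
module copy_wrappers
  use cudafor
  implicit none
contains
  ! Hypothetical wrapper (name not from the original code).
  ! The device dummy is assumed-shape and has no "allocatable" attribute.
  subroutine copy_host_to_device(dst, src)
    real, device, dimension(:) :: dst
    real, dimension(:)         :: src
    ! Assignment between a host array and a device array performs
    ! the host-to-device transfer; no explicit cudaMemcpy call is needed.
    dst = src
  end subroutine copy_host_to_device
end module copy_wrappers

program copy_demo
  use cudafor
  use copy_wrappers
  implicit none
  integer, parameter :: n = 8          ! arbitrary size for the sketch
  real :: h(n), back(n)
  real, device, allocatable :: d(:)
  integer :: i

  h = [(real(i), i = 1, n)]            ! fill the host array with 1..n
  allocate(d(n))                       ! device array must exist before the copy

  call copy_host_to_device(d, h)
  back = d                             ! device-to-host transfer by assignment

  ! Round-trip check on the host
  if (any(back /= h)) then
    print *, 'copy mismatch'
    stop 1
  end if
  print *, 'copy ok'

  deallocate(d)
end program copy_demo
```

The caller may still declare the device array as allocatable; only the dummy argument inside the wrapper drops the attribute. The device array has to be allocated to the right size before the wrapper is called.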
I used dstPtr=srcPtr as you suggested (see the code below). It got past that point but then crashed.
Where can I post the whole code so that we can get emulation mode working?
Thanks a lot,
SJZ
pass dstPtr=srcPtr, Error in cudaMemcpy … 1
pass dstPtr=srcPtr, Error in cudaMemcpy … 1
pass dstPtr=srcPtr, Error in cudaMemcpy … 1
pass dstPtr=srcPtr, Error in cudaMemcpy … 1
pass dstPtr=srcPtr, Error in cudaMemcpy … 1
before calling soluvGPU
Segmentation fault
c-----------------------------------------------
c cudaMemcpy wrapper
c-----------------------------------------------
      subroutine wrap_cudaMemcpyHostToDevice(dstPtr, srcPtr, n)
c     input parameters
c     real, device, allocatable, dimension(:) :: dstPtr
      real, device, dimension(:) :: dstPtr
      real, dimension(:) :: srcPtr
      integer :: n
c     temporary
      integer error
c     cudaMemcpy
c     error = cudaMemcpy(dstPtr, srcPtr, n, cudaMemcpyHostToDevice)
      dstPtr=srcPtr
      print *, "pass dstPtr=srcPtr, Error in cudaMemcpy … 1"
c     error checking
c     if (error.ne.0) then
c        print *, "Error in cudaMemcpy … 1"
c        print *, "Error in cudaMemcpy …", cudaGetErrorString(error)
c        stop
c     endif
      end subroutine