Hi,
I have recently run into a strange bug in the program I am working on. The error occurs at a device-to-host array assignment, but it does not happen every time I run the program. The code for the relevant subroutine is below:
subroutine CubicInterpVec3D(coords, result)
    real(real_kind), dimension(:,:) :: coords
    real(real_kind), dimension(:) :: result
    integer :: nCoords, dimGrid, dimBlock
    real(real_kind), device, allocatable, dimension(:,:) :: coordsDev
    real(real_kind), device, allocatable, dimension(:) :: resultDev

    if(allocFlag==1) then
        nCoords = size(result)
        if(size(coords,1) .ne. nCoords) then
            print *, 'Number of coordinates is not equal to the number of desired interpolated values!'
            stop 'Program terminated by cubic_bspline_interp_3D_mod:CubicVec3D'
        endif

        print *, 'Attempting to allocate device memory...'
        allocate( coordsDev(nCoords, 3), resultDev(nCoords) )

        print *, 'Attempting to copy test points to device...'
        coordsDev = coords(1:nCoords, 1:3)

        print *, 'Attempting to call the kernel...'
        dimBlock = 16
        dimGrid = max(1,nCoords/dimBlock+1)
        call CubicInterpVec3D_kernel<<<dimGrid,dimBlock>>>(coordsDev,resultDev,nCoords)

        print *, 'Attempting to copy results back to host...'
        result=resultDev(1:nCoords)
        !istat = cudaMemcpy(result,resultDev,nCoords)

        print *, 'Deallocating device memory...'
        deallocate(coordsDev,resultDev)
    else
        print *, 'Coefficient matrix not allocated on device yet!'
        stop 'Program terminated by cubic_bspline_interp_3D_mod:CubicInterpVec3D'
    endif
end subroutine CubicInterpVec3D
The error occurs at this line:
result=resultDev(1:nCoords)
As you can see, I have also tried using cudaMemcpy (the commented-out line), but the same intermittent error shows up. The error is:
copyout Memcpy (host=0x16edf00, dev=0x1f94b00, size=200) FAILED:4
I am running a GeForce 9800 GT on 64-bit Ubuntu Linux. Any help would be appreciated. I can post the full code if anyone needs it, but it doesn't seem relevant, since the error shows up at the very end of the program, after all the kernel calls and the rest of the work.
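One caveat about "after all the kernel calls": kernel launches are asynchronous, so a failure inside a kernel is often only reported by the next blocking call, such as the copy back to the host. Below is a minimal sketch of checking for errors right after a launch. The module check_demo and the kernel scale_kernel are hypothetical stand-ins, not part of my program. It assumes cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString from the cudafor module (older toolkits used cudaThreadSynchronize instead of cudaDeviceSynchronize).

```fortran
module check_demo
    use cudafor
    implicit none
contains
    ! Hypothetical kernel used only for illustration: doubles each element.
    attributes(global) subroutine scale_kernel(x, n)
        integer, value :: n
        real :: x(n)
        integer :: i
        i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
        ! Guard against the extra threads in the last block.
        if (i <= n) x(i) = 2.0 * x(i)
    end subroutine scale_kernel
end module check_demo

program main
    use cudafor
    use check_demo
    implicit none
    integer, parameter :: n = 50
    real :: host(n)
    real, device, allocatable :: dev(:)
    integer :: istat, dimBlock, dimGrid

    host = 1.0
    allocate(dev(n))
    dev = host

    dimBlock = 16
    dimGrid = (n + dimBlock - 1) / dimBlock
    call scale_kernel<<<dimGrid, dimBlock>>>(dev, n)

    ! Reports errors detected at launch time (e.g. a bad configuration).
    istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
        print *, 'Launch error: ', trim(cudaGetErrorString(istat))
    endif

    ! Blocks until the kernel finishes, so execution errors surface here
    ! rather than at the later device-to-host copy.
    istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
        print *, 'Kernel execution error: ', trim(cudaGetErrorString(istat))
    endif

    host = dev
    deallocate(dev)
end program main
```

If the synchronize call reports the failure, the problem is in a kernel rather than in the copy itself. Adding the same two checks after each earlier kernel launch should show which one is responsible.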