I have recently run into a strange bug in the program I am working on. The error occurs at a device-to-host array assignment, but it does not happen every time I run the program. The code for the relevant subroutine is below:
subroutine CubicInterpVec3D(coords, result)
  real(real_kind), dimension(:,:) :: coords
  real(real_kind), dimension(:) :: result
  integer :: nCoords, dimGrid, dimBlock
  real(real_kind), device, allocatable, dimension(:,:) :: coordsDev
  real(real_kind), device, allocatable, dimension(:) :: resultDev

  if(allocFlag==1) then
    nCoords = size(result)
    if(size(coords,1) .ne. nCoords) then
      print *, 'Number of coordinates is not equal to the number of desired interpolated values!'
      stop 'Program terminated by cubic_bspline_interp_3D_mod:CubicVec3D'
    endif

    print *, 'Attempting to allocate device memory...'
    allocate( coordsDev(nCoords, 3), resultDev(nCoords) )

    print *, 'Attempting to copy test points to device...'
    coordsDev = coords(1:nCoords, 1:3)

    print *, 'Attempting to call the kernel...'
    dimBlock = 16
    dimGrid = max(1,nCoords/dimBlock+1)
    call CubicInterpVec3D_kernel<<<dimGrid,dimBlock>>>(coordsDev,resultDev,nCoords)

    print *, 'Attempting to copy results back to host...'
    result=resultDev(1:nCoords)
    !istat = cudaMemcpy(result,resultDev,nCoords)

    print *, 'Deallocating device memory...'
    deallocate(coordsDev,resultDev)
  else
    print *, 'Coefficient matrix not allocated on device yet!'
    stop 'Program terminated by cubic_bspline_interp_3D_mod:CubicInterpVec3D'
  endif
end subroutine CubicInterpVec3D
The error occurs at the device-to-host copy:
  result=resultDev(1:nCoords)
  !istat = cudaMemcpy(result,resultDev,nCoords)
As you can see, I have also tried using cudaMemcpy instead of the array assignment, but the same intermittent error shows up. The error is:
copyout Memcpy (host=0x16edf00, dev=0x1f94b00, size=200) FAILED:4
I am running a 9800GT on 64-bit Ubuntu Linux. Any help would be appreciated. I can post the full code if anyone needs it, but it doesn’t seem to be relevant, since the error shows up at the very end of the program, after all the kernel calls and the other host-side work.
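One thing I could still try, on the assumption that the failure comes from an earlier kernel rather than from the copy itself: kernel launches are asynchronous, so an error raised inside a kernel can surface only at the next memory copy. Checking the status right after the launch would show whether that is happening here. Below is a minimal sketch with a stand-in kernel; check_mod and fill_kernel are hypothetical names, not part of my program, and on older toolkits cudaThreadSynchronize may be needed in place of cudaDeviceSynchronize.

```fortran
module check_mod
  use cudafor
  implicit none
contains
  ! Hypothetical stand-in kernel (not the real interpolation kernel).
  ! The grid is rounded up, so threads past the end of the array must be skipped.
  attributes(global) subroutine fill_kernel(res, n)
    integer, value :: n
    real :: res(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) res(i) = real(i)
  end subroutine fill_kernel
end module check_mod

program check_launch
  use cudafor
  use check_mod
  implicit none
  integer, parameter :: n = 50
  real :: res(n)
  real, device, allocatable :: resDev(:)
  integer :: istat, dimBlock, dimGrid

  allocate(resDev(n))
  dimBlock = 16
  dimGrid = (n + dimBlock - 1) / dimBlock
  call fill_kernel<<<dimGrid, dimBlock>>>(resDev, n)

  ! Errors in the launch itself (bad configuration, etc.)
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'Launch error: ', trim(cudaGetErrorString(istat))

  ! Wait for the kernel to finish so that errors raised while it runs
  ! are reported here instead of at the following copy.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'Kernel execution error: ', trim(cudaGetErrorString(istat))

  ! Device-to-host copy, same form as in my subroutine
  res = resDev
  print *, res(1), res(n)
  deallocate(resDev)
end program check_launch
```

If the synchronize call reports an error, the problem would be in the kernel (for example an index past the end of an array), and the copy would only be where it gets noticed.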