Illegal Memory Access in CUDA Fortran Code

Hi,

I encountered an error message of “an illegal memory access” inside a kernel subroutine. I was trying to define a variable on the register memory in order to mitigate the bottleneck of device memory access. The subroutine looks as the following (with some omission):

type photon_pack
    integer(lg) :: seed
    integer(sm) :: i, j
    real(pk) :: energy, x, y, vx, vy
  end type photon_pack

  type energy_grid
    real(pk) :: energy, xgrid, ygrid, dx, dy
  end type energy_grid

  attributes(global) subroutine shoot_photon( base_seed, grid, dt, t_end, t_damp )
    use cudafor
    implicit none
    type(photon_pack) :: ptmp
    type(energy_grid), device :: grid(:,:)
    real(pk), device :: dt, t_end, t_damp
    real(pk) :: vx, vy, vmag, etmp, etrans, tratio
    integer(lg), device :: base_seed
    integer(lg) :: blockid, lph, nph, seed
    integer(sm) :: cur_corr(2), aft_corr(2), istat, itime, ntime, gridsize(2), n, m
    logical :: move_forw, move_back

    blockid = BlockIdx%x + GridDim%x * (BlockIdx%y + GridDim%y * (BlockIdx%z - 1) - 1)
    lph = ThreadIdx%x + BlockDim%x * (blockid - 1)
    seed = base_seed
    seed = seed + lph
    gridsize = size(grid)
    n = gridsize(1)
    m = gridsize(2)
    ntime = t_end/dt
    tratio = dt/t_damp

    call init_photon_dev(seed, n, m, ptmp)
    aft_corr(1) = ptmp%i; aft_corr(2) = ptmp%j

    do itime = 0, ntime - 1
      vx = 1.0; vy = 0.0          ! randomly picked
      vmag = sqrt(vx*vx + vy*vy)
      vx = vx * ptmp%energy / vmag
      vy = vy * ptmp%energy / vmag

      ptmp%x = ptmp%x + dt*vx
      ptmp%y = ptmp%y + dt*vy

      cur_corr = aft_corr
      ...... [omission]
      aft_corr(1) = ptmp%i; aft_corr(2) = ptmp%j

      istat = 0
      etmp = ptmp%energy * (1.0_pk - exp(-tratio))
!
! Illegal memory access if Atomic add is used...
!
      if (cur_corr(1) == aft_corr(1) .and. cur_corr(2) == aft_corr(2)) then
        istat = atomicadd(grid(cur_corr(1),cur_corr(2))%energy, etmp)
      else
        etmp = etmp / 2.0_pk
        istat = atomicadd(grid(cur_corr(1),cur_corr(2))%energy, etmp)
        istat = atomicadd(grid(aft_corr(1),aft_corr(2))%energy, etmp)
      end if 
!
! Illegal memory access if any of the elements of ptmp is modified?
! 
      etrans = ptmp%energy - etmp
      ptmp%energy = etrans

    end do
    return
  end subroutine shoot_photon

The purpose is to add the variables “etmp” into a device-memory variable “grid%energy” using atomic add. This caused the code to fail with an error of an illegal memory access. If I commented out the atomic part, the code could run (without doing anything); but if I add another two lines, the code also failed at the second line (“ptmp%energy = etrans”).

It seems to me that the subroutine failed when

  • attempting to add variables from register memory to device memory (?), or
  • attempting to copy values back to device memory.

Could you please let me know if any more options can be used to understand the origin of the error? Please also let me know if I can send in the whole code (a short one in one file) in case that helps.

Thank you very much for your help!

Jimmy

It seems like you have it pretty well nailed down. You can use cuda-memcheck to get the line number of the illegal memory access.

There may be some subtle issue with how the derived type is defined. Or we may have a compiler bug. Please send the shortest working code you can create that still shows the issue to support@pgroup.com and we’ll take a look.

  • Brent

Jimmy sent in this code and I took a closer look. The problem being his use of the “size” intrinsic. Without a DIM argument, the intrinsic will return the full extend of the array, not the extent of a single dimension. Hence “n” and “m” are 16384 not 128 as he expected and the illegal memory access is because he’s access in the grid array beyond it’s bounds.

-Mat

Jimmy sent in this code and I took a closer look. The problem being his use of the “size” intrinsic. Without a DIM argument, the intrinsic will return the full extend of the array, not the extent of a single dimension. Hence “n” and “m” are 16384 not 128 as he expected and the illegal memory access is because he’s access in the grid array beyond it’s bounds.

-Mat

Did you solve this problem? I get same problem. How can i solve?

Hi poyrazkocak,

“illegal memory access” is a generic error which occurs when accessing a protected or bad address. In this case, the problem was with the incorrect use of the “SIZE” intrinsic causing a memory access beyond the end of an array. Though, it can occur in many other circumstances.

Can you please provide more details on your issue?

-Mat

 !< CUDA variables
            integer :: istat
            integer :: MaxNumThreads                                                     
            integer :: MaxGridThreads                                                       
            integer, dimension(3) :: MaxGrid                                                !

            !< kernel variables
            real*8, dimension(NumParamVectors, NumberFreeParam), device :: ParameterVectorSet_d
            real*8, allocatable, dimension(:, :), device :: ModelFunctionListLocal_d
            real*8, dimension(NumParamVectors), device :: chi2ValuesVector_d


            type (cudaDeviceProp) :: prop                                                  
            type(dim3) :: grid, tBlock                                                    


            !< make settings for current GPU
            MaxNumThreads = prop%maxThreadsPerBlock                                         
            MaxNumThreads = 1024
            MaxGrid = prop%maxGridSize                                                   


            MaxGridThreads = min(NumParamVectors, MaxGrid(0))


            !< define
            tBlock = dim3(MaxNumThreads, 1, 1)
            grid = dim3(ceiling(real(NumParamVectors)/tBlock%x), 1, 1)


            chi2ValuesVector = chi2ValuesVector_d
            CALL CPU_TIME(t1)
            chi2ValuesVector_d = chi2ValuesVector    
            ParameterVectorSet = ParameterVectorSet_d
            call ModelCalcSpectrumGPU<<<  grid, tBlock >>>(ParameterVectorSet_d, chi2ValuesVector_d, ModelFunctionListLocal_d, NumParamVectors)
            CALL CPU_TIME(t2)
            print '("Time: t2 - t1 = ", F25.10, " sec")', t2 - t1
            istat = cudaDeviceSynchronize()
            chi2ValuesVector = chi2ValuesVector_d
            ParameterVectorSet = ParameterVectorSet_d
          '

I get this error in bolded two lines.
chi2ValuesVector = chi2ValuesVector_d
ParameterVectorSet = ParameterVectorSet_d
I can not do data transfer device to host.

It’s more likely occurring in your kernel. Unless you explicitly to test for an error after the kernel, the error will seemingly occur the next time the device is access, i.e. when you’re trying to copy the data back from the device.

Note, that you should also move your timer to after you’ve synchronized the device given the kernel will immediately return execution back to the host.

            call ModelCalcSpectrumGPU<<<  grid, tBlock >>>(ParameterVectorSet_d, chi2ValuesVector_d, ModelFunctionListLocal_d, NumParamVectors)             istat = cudaDeviceSynchronize() 
            ierr = cudaGetLastError()
            if (ierr .ne. 0) then
                print *, cudaGetErrorString(ierr)
                stop
             endif
            CALL CPU_TIME(t2) 
            print '("Time: t2 - t1 = ", F25.10, " sec")', t2 - t1

In “ModelCalcSpectrumGPU”, look for out-of-bounds accesses. Also given Fortran by default passes arguments by reference, make sure “NumParamVectors” is passed by value (i.e. add the “value” to it’s local definition in the kernel).

Hope this helps,
Mat