CUDA-GDB: Variables are optimized out

Hello,
I am trying to debug some CUDA Fortran code with CUDA-GDB. When I print variables at a given breakpoint, some of them are reported as <optimized out>. In other forums I read that this is due to compiler optimization when several variables have the same value, and that if I do not want this to happen I have to turn off optimization with ‘-O0’.
According to the PGI Compiler User’s Guide, the option ‘-g’ already implicitly sets the optimization level to zero. In any case, using ‘-g’ or ‘-g -O0’ still does not fix the issue of variables being optimized out.
Anything I can do to avoid this?

-Nils

I’m having the same issue right now. Any updates?

Thanks,
Ron

The issue needs more context. One thing the user can do is to add

-Mcuda=debug

and see if the behavior changes. Without an example, though, an educated response requires knowing whether the variables in question are GPU code variables or CPU code variables, and whether they are scalars or arrays.

Post an example that demonstrates variables ‘optimized out’ and how you built and ran it with CUDA-GDB. If you do not wish to make your example public, send it to trs@pgroup.com

dave

I still have the same question, and -Mcuda is outdated. See the following example code:

module datastructures

use cudafor
!use nvtx_mod

INTEGER, PARAMETER :: EB = SELECTED_REAL_KIND(12) !< Precision of “Eight Byte” reals

TYPE BOUNDARY_COORD_TYPE

INTEGER :: II=0 !< Ghost cell x index
INTEGER :: JJ=0 !< Ghost cell y index
INTEGER :: KK=0 !< Ghost cell z index
INTEGER :: IIG=0 !< Gas cell x index
INTEGER :: JJG=0 !< Gas cell y index
INTEGER :: KKG=0 !< Gas cell z index
INTEGER :: IOR=0 !< Index of orientation of the WALL cell

REAL(EB) :: X !< x coordinate of boundary cell center
REAL(EB) :: Y !< y coordinate of boundary cell center
REAL(EB) :: Z !< z coordinate of boundary cell center
REAL(EB) :: X1 !< Lower x extent of boundary cell (m)
REAL(EB) :: X2 !< Upper x extent of boundary cell (m)
REAL(EB) :: Y1 !< Lower y extent of boundary cell (m)
REAL(EB) :: Y2 !< Upper y extent of boundary cell (m)
REAL(EB) :: Z1 !< Lower z extent of boundary cell (m)
REAL(EB) :: Z2 !< Upper z extent of boundary cell (m)

END TYPE BOUNDARY_COORD_TYPE

TYPE WALL_TYPE

REAL(EB) :: DUNDT=0._EB !< \f$ \partial u_n / \partial t \f$
REAL(EB) :: Q_LEAK=0._EB !< Heat production of leaking gas (W/m3)
REAL(EB) :: V_DEP=0._EB !< Deposition velocity (m/s)
REAL(EB) :: VEL_ERR_NEW=0._EB !< Velocity mismatch at mesh or solid boundary (m/s)

INTEGER :: BC_INDEX=0 !< Index within the array BOUNDARY_COORD
INTEGER :: OD_INDEX=0 !< Index within the array BOUNDARY_ONE_D
INTEGER :: TD_INDEX=0 !< Index within the array BOUNDARY_THR_D
INTEGER :: B1_INDEX=0 !< Index within the array BOUNDARY_PROP1
INTEGER :: B2_INDEX=0 !< Index within the array BOUNDARY_PROP2
INTEGER :: BR_INDEX=0 !< Index within the array BOUNDARY_RADIA
INTEGER :: SURF_INDEX=0 !< Index of the SURFace conditions
INTEGER :: BACK_INDEX=0 !< WALL index of back side of obstruction or exterior wall cell
INTEGER :: BACK_MESH=0 !< Mesh number on back side of obstruction or exterior wall cell
INTEGER :: BACK_SURF=0 !< SURF_INDEX on back side of obstruction or exterior wall cell
INTEGER :: BOUNDARY_TYPE=0 !< Descriptor: SOLID, MIRROR, OPEN, INTERPOLATED, etc
INTEGER :: SURF_INDEX_ORIG=0 !< Original SURFace index for this cell
INTEGER :: OBST_INDEX=0 !< Index of the OBSTruction
INTEGER :: PRESSURE_BC_INDEX !< Poisson boundary condition, NEUMANN or DIRICHLET
INTEGER :: VENT_INDEX=0 !< Index of the VENT containing this cell
INTEGER :: JD11_INDEX=0
INTEGER :: JD12_INDEX=0
INTEGER :: JD21_INDEX=0
INTEGER :: JD22_INDEX=0
INTEGER :: CUT_FACE_INDEX=0
INTEGER :: N_REALS=0 !< Number of reals to pack into restart or send/recv buffer
INTEGER :: N_INTEGERS=0 !< Number of integers to pack into restart or send/recv buffer
INTEGER :: N_LOGICALS=0 !< Number of logicals to pack into restart or send/recv buffer

END TYPE WALL_TYPE
! define a nested derived data type
type, public :: MESHES_type
type(WALL_TYPE), managed, allocatable, dimension(:) :: WALL
type(BOUNDARY_COORD_TYPE), managed, allocatable,dimension(:) :: BOUNDARY_COORD
end type MESHES_type

! define a nested derived data type variable on the device side
!type(MESHES_type), managed, allocatable, target :: MESHES(:)

type(MESHES_type), allocatable, target, dimension(:) :: MESHES

type(WALL_TYPE), managed, pointer, dimension(:) :: P_WALL, tmp_WALL
type(BOUNDARY_COORD_TYPE), managed, pointer,dimension(:) :: P_BOUNDARY_COORD, tmp_BOUNDARY_COORD

TYPE(WALL_TYPE), managed, POINTER :: WC
TYPE(BOUNDARY_COORD_TYPE),managed, POINTER :: BC

contains

subroutine test_cpy
implicit none
integer :: I, J, K, istat, IW

           allocate(MESHES(2))
           do I=1,2 
               allocate(MESHES(I)%WALL(5608))
               allocate(MESHES(I)%BOUNDARY_COORD(5608))
           enddo

            tmp_WALL => MESHES(1)%WALL
            tmp_BOUNDARY_COORD=> MESHES(1)%BOUNDARY_COORD

            !$CUF kernel do (1) <<<*,*, stream=cudaforGetDefaultStream() >>>
            DO IW=1, 300
                WC => tmp_WALL(IW)
                WC%BC_INDEX= IW
                BC => tmp_BOUNDARY_COORD(WC%BC_INDEX) 
                BC%IOR= IW
            enddo
            istat = cudaDeviceSynchronize()

           do I=1, 30
               print *, "done,  for WALL (I) :", tmp_WALL(I)%BC_INDEX 
               print *, "done,  for BOUNDARY_COORD (I) :", tmp_BOUNDARY_COORD(I)%IOR
           enddo

           do I=1,2
              deallocate(MESHES(I)%WALL)
              deallocate(MESHES(I)%BOUNDARY_COORD)
              !deallocate(Hostelement(I)%element)
           enddo
           deallocate(MESHES)
           !deallocate(MESHES, Hostelement)

    end subroutine test_cpy

end module datastructures

program main
!@cuf use cudafor
use datastructures
implicit none

call test_cpy

print*, "completed style 5"

end program main

compiled with:

nvfortran -g -cuda -gpu=debug -O0 -Mpreprocess -Mlarge_arrays -m64 -Wall -Werror -gpu=ccall,nomanaged,implicitsections -stdpar -traceback -Minfo=accel -cpp -o test_mg driver_managed.F90

I couldn’t print IW in cuda-gdb.

Any input on this?

Thanks

Not unexpected, given that IW is the loop index variable. It’s likely being kept in a register, so the variable itself can’t be queried. Note that even with “-O0”, some optimization must be applied in order for the compiler to perform the implicit transformation of the CUF kernel to GPU code. Hence some debugging information will be lost due to this transformation.
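One thing that may be worth trying, since the information is lost in the implicit CUF kernel transformation, is to write the loop body as an explicit kernel so that the index is an ordinary local variable in device code. Below is a minimal sketch of that idea, not a tested fix for the example above. The names explicit_kernel_sketch, set_bc_index, bc_index_d, bc_index_h and tpb are made up for illustration, and the derived types are replaced by a plain integer array to keep it short. Whether iw is then visible in cuda-gdb is an assumption that still has to be checked on your compiler version.

```fortran
module explicit_kernel_sketch
  use cudafor
  implicit none
contains
  ! Hypothetical explicit kernel standing in for the "!$CUF kernel do" loop.
  ! The loop index becomes a per-thread local variable, iw.
  attributes(global) subroutine set_bc_index(bc_index, n)
    integer, value :: n            ! scalar passed by value
    integer :: bc_index(n)         ! dummy array argument, resides on the device
    integer :: iw                  ! local index computed from the thread coordinates

    iw = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (iw <= n) bc_index(iw) = iw ! guard threads beyond the end of the array
  end subroutine set_bc_index
end module explicit_kernel_sketch

program main
  use cudafor
  use explicit_kernel_sketch
  implicit none
  integer, parameter :: n = 300, tpb = 128   ! tpb: threads per block (arbitrary choice)
  integer, device, allocatable :: bc_index_d(:)
  integer :: bc_index_h(n)
  integer :: istat

  allocate(bc_index_d(n))
  bc_index_d = 0

  ! Launch enough blocks to cover all n elements.
  call set_bc_index<<<(n + tpb - 1) / tpb, tpb>>>(bc_index_d, n)
  istat = cudaDeviceSynchronize()

  bc_index_h = bc_index_d                    ! copy the result back to the host
  print *, "first, last:", bc_index_h(1), bc_index_h(n)

  deallocate(bc_index_d)
end program main
```

Built with the same debug flags as in the post above (for example nvfortran -g -cuda -gpu=debug -O0 on a file with a .cuf or .F90 extension), you could then set a breakpoint with "break set_bc_index" in cuda-gdb and try "print iw" or "info locals" once a thread stops inside the kernel.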