-Mcuda=emu issue

Hello,

I have a rather simple code which compiles without errors in this case:

pgfortran -Mcuda main.cuf

But it gives me incorrect results. I want to use emulation mode to be able to debug, but
when I compile with

pgfortran -Mcuda=emu main.cuf -v

I get the following error message:


/opt/pgi/linux86/10.0/bin/pgf901 main.cuf -opt 1 -nohpf -nostatic -x 119 0x100000 -x 19 0x400000 -x 59 4 -x 15 2 -x 49 0x400004 -x 51 0x20 -x 57 0x4c -x 58 0x10000 -x 124 0x1000 -x 57 0xfb0000 -x 58 0x78031040 -x 48 3328 -stdinc /opt/pgi/linux86/10.0/include:/usr/local/include:/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include:/usr/lib/gcc/x86_64-redhat-linux/4.1.2/32/include:/usr/include -def unix -def __unix -def unix -def linux -def __linux -def linux -def i386 -def __i386 -def i386 -def __NO_MATH_INLINES -def linux86 -def __THROW= -def extension= -def SSE -def MMX -def SSE2 -def SSE3 -def SSE4A -def ABM -freeform -x 137 1 -x 163 128 -vect 48 -x 137 1 -mp -x 137 2 -mp -x 69 0x200 -x 69 0x400 -x 69 2 -x 69 1 -modexport /tmp/pgfortranswVh2eukw2wS.cmod -modindex /tmp/pgfortranIwVhMR0Tf60c.cmdx -output /tmp/pgfortrancwVhgKxoVuSS.ilm
0 inform, 0 warnings, 0 severes, 0 fatal for inc_time
0 inform, 0 warnings, 0 severes, 0 fatal for inc_time_step
0 inform, 0 warnings, 0 severes, 0 fatal for initial_conditions
PGF90-S-0038-Symbol, .i0002, has not been explicitly declared (main.cuf: 124)
PGF90-S-0038-Symbol, .i0003, has not been explicitly declared (main.cuf: 124)
PGF90-S-0038-Symbol, .i0004, has not been explicitly declared (main.cuf: 124)
0 inform, 0 warnings, 3 severes, 0 fatal for burger
PGF90/x86 Linux 10.0-0: compilation completed with severe errors
pgfortran-Fatal-f901 completed with exit code 1

Unlinking /tmp/pgfortrancwVhgKxoVuSS.ilm
Unlinking /tmp/pgfortranswVh2eukw2wS.cmod
Unlinking /tmp/pgfortranIwVhMR0Tf60c.cmdx


The offending line (124) is the kernel invocation:

call inc_time_step<<<dimGrid,dimBlock>>>(v_dev,tstep,overlap,cellsize, &
RE,block_size,shared_array_size,ncell)

Of course, all variables are declared (I use implicit none).

The question is: what does emulation mode do differently from normal mode?

Just to check I compiled the sgemm.cuf sample code with -Mcuda=emu, and it worked.

As my question might be difficult to answer without the code, I include the main subroutine code prior to the error:


PROGRAM BURGER
USE CUDAFOR
USE INC_TIME
implicit none

integer, parameter :: ncell=8192, WARP_SIZE=32, GRID_SIZE=30

real(KIND=4), parameter :: Re=150.,boxsize=1.,tfinal=1.

real(KIND=4) :: cellsize,tnow,tstep,cellsize_sq
integer :: block_size,last_block_size,overlap,shared_array_size

real(KIND=4), dimension(0:ncell-1) :: v
real(KIND=4), device, allocatable, dimension(:) :: v_dev

integer :: i,ct,err
CHARACTER(LEN=3) stg

type(dim3) :: dimGrid, dimBlock


shared_array_size= ( (NCELL/GRID_SIZE+1)/WARP_SIZE)* & 
                                                  WARP_SIZE+2*WARP_SIZE+2
if( mod(shared_array_size-(NCELL/GRID_SIZE+1),2) == 1) then
  shared_array_size=shared_array_size+1;
endif
block_size=shared_array_size-2
overlap=(shared_array_size-(NCELL/GRID_SIZE+1) )/2

print*,'-----------------------------------------'
print*,(NCELL/GRID_SIZE+1),'cells for each domain'
print*,overlap,'overlap'
print*,block_size,'threads per domain'
print*,'-----------------------------------------'

cellsize=boxsize/ncell
cellsize_sq=cellsize*cellsize
tnow=0.
tstep=(tfinal-tnow)/2000000.

call initial_conditions(v,ncell)

! Copy data to GPU
allocate(v_dev(0:ncell-1))
v_dev=v

ct=0
DO WHILE (tnow <= tfinal)
  ct=ct+1
  dimGrid = dim3( GRID_SIZE, 1, 1 )
  dimBlock = dim3( block_size, 1, 1 )
  call inc_time_step<<<dimGrid,dimBlock>>>(v_dev,tstep,overlap,cellsize, & 
                                                                    RE,block_size,shared_array_size,ncell)

Thanks for your advice,
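
For readers following the launch configuration in the listing above, here is a minimal host-only sketch that repeats the same integer arithmetic with the posted parameters (ncell=8192, WARP_SIZE=32, GRID_SIZE=30). The program name and the helper variable cells_per_domain are illustrative and not part of the original code. It needs no GPU and should build with any Fortran compiler.

```fortran
program launch_config
  implicit none
  integer, parameter :: ncell = 8192, warp_size = 32, grid_size = 30
  integer :: cells_per_domain, shared_array_size, block_size, overlap

  ! Integer division truncates: 8192/30 = 273, so each domain holds 274 cells.
  cells_per_domain = ncell/grid_size + 1

  ! Round down to whole warps (274/32 = 8 warps = 256 cells), then pad with
  ! two more warps plus two extra cells: 256 + 64 + 2 = 322.
  shared_array_size = (cells_per_domain/warp_size)*warp_size + 2*warp_size + 2

  ! Keep the padding even so it can be split equally between both sides.
  if (mod(shared_array_size - cells_per_domain, 2) == 1) then
    shared_array_size = shared_array_size + 1
  end if

  block_size = shared_array_size - 2                  ! 320 threads per block
  overlap = (shared_array_size - cells_per_domain)/2  ! 24 cells on each side

  print *, 'cells per domain  :', cells_per_domain
  print *, 'shared array size :', shared_array_size
  print *, 'threads per block :', block_size
  print *, 'overlap           :', overlap
end program launch_config
```

With these parameters the kernel is launched with 30 blocks of 320 threads each.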

Hi Benoit,

Thank you for the report and sample code. I was able to recreate the error and believe it to be an issue with the compiler. I have created a technical problem report (TPR#16391) and sent it to our engineers for further investigation.

Best Regards,
Mat

Thanks for your quick answer Mat,

Then I guess the issue may not be solved in the next few hours.

Is there any way to probe the values of variables in a kernel while it is running, other than using emulation mode? Does the debugger support this? Blind debugging is an unappealing challenge :-)

Benoit.

Hi Benoit,

Is there any way to probe the values of variables in a kernel while it is running, other than using emulation mode?

Unfortunately, not yet. Our tools engineers are working on GPU debugging support, but it’s not yet available.

Blind debugging is an unappealing challenge

I do apologize. Hopefully we can have this error fixed soon, or find a suitable workaround.

- Mat
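
Until a kernel debugger is available, one common way to inspect values from inside a kernel is to write them into an extra device array and copy that array back to the host. This is not something suggested in this thread. The sketch below is a minimal, self-contained illustration of the idea. The module, the kernel scale_and_record, and the array dbg_dev are hypothetical names and are unrelated to inc_time_step. It is meant to be built in normal mode (pgfortran -Mcuda).

```fortran
module probe_kernels
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: doubles v and records the value each thread read.
  attributes(global) subroutine scale_and_record(v, dbg, n)
    integer, value :: n
    real(kind=4) :: v(n), dbg(n)
    integer :: i
    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    if (i <= n) then
      dbg(i) = v(i)      ! snapshot of the input seen by this thread
      v(i) = 2.0*v(i)
    end if
  end subroutine scale_and_record
end module probe_kernels

program probe
  use cudafor
  use probe_kernels
  implicit none
  integer, parameter :: n = 64
  real(kind=4) :: v(n), dbg(n)
  real(kind=4), device, allocatable :: v_dev(:), dbg_dev(:)
  integer :: i

  do i = 1, n
    v(i) = real(i)
  end do
  dbg = 0.0

  allocate(v_dev(n), dbg_dev(n))
  v_dev = v          ! host-to-device copies
  dbg_dev = dbg

  ! One block of n threads.
  call scale_and_record<<<1, n>>>(v_dev, dbg_dev, n)

  dbg = dbg_dev      ! device-to-host copy of the recorded values
  v = v_dev
  print *, 'recorded inputs :', dbg(1:4)
  print *, 'kernel outputs  :', v(1:4)

  deallocate(v_dev, dbg_dev)
end program probe
```

The extra array costs device memory and bandwidth, so it is best kept to debug builds and to the few values under suspicion.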

Hello Mat

It seems I have found a workaround. Removing “implicit none” from the
main program makes it possible to compile with -Mcuda=emu.

So I can keep it to check my variable declarations when compiling with -Mcuda, and remove it when debugging with -Mcuda=emu.

I notice that “implicit none” is not used in the few CUDA Fortran examples available.
I find this strange. Maybe there is a specific reason, but in general my feeling is that no Fortran 90/95/2003 code should be written without it…
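
Rather than editing the source by hand for each build, the workaround above could be driven by the preprocessor. This is only a sketch under stated assumptions: the macro name EMU_BUILD is made up for illustration, and the file has to be preprocessed (for example with the -Mpreprocess option). Whether it avoids the emulation-mode error in a larger program has not been checked here.

```fortran
! Normal build, implicit none active:
!   pgfortran -Mcuda -Mpreprocess main.cuf
! Emulation build, implicit none skipped (EMU_BUILD is a hypothetical macro):
!   pgfortran -Mcuda=emu -Mpreprocess -DEMU_BUILD main.cuf
program toggle_implicit
#ifndef EMU_BUILD
  implicit none
#endif
  integer :: ncell

  ncell = 8192
  print *, 'ncell =', ncell
end program toggle_implicit
```

Since the normal build keeps “implicit none”, undeclared variables are still caught there, and the emulation build only drops the check while the compiler issue remains open.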