Hello,
I have a rather simple code that compiles without errors with
pgfortran -Mcuda main.cuf
but it gives incorrect results. I would like to use emulation mode to debug it, but
when I compile with
pgfortran -Mcuda=emu main.cuf -v
I get the following error message:
/opt/pgi/linux86/10.0/bin/pgf901 main.cuf -opt 1 -nohpf -nostatic -x 119 0x100000 -x 19 0x400000 -x 59 4 -x 15 2 -x 49 0x400004 -x 51 0x20 -x 57 0x4c -x 58 0x10000 -x 124 0x1000 -x 57 0xfb0000 -x 58 0x78031040 -x 48 3328 -stdinc /opt/pgi/linux86/10.0/include:/usr/local/include:/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include:/usr/lib/gcc/x86_64-redhat-linux/4.1.2/32/include:/usr/include -def unix -def __unix -def unix -def linux -def __linux -def linux -def i386 -def __i386 -def i386 -def __NO_MATH_INLINES -def linux86 -def __THROW= -def extension= -def SSE -def MMX -def SSE2 -def SSE3 -def SSE4A -def ABM -freeform -x 137 1 -x 163 128 -vect 48 -x 137 1 -mp -x 137 2 -mp -x 69 0x200 -x 69 0x400 -x 69 2 -x 69 1 -modexport /tmp/pgfortranswVh2eukw2wS.cmod -modindex /tmp/pgfortranIwVhMR0Tf60c.cmdx -output /tmp/pgfortrancwVhgKxoVuSS.ilm
0 inform, 0 warnings, 0 severes, 0 fatal for inc_time
0 inform, 0 warnings, 0 severes, 0 fatal for inc_time_step
0 inform, 0 warnings, 0 severes, 0 fatal for initial_conditions
PGF90-S-0038-Symbol, .i0002, has not been explicitly declared (main.cuf: 124)
PGF90-S-0038-Symbol, .i0003, has not been explicitly declared (main.cuf: 124)
PGF90-S-0038-Symbol, .i0004, has not been explicitly declared (main.cuf: 124)
0 inform, 0 warnings, 3 severes, 0 fatal for burger
PGF90/x86 Linux 10.0-0: compilation completed with severe errors
pgfortran-Fatal-f901 completed with exit code 1
Unlinking /tmp/pgfortrancwVhgKxoVuSS.ilm
Unlinking /tmp/pgfortranswVh2eukw2wS.cmod
Unlinking /tmp/pgfortranIwVhMR0Tf60c.cmdx
The offending line (124) is the kernel invocation:
call inc_time_step<<<dimGrid,dimBlock>>>(v_dev,tstep,overlap,cellsize, &
     RE,block_size,shared_array_size,ncell)
Of course, all variables are declared (I use implicit none).
The question is: what does emulation mode do differently from normal mode?
Just to check, I compiled the sgemm.cuf sample code with -Mcuda=emu, and it worked.
As my question might be difficult to answer without the code, I include the main program up to the line that triggers the error:
PROGRAM BURGER
  USE CUDAFOR
  USE INC_TIME
  implicit none
  integer, parameter :: ncell=8192, WARP_SIZE=32, GRID_SIZE=30
  real(KIND=4), parameter :: Re=150.,boxsize=1.,tfinal=1.
  real(KIND=4) :: cellsize,tnow,tstep,cellsize_sq
  integer :: block_size,last_block_size,overlap,shared_array_size
  real(KIND=4), dimension(0:ncell-1) :: v
  real(KIND=4), device, allocatable, dimension(:) :: v_dev
  integer :: i,ct,err
  CHARACTER(LEN=3) stg
  type(dim3) :: dimGrid, dimBlock

  shared_array_size= ( (NCELL/GRID_SIZE+1)/WARP_SIZE)* &
       WARP_SIZE+2*WARP_SIZE+2
  if( mod(shared_array_size-(NCELL/GRID_SIZE+1),2) == 1) then
     shared_array_size=shared_array_size+1;
  endif
  block_size=shared_array_size-2
  overlap=(shared_array_size-(NCELL/GRID_SIZE+1) )/2

  print*,'-----------------------------------------'
  print*,(NCELL/GRID_SIZE+1),'cells for each domain'
  print*,overlap,'overlap'
  print*,block_size,'threads per domain'
  print*,'-----------------------------------------'

  cellsize=boxsize/ncell
  cellsize_sq=cellsize*cellsize
  tnow=0.
  tstep=(tfinal-tnow)/2000000.
  call initial_conditions(v,ncell)

  ! Copy data to GPU
  allocate(v_dev(0:ncell-1))
  v_dev=v

  ct=0
  DO WHILE (tnow <= tfinal)
     ct=ct+1
     dimGrid = dim3( GRID_SIZE, 1, 1 )
     dimBlock = dim3( block_size, 1, 1 )
     call inc_time_step<<<dimGrid,dimBlock>>>(v_dev,tstep,overlap,cellsize, &
          RE,block_size,shared_array_size,ncell)
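
For reference, the launch configuration computed above can be checked on the host with plain Fortran, without any CUDA involved. This is only a sketch that repeats the integer arithmetic from the listing; the program name launch_config and the helper variable cells_per_domain are made up for illustration, while the constants and formulas are copied from BURGER.

```fortran
! Host-only sketch: repeats the integer arithmetic from the BURGER listing.
! launch_config and cells_per_domain are hypothetical names.
program launch_config
  implicit none
  integer, parameter :: ncell = 8192, WARP_SIZE = 32, GRID_SIZE = 30
  integer :: cells_per_domain, shared_array_size, block_size, overlap

  ! Integer division truncates: 8192/30 = 273, so 274 cells per domain.
  cells_per_domain = ncell/GRID_SIZE + 1

  ! Round down to a whole number of warps, then add two warps plus two cells.
  shared_array_size = (cells_per_domain/WARP_SIZE)*WARP_SIZE + 2*WARP_SIZE + 2

  ! Make the padding even, since overlap below is half of it.
  if (mod(shared_array_size - cells_per_domain, 2) == 1) then
     shared_array_size = shared_array_size + 1
  end if

  block_size = shared_array_size - 2
  overlap    = (shared_array_size - cells_per_domain)/2

  print *, 'cells per domain  :', cells_per_domain
  print *, 'shared_array_size :', shared_array_size
  print *, 'block_size        :', block_size
  print *, 'overlap           :', overlap
end program launch_config
```

With these constants the values work out to 274, 322, 320 and 24, so each kernel launch uses 30 blocks of 320 threads.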
Thanks for your advice,