I really want to get this working; that is the reason I bought it…
Please, can you help me?
I’m trying, but everything points to a problem with your specific system rather than an issue with the compiler. As you point out, the code compiles and runs successfully on other systems, just not on yours.
This is why I’d like you to compile and run a CUDA C program using nvcc. If this works, then it’s a problem with the PGI installation. If it fails in the same way, then it’s a problem with your system.
PS: the compiler was working properly with the same code. The problem SEEMS to have started after I tried to write the results of a code to an output text file (using WRITE in the Fortran code). Do you think the graphics card could have had a memory problem since then?
If I understand this correctly, the “Mol_Dyn.f90” code was working until you added the WRITE statement? What happens if you remove the WRITE statement? Is accelerator code still being generated when the WRITE statement is removed?
FYI, a WRITE statement shouldn’t cause this error. However, what could be happening is that without the WRITE statement, the dead code elimination optimization is removing the accelerated code. This is pure speculation, though, and until I have more details I don’t know for sure.
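To make the dead-code idea concrete, here is a minimal sketch of the pattern (a hypothetical demo, not taken from Mol_Dyn.f90): a loop whose results are read only by the final output statement. One way to probe the theory is to build it with `-Minfo=accel` twice, once as is and once with the WRITE deleted, and compare the accelerator messages. Whether PGI actually drops the loop in the second build is an assumption to be checked, not something confirmed here.

```fortran
! Hypothetical demo: the only use of a() is the final WRITE.
! Delete that WRITE and nothing reads a(), so an optimizing compiler
! may treat the whole loop, and its accelerator kernel, as dead code.
program dce_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(kind=8) :: a(n)

  !$acc parallel loop
  do i = 1, n
     a(i) = sqrt(dble(i))
  end do
  !$acc end parallel loop

  ! This output statement is what keeps the loop above "live".
  write(*, *) 'a(n) =', a(n)
end program dce_demo
```

If the kernel messages disappear only when the WRITE is removed, that would suggest the earlier "working" build was not running the accelerated loop at all.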
Again, the full output from a run with the environment variable “PGI_ACC_DEBUG” set to 1 may be helpful.
PS 2: the previous simple program, picalc.f90, is also giving a similar memory error:
Have you modified this code from your first post? You’re no longer getting the “sum reduction” message.
Here’s what I want to see: the source you’re compiling, the command-line options and the -Minfo output, and the output from a run with PGI_ACC_DEBUG set to 1.
% cat picalc.f90
program picalc
  implicit none
  integer, parameter :: n=1000000
  integer :: i
  real(kind=8) :: t, pi
  pi = 0.0
  !$acc parallel loop
  do i=0, n-1
    t = (i+0.5)/n
    pi = pi + 4.0/(1.0 + t*t)
  end do
  !$acc end parallel loop
  print *, 'pi=', pi/n
end program picalc
% pgfortran -fast -Minfo=all -o MOL_DYN picalc.f90 -ta=nvidia,4.2 -V12.10
picalc:
      7, Accelerator kernel generated
          7, CC 1.3 : 23 registers; 32 shared, 36 constant, 0 local memory bytes
             CC 2.0 : 23 registers; 0 shared, 60 constant, 0 local memory bytes
          8, !$acc loop gang, vector(256) ! blockidx%x threadidx%x
         10, Sum reduction generated for pi
      7, Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
% setenv PGI_ACC_DEBUG 1
% MOL_DYN
__pgi_cu_init() found 2 devices
__pgi_cu_init( file=picalc.f90, function=picalc, line=7, startline=1, endline=14 )
__pgi_cu_init() will use device 0 (V3.0)
__pgi_cu_init() compute context created
__pgi_cu_module3( lineno=7 )
__pgi_cu_module3 module loaded at 0x85b1c0
__pgi_cu_module_function( name=0x673372=picalc_7_gpu, lineno=7, argname=(nil)=, argsize=12, varname=0x67337f=b1, varsize=8, SWcachesize=0 )
Function handle is 0x8a6db0
__pgi_cu_module_function( name=0x673360=picalc_10_gpu_red, lineno=7, argname=(nil)=, argsize=0, varname=(nil)=, varsize=0, SWcachesize=0 )
Function handle is 0x8a3d60
__pgi_cu_alloc(size=31256,lineno=7,name=)
__pgi_cu_alloc(31256) returns 0x500240000
__pgi_cu_uploadc( "b1", size=8, offset=0, lineno=7 )
constant data b1 at address 0x500140000 devsize=8, size=8, offset=0
First arguments are:
0 0
0x00000000 0x00000000
__pgi_cu_launch_a(func=0x8a6db0, grid=3907x1x1, block=256x1x1, lineno=7)
__pgi_cu_launch_a(func=0x8a6db0, params=0x7fffdf3d5dac, bytes=8, sharedbytes=2048)
First arguments are:
2359296 5
0x00240000 0x00000005
__pgi_cu_launch_a(func=0x8a3d60, grid=1x1x1, block=256x1x1, lineno=10)
__pgi_cu_launch_a(func=0x8a3d60, params=0x7fffdf3d5dac, bytes=12, sharedbytes=2048)
First arguments are:
2359296 5 3907
0x00240000 0x00000005 0x00000f43
__pgi_cu_downloadc( "b1", size=8, offset=0, lineno=7 )
constant data b1 at address 0x500140000 devsize=8, size=8, offset=0
downloaded values are:
1409763568 1095235564
0x540748f0 0x4147f7ec
__pgi_cu_free( 0x500240000, lineno=12, name= )
Memory Freed
__pgi_cu_close()
pi= 3.141592656472318
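In this run the compiler found the sum reduction for pi on its own ("Sum reduction generated for pi"), which is why its absence in the other build stands out. A sketch of a more self-checking variant of the same test follows, under a few assumptions of my own: it states the reduction and the private temporary explicitly with OpenACC clauses, uses double-precision literals so that (i+0.5)/n is not evaluated in default REAL as it is in the original, and compares the accelerated result with the same loop run serially on the host and with 4*atan(1) as a reference. The 1.0d-8 tolerance is an arbitrary choice for illustration.

```fortran
! Sketch: compare the accelerated sum with a serial host loop and a reference value.
program picalc_check
  implicit none
  integer, parameter :: n = 1000000
  real(kind=8), parameter :: tol = 1.0d-8   ! assumed tolerance
  integer :: i
  real(kind=8) :: t, pi_acc, pi_host, pi_ref

  ! Accelerated loop with explicit reduction and private clauses
  pi_acc = 0.0d0
  !$acc parallel loop reduction(+:pi_acc) private(t)
  do i = 0, n-1
     t = (i + 0.5d0) / n
     pi_acc = pi_acc + 4.0d0 / (1.0d0 + t*t)
  end do
  !$acc end parallel loop
  pi_acc = pi_acc / n

  ! Same loop with no directive: always runs on the host CPU
  pi_host = 0.0d0
  do i = 0, n-1
     t = (i + 0.5d0) / n
     pi_host = pi_host + 4.0d0 / (1.0d0 + t*t)
  end do
  pi_host = pi_host / n

  pi_ref = 4.0d0 * atan(1.0d0)
  print *, 'accelerator pi =', pi_acc
  print *, 'host pi        =', pi_host
  print *, 'reference pi   =', pi_ref

  ! Nonzero exit status if the device and host results disagree
  if (abs(pi_acc - pi_host) > tol .or. abs(pi_host - pi_ref) > tol) then
     print *, 'MISMATCH'
     stop 1
  end if
  print *, 'OK'
end program picalc_check
```

Built with the same `pgfortran -fast -Minfo=all -ta=nvidia` options and run with PGI_ACC_DEBUG set to 1, a "MISMATCH" (or a crash before the comparison) would isolate the problem to the accelerator path, while "OK" would point back at the larger code.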