cuModuleLoadData error from first build/execution


I’ve just installed PGI and CUDA and am seeing
$ f1.exe
call to cuModuleLoadData returned error 300: Invalid Source
on my first build. I’m also seeing this on all the other
examples. My question is: what is causing this problem?

Relevant info follows under these headings.
=== Here’s the build
=== Here’s the system I am on
=== Here’s the sitenvrc file
=== Here’s the nvidia hardware
=== Here’s the source

=== Here’s the build
$ export PGI=/opt/pgi/linux86-64/9.0
$ export N=/usr/local/cuda
$ export PATH=$PGI/bin:$N/bin:$PATH
$ export LD_LIBRARY_PATH=$PGI/lib:$PGI/libso:$N/lib64:$LD_LIBRARY_PATH

$ pgfortran -o f1.exe f1.f90 -ta=nvidia -Minfo=accel -fast
21, Generating copyin(a(1:n))
Generating copyout(r(1:n))
22, Loop is parallelizable
Accelerator kernel generated
22, !$acc do parallel, vector(256)

=== Here’s the system I am on
Red Hat Enterprise Linux Client release 5.4 (Tikanga)
Linux 2.6.18-164.el5 x86_64

pgf90 9.0-4 64-bit target on x86-64 Linux -tp core2-64

=== Here’s the sitenvrc file
$ cat /opt/pgi/linux86-64/9.0/bin/sitenvrc
set NVOPEN64DIR=/usr/local/cuda/open64/lib ;
set CUDADIR=/usr/local/cuda/bin ;
set CUDALIB=/usr/local/cuda/lib64 ;

=== Here’s the nvidia hardware
$ pgaccelinfo
Device Number: 0
Device Name: Quadro NVS 135M
Device Revision Number: 1.1
Global Memory Size: 133496832
Number of Multiprocessors: 1
Number of Cores: 8
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 8192
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512 x 512 x 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 262144B
Texture Alignment 256B
Clock Rate: 800 MHz
Initialization time: 39351 microseconds
Current free memory 86343680
Upload time (4MB) 2675 microseconds (1888 ms pinned)
Download time 9987 microseconds (9230 ms pinned)
Upload bandwidth 1567 MB/sec (2221 MB/sec pinned)
Download bandwidth 419 MB/sec ( 454 MB/sec pinned)

=== Here’s the source
program main
  integer :: n                          ! size of the vector
  real,dimension(:),allocatable :: a    ! the vector
  real,dimension(:),allocatable :: r    ! the results
  real,dimension(:),allocatable :: e    ! expected results
  integer :: i
  character(10) :: arg1
  if( iargc() .gt. 0 )then
    call getarg( 1, arg1 )
    read(arg1,'(i10)') n
  else
    n = 100000
  endif
  if( n .le. 0 ) n = 100000
  allocate(a(n))
  allocate(r(n))
  allocate(e(n))
  do i = 1,n
    a(i) = i*2.0
  enddo
!$acc region
  do i = 1,n
    r(i) = a(i) * 2.0
  enddo
!$acc end region
  do i = 1,n
    e(i) = a(i) * 2.0
  enddo
  ! check the results
  do i = 1,n
    if( r(i) .ne. e(i) )then
      print *, i, r(i), e(i)
      stop 'error found'
    endif
  enddo
  print *, n, 'iterations completed'
end program

Thanks in advance for any help

Hi Lew,

By default, the compiler targets cards with compute capability 1.3. Your pgaccelinfo output shows the Quadro NVS 135M is a compute capability 1.1 device, so either add the “cc11” sub-option to the target flag, i.e. “-ta=nvidia,cc11”, or add “set COMPUTECAP=10;” to your sitenvrc file.
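
Spelled out against the build from the original post (a sketch; paths are the ones you showed above), the two fixes look like this:

$ pgfortran -o f1.exe f1.f90 -ta=nvidia,cc11 -Minfo=accel -fast

or, to set it once for all builds, append one line to the sitenvrc:

$ echo 'set COMPUTECAP=10;' >> /opt/pgi/linux86-64/9.0/bin/sitenvrc

The command-line sub-option only affects that one compile; the sitenvrc setting changes the default for everyone using that PGI installation.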

Hope this helps,


Thanks! That worked.