Simple code error related to sharing device data between modules

Hello.

I am a beginner at CUDA programming with PGI CUDA Fortran.

Here are three basic example source files (b.cuf, a.cuf, aPlusB.cuf).

b.cuf
module b_m
  integer, device :: b_d
end module b_m

a.cuf
module a_m
  integer, device :: a_d
contains
  attributes(global) subroutine aPlusB()
    use b_m
    implicit none
    a_d = a_d + b_d
  end subroutine aPlusB
end module a_m

aPlusB.cuf
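program twoPlusThree
  use a_m
  use b_m
  implicit none
  integer :: a

  a_d = 2                 ! host-to-device copy into module device variables
  b_d = 3
  call aPlusB<<<1,1>>>()  ! launch a single thread
  a = a_d                 ! device-to-host copy of the result
  write(*,"('2+3=',i0)") a
end program twoPlusThree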

Compiling works fine, but when I run ./a.out I get the following message:

copying Symbol Memcpy FAILED : 13(invalid device symbol)

I tested it in various ways to find where the error comes from.

I think “use b_m” causes the error, i.e. sharing device data between modules does not work on my PC, which runs Linux.

Please reply.

Thanks.

Hi 9mile,

Your code works for me here, so I suspect it’s an issue with your system. What OS, compiler version, CPU, and GPU are you using? What are your compile options? What is the output from the utility “pgaccelinfo”?

Thanks,
Mat

Hello mkcolg,

Thank you for the quick reply.

My OS is Red Hat Linux.

Here is my pgaccelinfo output:

CUDA Driver Version: 6050
NVRM version: NVIDIA UNIX x86_64 Kernel Module 340.65 Tue Dec 2 09:50:34 PST 2014

CUDA Device Number: 0
Device Name: Tesla T10 Processor
Device Revision Number: 1.3
Global Memory Size: 4294770688
Number of Multiprocessors: 30
Number of Cores: 240
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment: 256B
Clock Rate: 1440 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: No
ECC Enabled: No
Memory Clock Rate: 800 MHz
Memory Bus Width: 512 bits
Max Threads Per SMP: 1024
Async Engines: 1
Unified Addressing: No
Initialization time: 1008952 microseconds
Current free memory: 4237317888
Upload time (4MB): 1412 microseconds ( 956 ms pinned)
Download time: 2962 microseconds ( 909 ms pinned)
Upload bandwidth: 2970 MB/sec (4387 MB/sec pinned)
Download bandwidth: 1416 MB/sec (4614 MB/sec pinned)
PGI Compiler Option: -ta=nvidia,cc13

CUDA Device Number: 1
Device Name: Tesla T10 Processor
Device Revision Number: 1.3
Global Memory Size: 4294770688
Number of Multiprocessors: 30
Number of Cores: 240
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment: 256B
Clock Rate: 1440 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: exclusive
Concurrent Kernels: No
ECC Enabled: No
Memory Clock Rate: 800 MHz
Memory Bus Width: 512 bits
Max Threads Per SMP: 1024
Async Engines: 1
Unified Addressing: No
Initialization time: 1008952 microseconds
Current free memory: 4237317888
Upload time (4MB): 1441 microseconds ( 954 ms pinned)
Download time: 3192 microseconds ( 913 ms pinned)
Upload bandwidth: 2910 MB/sec (4396 MB/sec pinned)
Download bandwidth: 1314 MB/sec (4593 MB/sec pinned)
PGI Compiler Option: -ta=nvidia,cc13

Hi 9mile,

The problem is that you have an older device that’s no longer supported. We discontinued support for compute capability 1.3 devices in 15.1.
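If you want a program to detect this itself rather than reading the pgaccelinfo output, a minimal sketch using the cudafor runtime API (cudaGetDeviceCount and cudaGetDeviceProperties) could look like the following. The file and program names are hypothetical, and the 2.0 threshold is taken from the compiler's restriction on module data in device code. Since these are host-side calls, the check itself does not need a kernel to run on the old device.

```fortran
! checkcc.cuf (hypothetical file name): report each GPU's compute capability
! and flag devices below 2.0, where DEVICE/GLOBAL subprograms cannot use
! device data declared in another module.
program checkCC
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, ndev, i

  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess) then
    write(*,"('cudaGetDeviceCount failed: ',a)") trim(cudaGetErrorString(istat))
    stop
  end if

  do i = 0, ndev - 1
    istat = cudaGetDeviceProperties(prop, i)
    write(*,"('Device ',i0,': ',a,', compute capability ',i0,'.',i0)") &
         i, trim(prop%name), prop%major, prop%minor
    if (prop%major < 2) then
      ! Below cc 2.0: kernels may only use device module data from their own module
      write(*,"('  Warning: cc < 2.0, keep kernels and their device module data in one module')")
    end if
  end do
end program checkCC
```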

You can use an older release, such as 14.10; however, CC 1.3 devices can’t use device module data from external modules (that’s one of the reasons we dropped support). Hence, you’ll get the following compile error:

% pgf90 -Mcuda=cc13 b.cuf a.cuf aPlusB.cuf -V14.10
b.cuf:
a.cuf:
PGF90-S-0521-MODULE data cannot be used in a DEVICE or GLOBAL subprogram unless compiling for compute capability >= 2.0 - b_d (a.cuf: 7)
  0 inform,   0 warnings,   1 severes, 0 fatal for aplusb
aPlusB.cuf:

You’ll need to either update your device or put all of your device module data in the same module.
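A minimal sketch of the second option, assuming a single hypothetical file ab.cuf with a hypothetical module ab_m: a_d, b_d and the aPlusB kernel all live in the same module, so the kernel never references device data declared in another module. It could be built the same way as above, e.g. pgf90 -Mcuda=cc13 ab.cuf -V14.10.

```fortran
! ab.cuf (hypothetical file name): both device variables and the kernel
! share one module, avoiding cross-module device data in device code.
module ab_m
  implicit none
  integer, device :: a_d, b_d
contains
  attributes(global) subroutine aPlusB()
    a_d = a_d + b_d        ! only touches data from this module
  end subroutine aPlusB
end module ab_m

program twoPlusThree
  use ab_m
  implicit none
  integer :: a

  a_d = 2                  ! host-to-device copies
  b_d = 3
  call aPlusB<<<1,1>>>()   ! launch a single thread
  a = a_d                  ! device-to-host copy of the result
  write(*,"('2+3=',i0)") a
end program twoPlusThree
```

On a supported setup this should print 2+3=5.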

Mat

Dear Mat,

I see what the problem is now.

Thanks for the help.

Best wishes,

9mile