Problems with the device subprograms

OceanCloud · August 30, 2013, 5:18am

When I use a subroutine with the device attribute in CUDA Fortran, I find that the device subprogram must be contained in a module and can only be invoked by subroutines or functions in that same module.
Is that true?
Why does it work this way?

MatColgrove · September 3, 2013, 7:20pm

Hi OceanCloud,

Is it true?

Yes, in older versions of the compiler and by default in the current version. The main issue is that until recently there wasn’t a linker for device code. Hence, device routines had to be inlined by the compiler, which required them to be placed in the same module as the global routines. (Note that this was true for CUDA C as well, where device routines had to be in the same file scope as the global routines.)

As of CUDA 5.0, we can now link device routines found in external objects when using the “-Mcuda=rdc” flag. The following PGinsider article gives a good explanation of its usage: Account Login | PGI
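
As a rough illustration of what the flag allows, here is a minimal sketch in which a kernel calls a device function that lives in a different module. The module names (scale_dev, scale_kernels), the routine names (twice, scale), and the file name in the compile line are hypothetical and chosen only for this example; only the “-Mcuda=rdc” flag comes from the discussion above.

```fortran
! Hypothetical module that holds only a device routine.
module scale_dev
  implicit none
contains
  attributes(device) function twice(x) result(y)
    real, value :: x
    real :: y
    y = 2.0 * x
  end function twice
end module scale_dev

! Hypothetical module that holds the kernel. The kernel calls a device
! routine defined in another module, so the device code must be linked
! rather than inlined from the same module.
module scale_kernels
  use scale_dev
  implicit none
contains
  attributes(global) subroutine scale(a, n)
    integer, value :: n
    real :: a(n)          ! array arguments of a kernel reside on the device
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = twice(a(i))
  end subroutine scale
end module scale_kernels

program main
  use cudafor
  use scale_kernels
  implicit none
  integer, parameter :: n = 256
  real :: a(n)
  real, device :: a_d(n)

  a = 1.0
  a_d = a                                  ! host-to-device copy
  call scale<<<(n + 127) / 128, 128>>>(a_d, n)
  a = a_d                                  ! device-to-host copy
  print *, 'max error:', maxval(abs(a - 2.0))
end program main
```

Assuming the three program units are saved in one file called scale.cuf, it would be built with something like `pgfortran -Mcuda=rdc scale.cuf`. The modules could equally be placed in separate source files and compiled to separate objects, as long as the same flag is used for compiling and linking.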

Hope this helps,
Mat

OceanCloud · September 5, 2013, 1:25am

Hi, Mat

Thanks a lot.

I read the PGinsider article you mentioned; maybe the compile option is “-Mcuda=rdc” not “-Mcuda=rdo”. But I don’t quite understand why, when I use the “-Mcuda=rdc” flag and “allocate” statements in device routines, the compiler gives the error below:

“error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Unexpected runtime function call”

Why does this error occur?

MatColgrove · September 5, 2013, 9:49pm

maybe the compile option is “-Mcuda=rdc” not “-Mcuda=rdo”.

Correct, this was a typo on my part. I’ll go back and edit the post.

“error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Unexpected runtime function call”

Why does this error occur?

This typically means that a compiler-generated host routine is being added to the device code. The one open bug (TPR#19462) I see with this failure has to do with “pow” when “-i8” is used. This will be fixed in 13.9. If that’s not the same as yours, can you send a reproducing example to PGI Customer Service (trs@pgroup.com)?

Thanks,
Mat

OceanCloud · September 11, 2013, 3:05am

Thanks, Mat

What I mean is that when I test the code given in the PGinsider article, the examples (dgemmdynamic.cuf, dgemmdynamic_strassen.cuf, dgemmdynamic_streams.cuf) fail to compile.

Environment: PGI Visual Fortran 13.8, Visual Studio 2012, Windows 7 x64
Compile option: -Mcuda=cuda5.0,cc35,rdc
GPU card: K20C

Error message:

dgemmdynamic.cuf
C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor2afqqbp1lEUFtU.gpu(1010): error: identifier "mm88" is undefined

C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor2afqqbp1lEUFtU.gpu(1010): error: identifier "mm28" is undefined

2 errors detected in the compilation of "C:\Users\Adiministrator\AppData\Local\Temp\pgnvd2aGq4bGHw4zbl_.nv0".
D:\Research\Programming\Routine\CUDA Fortran\test\dgemmdynamic.cuf(1) : error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code
PGF90/x86-64 Windows 13.8-0: compilation aborted


dgemmdynamic_strassen.cuf
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 2337; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 2441; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 2545; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 3082; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c-qbc9YSk0ywp.ptx, line 3177; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas : fatal error : Ptx assembly aborted due to errors
pgnvd-Fatal-Could not spawn c:\program files\pgi\win64/2013/cuda/5.0/bin\ptxas.exe
D:\Research\Programming\Routine\CUDA Fortran\test\dgemmdynamic_strassen.cuf(1) : error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code
PGF90/x86-64 Windows 13.8-0: compilation aborted

dgemmdynamic_streams.cuf
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c0KubCnuvxO8w.ptx, line 2372; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas C:\Users\Adiministrator\AppData\Local\Temp\pgcudafor4c0KubCnuvxO8w.ptx, line 3257; : error : Instruction 'kernel function address' requires .target sm_35 or higher
ptxas : fatal error : Ptx assembly aborted due to errors
pgnvd-Fatal-Could not spawn c:\program files\pgi\win64/2013/cuda/5.0/bin\ptxas.exe
D:\Research\Programming\Routine\CUDA Fortran\test\dgemmdynamic_streams.cuf(1) : error F0155 : Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code
PGF90/x86-64 Windows 13.8-0: compilation aborted

All three of the above files contain “allocate” statements, and dgemmdynamic_strassen.cuf and dgemmdynamic_streams.cuf also use dynamic parallelism.

Could you point out where the problem is, based on the description above?
