OpenACC routine/nordc problem

TAO_T.CH · May 31, 2018, 9:32am

Hi,

I would like to write a class that uses OpenACC directives, compile it into a shared library, and then write an executable that instantiates this class and runs computations in parallel. Here are the problems I have run into:

I added a class named A to an existing library. This class uses OpenACC directives, while the rest of the code in the library does not. The library will be linked into many other codes.

=> If I understand correctly, to compile this class successfully I should add the -nordc flag.
=> But I also want to define an OpenACC routine function in class A that will be used in other codes.

After this first step, I wrote a separate executable in which class A is instantiated and used for offload computing, and compiled it.
But when I ran it, the result was incorrect: a value that is copied to the GPU device, computed by the OpenACC routine function, and then updated to the host always equals 0, which means there is an unknown problem in my program.

I guess this error may be caused by -nordc.
My question is: if -nordc is used to compile a shared library that defines an acc routine function, can it work correctly? In other words, can I use this library successfully from other code?

Thanks,
Tao CHANG

MatColgrove · May 31, 2018, 4:05pm

Hi Tao CHANG,

My question is: if -nordc is used to compile a shared library that defines an acc routine function, can it work correctly? In other words, can I use this library successfully from other code?

The core issue is that there isn’t a dynamic linker for device code. Hence in order to use features that require a device linker, such as calling a device routine from device code, the code must be statically linked. Basically, the “nordc” option disables the need for static linking and allows for device objects to be self-contained which in turn allows for them to be used in shared objects.

If the called device routine is contained within the same source file, then typically this routine will be inlined thus removing the need for the link. If the device calls are in separate source files, then they need to be linked and rdc must be used.

Hence device code outside a shared object cannot call device code inside that shared object.
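As an illustration of the same-source-file case described above, here is a minimal sketch, not code from this thread. The names `scale_add` and `transform` are hypothetical. The assumption is that the device routine and the compute region that calls it sit in one translation unit, so the device object is self-contained and can be built without RDC (with the PGI compilers, the sub-option would be along the lines of `-ta=tesla:nordc`).

```cpp
#include <vector>

// Hypothetical device routine. It is defined in the same source file as the
// compute region that calls it, so no cross-file device link is required.
#pragma acc routine seq
double scale_add(double x, double a, double b) {
    return a * x + b;
}

// Host entry point. Code outside the library calls this host function only;
// it never calls scale_add from its own device code.
void transform(std::vector<double>& v, double a, double b) {
    double* p = v.data();
    const int n = static_cast<int>(v.size());
    // Copy the array to the device, run the loop there, copy it back.
    #pragma acc parallel loop copy(p[0:n])
    for (int i = 0; i < n; ++i) {
        p[i] = scale_add(p[i], a, b);
    }
}
```

Without OpenACC enabled the pragmas are ignored and the loop runs on the host, which is a convenient way to check the logic before offloading.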

I guess this error may be caused by -nordc,

Does the program work if you statically link it with RDC enabled?

I’d try and get this working first, and then see about how to port the code to a shared object. There could be other unrelated issues.

-Mat

TAO_T.CH · June 2, 2018, 9:35am

Hence external device code can not call device code within a shared object.

Does the program work if you statically link it with RDC enabled?

Hi, Mat,

Thanks for your reply. I tried compiling with the default rdc flag. Compilation went well, but an error occurs during execution, as shown below:
Failing in Thread:1
call to cuModuleGetFunction returned error 500: Not found

It seems that the routine function is not treated as defined. How can this be explained? Is it a device linker problem?

Thanks in advance.
Tao

MatColgrove · June 3, 2018, 5:18pm

I tried compiling with the default rdc flag. Compilation went well, but an error occurs during execution, as shown below:
Failing in Thread:1
call to cuModuleGetFunction returned error 500: Not found

This means that the generated CUDA “bin” file isn’t getting linked in with your binary. Did you statically link or are you trying to use the shared object?

-Mat

TAO_T.CH · June 4, 2018, 8:39am

I tried to use the shared object, which was compiled with the rdc flag.

TAO_T.CH · June 4, 2018, 12:47pm

Hi,

What do you mean by statically linking? I don’t get it.

Right now, what I would like to do is the following:

In directory D: class A, class B, and class C, where B is the CUDA version of the class, inheriting from A, and C is the OpenACC version, inheriting from A.

The files in directory D are compiled into a library named D.so.

In directory E: the file with the main function, which uses class C, so I linked against D.so to compile it.

But the routine function defined in class C does not work.
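The layout described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual code: the members `compute`, `kernel`, and `factor_` and the function `run_from_main` are hypothetical, and class B is omitted. Following Mat's earlier point, the routine and the compute region that calls it are kept in the same source file inside the library, and the code in directory E makes only host calls.

```cpp
#include <vector>

// Hypothetical stand-in for base class A in directory D.
class A {
public:
    virtual ~A() = default;
    virtual void compute(std::vector<double>& data) const = 0;
};

// Hypothetical OpenACC version, class C, also in directory D.
class C : public A {
public:
    explicit C(double factor) : factor_(factor) {}

    // Device routine, defined next to the compute region that uses it.
    #pragma acc routine seq
    static double kernel(double x, double factor) { return factor * x; }

    void compute(std::vector<double>& data) const override {
        double* p = data.data();
        const int n = static_cast<int>(data.size());
        const double f = factor_;  // local copy, so the loop does not read `this` on the device
        // Copy in, compute on the device, copy the result back to the host.
        #pragma acc parallel loop copy(p[0:n])
        for (int i = 0; i < n; ++i) {
            p[i] = kernel(p[i], f);
        }
    }

private:
    double factor_;
};

// What the main function in directory E would do: host calls only.
double run_from_main() {
    C c(3.0);
    std::vector<double> v{1.0, 2.0};
    c.compute(v);
    return v[0] + v[1];
}
```

This sketch does not cover the case where compute regions written in directory E call `kernel` directly. According to Mat's explanation, that case needs a device link, which means static linking with RDC enabled.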

-Tao
