OpenACC routine/nordc problem

Hi,

I would like to write a class that uses OpenACC directives, compile it into a shared library, and then write an executable that instantiates this class and runs the computation in parallel. Here are the problems I’ve run into:

  1. I added a class named A to an existing library. This class uses OpenACC directives while the rest of the code in the library doesn’t. The library will be linked into many other codes.

=> If I understand correctly, to compile this class successfully I should add the -nordc flag.
=> But I also want to define an OpenACC routine function in class A, which will be used by other code.

  2. After the first step, I wrote a separate executable in which class A is instantiated and used for offloaded computation. I compiled it.

  3. But when I ran this executable, the result was not correct: a value that is copied to the GPU, computed by the OpenACC routine function, and updated back to the host always equals 0, which means there is an unknown problem in my program.

I guess this error may be caused by -nordc.
My question is: if -nordc is used to compile a shared library that defines an acc routine function, can it work correctly? In other words, can I use this library from other code successfully?

Thanks,
Tao CHANG

Hi Tao CHANG,

My question is: if -nordc is used to compile a shared library that defines an acc routine function, can it work correctly? In other words, can I use this library from other code successfully?

The core issue is that there isn’t a dynamic linker for device code. Hence, in order to use features that require a device linker, such as calling a device routine from device code in another file, the code must be statically linked. Basically, the “nordc” option turns off relocatable device code (RDC), so no device link step is needed and each device object is self-contained, which in turn allows it to be used in a shared object.

If the called device routine is contained within the same source file, then typically this routine will be inlined, thus removing the need for the link. If the called device routines are in separate source files, then they need to be linked and rdc must be used.

Hence external device code can not call device code within a shared object.
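As a rough illustration of the case that does work with nordc, here is a minimal sketch in which the device routine and the compute region that calls it sit in the same source file, so no device link is needed. The names Scaler, scale_one, scale_all and scaler_demo are made up for this example, and the build line in the comment is an assumption about a typical PGI setup (newer NVIDIA HPC SDK compilers spell the option -gpu=nordc), not something taken from your project.

```cpp
// Sketch only: all names here are hypothetical.
// Assumed build of the shared library (PGI-era spelling):
//   pgc++ -acc -ta=tesla:nordc -fPIC -shared scaler.cpp -o libscaler.so
#include <cassert>

class Scaler {
public:
    explicit Scaler(double factor) : factor_(factor) {}

    // Device routine defined in the same file as the loop that calls it,
    // so the compiler can inline it and no device linker is required.
    #pragma acc routine seq
    static double scale_one(double x, double factor) { return x * factor; }

    // The compute region lives inside the library; callers only invoke
    // this host-side method and never call scale_one from their own
    // device code.
    void scale_all(double* data, int n) const {
        const double f = factor_;  // local copy, avoids touching 'this' on the device
        #pragma acc parallel loop copy(data[0:n])
        for (int i = 0; i < n; ++i) {
            data[i] = scale_one(data[i], f);
        }
    }

private:
    double factor_;
};

// Host-side usage check; it also runs on the CPU when OpenACC is disabled.
inline void scaler_demo() {
    double v[3] = {1.0, 2.0, 3.0};
    Scaler(2.0).scale_all(v, 3);
    assert(v[0] == 2.0 && v[1] == 4.0 && v[2] == 6.0);
}
```

The point of the layout is that the application linking the shared object only calls scale_all from host code; what is not supported is a compute region in the application calling scale_one across the shared-object boundary.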

I guess this error may be caused by -nordc,

Does the program work if you statically link it with RDC enabled?

I’d try and get this working first, and then see about how to port the code to a shared object. There could be other unrelated issues.
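For the static-link test, a minimal sketch of the cross-file case might look like the following. It is shown as one listing for brevity, with comments marking which (hypothetical) file each part would live in; the names square, sum_of_squares, kernels.h and kernels.cpp, and the build lines in the comments, are assumptions for illustration only.

```cpp
// Sketch only: file and function names are hypothetical.
// Assumed static build with RDC left at its default (enabled):
//   pgc++ -acc -ta=tesla -c kernels.cpp
//   pgc++ -acc -ta=tesla -c main.cpp
//   pgc++ -acc -ta=tesla main.o kernels.o -o app
#include <cassert>

// ---- would live in the library header, e.g. kernels.h ----
#pragma acc routine seq
double square(double x);

// ---- would live in the library source, e.g. kernels.cpp ----
#pragma acc routine seq
double square(double x) { return x * x; }

// ---- would live in the application, e.g. main.cpp ----
// The compute region calls a device routine from another file, which is
// the case that needs RDC and a static (device) link.
double sum_of_squares(const double* data, int n) {
    double total = 0.0;
    #pragma acc parallel loop reduction(+:total) copyin(data[0:n])
    for (int i = 0; i < n; ++i) {
        total += square(data[i]);
    }
    return total;
}

// Host-side usage check; it also runs on the CPU when OpenACC is disabled.
inline void sum_of_squares_demo() {
    const double v[3] = {1.0, 2.0, 3.0};
    assert(sum_of_squares(v, 3) == 14.0);
}
```

If a small case like this gives the right answer when linked statically, the remaining question is only how to restructure the code for the shared object.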

-Mat

Hence external device code can not call device code within a shared object.

Does the program work if you statically link it with RDC enabled?

Hi, Mat,

Thanks for your reply. I tried compiling with the default rdc flag. The compilation goes well, but an error occurs at run time, as shown below:
Failing in Thread:1
call to cuModuleGetFunction returned error 500: Not found

It seems that the routine function is not treated as defined. How can this be explained? Is it a device linker problem?

Thanks in advance.
Tao

I tried compiling with the default rdc flag. The compilation goes well, but an error occurs at run time, as shown below:
Failing in Thread:1
call to cuModuleGetFunction returned error 500: Not found

This means that the generated CUDA “bin” file isn’t getting linked in with your binary. Did you statically link or are you trying to use the shared object?

-Mat

I tried to use the shared object, which was compiled with the rdc flag.

Hi,

What do you mean by statically linking? I don’t get it.

Right now, what I would like to do is as follows:

In directory D: class A, class B, and class C, where B is the CUDA version inheriting from A and C is the OpenACC version inheriting from A.

The files in directory D are compiled into a library named D.so.

In directory E: the file with the main function, which uses class C, so to build it I linked against library D.so.

But the routine function defined in class C didn’t work.

-Tao