I have a library of functions declared in a .h file and defined in a .cpp file, with some functions that have been decorated with #pragma acc routine seq information. When I compile the .cpp into a static (.a) or shared library (.so), then if I want to use these device library functions in a different .cpp file, I can simply include the .h file and forward declare the functions at the top of the .cpp file with #pragma acc routine seq information.
However, if I instead have a class, with various member functions, that is compiled to a static or shared library, then I cannot use the above method (i.e. forward declaring functions with #pragma acc routine seq information), since I then get an error saying that I cannot redefine functions. Thus, the only way I’ve currently been able to work around this problem is to not compile the class to a static or shared library, and instead simply define all of its functions in the header itself.
Is there a better solution to this problem? Ideally I’d like to be able to have a class compiled to a .a or .so file on disk, and still be able to link and use its #pragma acc routine seq functions in another .cpp file. Any help would be great, thanks!
Maybe I’m not understanding, but I’m not sure how you’re getting this to work with shared objects. CUDA doesn’t currently have a dynamic loader, so it’s not possible to have a compute region call a device routine contained in a shared object.
Can you provide more detail or an example of what you’re doing?
However, if I instead have a class, with various member functions, that is compiled to a static or shared library, then I cannot use the above method (i.e. forward declaring functions with #pragma acc routine seq information), since I then get an error saying that I cannot redefine functions. Thus, the only way I’ve currently been able to work around this problem is to not compile the class to a static or shared library, and instead simply define all of its functions in the header itself.
Again, I think I need more details and an example to understand what you’re doing here.
Apologies, I’ve actually only tried with static libraries, so my shared library assumptions are incorrect.
I’ve made the following 3 test scenarios that I hope will clear things up a little. Cases 1 and 3 should compile and run without any errors, whereas case 2 should fail to compile due to missing acc routine information:
1. Compile test_functions.cpp to a static library, and use its functions in test_executable.cpp (which is compiled to an executable) by forward declaring the functions defined in test_functions.cpp at the top of test_executable.cpp.
2. Compile test_class.cpp to a static library, and try to use the member functions defined inside that class in test_executable.cpp (which is compiled to an executable) by forward declaring the class member functions at the top of test_executable.cpp.
3. Declare and define TestClass inside a .h file (i.e. don’t compile it to a static library), and include this in test_executable.cpp (which is compiled to an executable).
Yes, that’s what I’m seeing, but case 2 only fails because you need to add the routine directive to the prototypes in test_class.h.
Basically, the ‘routine’ directive needs to be visible from both the callee and the caller. The callee needs it so the compiler knows to create a device routine, and the caller needs it so it knows that there’s a device routine that it can call.
While you can have ‘routine’ on both the prototype and the definition (though I seem to recall that g++ doesn’t like this in their OpenACC implementation), I find it easier to just put it on the prototype, since that is visible to both.
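To make the caller/callee point concrete, here is a minimal single-file sketch of that layout. The Scaler class, its scale method, and the scale_all helper are hypothetical names used only for illustration (they are not from the test case), and the comments mark which file each part would live in if the class were built into a static library. Without OpenACC enabled, the pragmas are simply ignored and the loop runs on the host.

```cpp
#include <cstddef>
#include <vector>

// --- would live in scaler.h (hypothetical), included by the library and its callers ---
class Scaler {
public:
    explicit Scaler(float factor) : factor_(factor) {}

    // Directive on the prototype: visible to the definition (callee side)
    // and to every file that includes this header (caller side).
    #pragma acc routine seq
    float scale(float x) const;

    float factor_;
};

// --- would live in scaler.cpp, compiled into the static library ---
float Scaler::scale(float x) const { return factor_ * x; }

// --- would live in the executable's .cpp file ---
// Takes the Scaler by value so a plain object can be copied to the device.
std::vector<float> scale_all(Scaler s, const std::vector<float>& in) {
    std::vector<float> out(in.size());
    const float* src = in.data();
    float* dst = out.data();
    const std::size_t n = in.size();

    // The compute region calls the member function as a device routine.
    #pragma acc parallel loop copyin(s, src[0:n]) copyout(dst[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = s.scale(src[i]);
    }
    return out;
}
```

Calling scale_all(Scaler(2.0f), values) should return each input multiplied by 2. The key detail is that the directive appears once, in the class declaration, rather than being repeated as a forward declaration in the caller.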
I should note that the compiler is smart enough that it will implicitly add ‘routine seq’ to methods called within device code. The caveat being that the definition needs to be visible which is not the case in test2. Though this feature is especially useful for templated code since it would be very difficult to add ‘routine’ everywhere.
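As a rough sketch of the implicit case, the following assumes a hypothetical Clamp template whose method is defined in the header, so its definition is visible wherever it is called. No ‘routine’ directive is written; with the compiler behavior described above, the method would be treated as ‘routine seq’ implicitly when called from the compute region. Other OpenACC implementations may not do this, so treat it as an illustration rather than portable code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical header-only template. The body of apply() is visible to every
// caller, so no explicit routine directive is written here.
template <typename T>
struct Clamp {
    T lo;
    T hi;
    T apply(T x) const { return x < lo ? lo : (x > hi ? hi : x); }
};

// Hypothetical caller: clamps every element inside a compute region.
template <typename T>
std::vector<T> clamp_all(Clamp<T> c, const std::vector<T>& in) {
    std::vector<T> out(in.size());
    const T* src = in.data();
    T* dst = out.data();
    const std::size_t n = in.size();

    // apply() is called from device code; its definition is visible, which is
    // the condition the implicit 'routine seq' behavior relies on.
    #pragma acc parallel loop copyin(c, src[0:n]) copyout(dst[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = c.apply(src[i]);
    }
    return out;
}
```

This is the same situation as case 3 (everything defined in the header), and it is why that case compiles without any explicit directive while case 2 does not.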
% cat ../test_class.h
#pragma once
#include <stddef.h>
class TestClass {
public:
    TestClass();
    TestClass(const float test_num);
    ~TestClass();
    #pragma acc routine
    void set_test_num(const float test_num);
    #pragma acc routine
    void print_test_num();
    float test_num_;
};
% make
Scanning dependencies of target test_class
[ 25%] Building CXX object CMakeFiles/test_class.dir/test_class.cpp.o
TestClass::set_test_num(float):
14, Generating acc routine seq
Generating Tesla code
TestClass::print_test_num():
19, Generating acc routine seq
Generating Tesla code
[ 50%] Linking CXX static library libtest_class.a
[ 50%] Built target test_class
Scanning dependencies of target test_executable
[ 75%] Building CXX object CMakeFiles/test_executable.dir/test_executable.cpp.o
"/scratch/mcolgrove/jeffr1992/test2/test_executable.cpp", line 42: warning: expression has no effect
test_executable;
^
main:
45, Generating Tesla code
Generating implicit copy(test_executable) [if not already present]
TestExecutable::TestExecutable():
27, Generating enter data copyin(this[:1])
TestExecutable::~TestExecutable():
31, Generating exit data delete(this[:1])
TestExecutable::do_something():
34, Generating acc routine seq
Generating Tesla code
[100%] Linking CXX executable test_executable
[100%] Built target test_executable
% ./test_executable
test_num_: 20.000000