Clarification on using OpenACC in a shared library

david.gutzwiller · July 13, 2020, 4:50pm

Thanks Mat,

I’ll keep you posted if and when I encounter any related issues.

-David

pramod.s.kumbhar · July 13, 2020, 10:11pm

Thank you very much David and Mat for providing quick feedback.

If there is anything I should try or experiment with, let me know.

Just for context, I am integrating GPU support into the NEURON framework, and integrating static libraries is not straightforward (or at least doesn’t match how a large number of users use this software). So any help or feedback would be highly appreciated.

pramod.s.kumbhar · August 12, 2020, 8:03pm

Dear Mat,

Just to check, was there any response or suggestion from the compiler dev team?

Thank you!

MatColgrove · August 13, 2020, 10:22pm

No, sorry, nothing yet.

david.gutzwiller · October 9, 2020, 2:16am

Hi Mat, I’m just checking in on this issue to see if there has been any progress or news from the compiler dev team. I’ve been tasked with reworking two of our standalone solvers as shared libraries with a very tight deadline. I started today and ran into a bunch of issues with both PGI 19.10 and the latest NVIDIA HPC SDK (20.9). I’ll try to reproduce these issues in a simplified manner tomorrow but hopefully there are already some fixes in the pipeline.

pramod.s.kumbhar · November 30, 2020, 9:03am

During the OpenACC summit a few months ago, I asked about this issue (Michael Wolfe and Mat were there during the breakout session). I am wondering if the compiler dev team has any feedback or a workaround for this.

If there is no ETA for a resolution of this issue, I am wondering if we should give up the shared library approach and use a static library instead (which is going to impact our user workflows).

MatColgrove · November 30, 2020, 5:13pm

Hi Pramod,

The workaround would be to not use RDC (i.e. -gpu=nordc) with shared objects. However, without RDC the code can’t contain any static global references in “declare” directives, such as extern variables, nor can it make cross-file device routine calls. So you’d need to reorganize the code so that “my_data” is passed in as an argument rather than as a global reference.

For example:

% cat test1.cpp
#include <stdlib.h>
#include <stdio.h>

void init(double val, float** my_data, double & my_celsius) {
    *my_data = (float*) malloc(sizeof(float)*3);
    (*my_data)[0] = 1.1;
    float * tmp = *my_data;
    #pragma acc enter data copyin(tmp[0:3])

    my_celsius = val;
    #pragma acc enter data copyin(my_celsius)
}

#pragma acc routine seq
double foo (double my_celsius,  double my_value) {
    double y = 2.0;
    // 2 + 5 + 1 = 8
    return y + my_celsius + my_value;
}

double sample(double & my_celsius,double my_value) {
  double val;
  #pragma acc serial present(my_celsius) copyout(val)
  {
      val = foo(my_celsius, my_value);
  }
  return val;
}
% cat main.cpp
#include <iostream>

void init(double val, float** my_data, double & my_celsius);
double sample(double &my_celsius,double my_value);
static double my_value = 1.0;

int main() {

    float * my_data;
    double my_celsius;
    init(5.0,&my_data,my_celsius);
    double val = sample(my_celsius,my_value);
    std::cout << "Val=" << val << std::endl;
    std::cout << "Done\n";
    return 0;
}
% nvc++ -acc -Minfo -gpu=cc70 test1.cpp -c -fPIC
init(double, float **, double &):
     10, Generating enter data copyin(tmp[:3])
     12, Generating enter data copyin(my_celsius[:1])
foo(double, double):
     15, Generating acc routine seq
         Generating Tesla code
sample(double &, double):
     24, Generating present(my_celsius[:1])
         Generating copyout(val) [if not already present]
         Accelerator serial kernel generated
         Generating Tesla code
% nvc++ -acc -gpu=cc70 -shared -o test.so test1.o
% nvc++ -acc main.cpp test.so -gpu=cc70
main.cpp:
% a.out
Val=8
Done

-Mat

pramod.s.kumbhar · December 9, 2020, 5:08pm

Thanks Mat for providing an example.

In our use case it’s currently difficult to change the code structure because the code is generated by a DSL-to-C++ transpiler, and it’s difficult to change the API. But I will keep this in mind.
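
One possible way to combine the two constraints, sketched here only as an assumption and not something confirmed in this thread: keep the existing public API, leave the global on the host without any “declare” directive, and have a thin wrapper pass its value into the compute region as an argument. The names set_celsius and sample_impl are hypothetical, and everything sits in one file so there are no cross-file device routine calls.

```cpp
// Hypothetical sketch: a host-only global plus a wrapper, intended to be
// compatible with -gpu=nordc. All names other than foo/sample are invented
// for illustration.

// Host-side global. It appears in no "declare" directive, so the device
// code never references it directly.
static double my_celsius = 0.0;

// Device routine defined in the same file as its caller (no cross-file call).
#pragma acc routine seq
double foo(double celsius, double value) {
    double y = 2.0;
    return y + celsius + value;
}

// Internal kernel: the former global arrives as an ordinary argument.
// Scalars used in a serial region are firstprivate by default, so the
// values are copied to the device when the region launches.
static double sample_impl(double celsius, double value) {
    double val = 0.0;
    #pragma acc serial copyout(val)
    {
        val = foo(celsius, value);
    }
    return val;
}

// Public API keeps its original shape; only the wrapper reads the global.
void set_celsius(double v) { my_celsius = v; }

double sample(double value) { return sample_impl(my_celsius, value); }
```

Calling set_celsius(5.0) followed by sample(1.0) gives 2 + 5 + 1 = 8, the same value as in Mat’s example. Whether this fits transpiler-generated code depends on how many globals the generated kernels touch, so treat it as a starting point only.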
