Nvlink error: Undefined reference to 'cusolver_solve'

Hello everyone,

I’m working on accelerating a CFD code written in MPI Fortran 90 with OpenACC. I’m using nvhpc-24.1 and compiling with the mpif90 wrapper. In one subroutine, I use LAPACK’s DGESV to solve a system of equations. In particular, the call is the following:

call DGESV(4, 1, Mat, 4, WORK, f, 4, INFO)

where Mat is the system matrix and f is the input right-hand side as well as the solution output (it is overwritten); in this call, WORK serves as DGESV’s integer pivot array IPIV. Unfortunately, this LAPACK subroutine does not work inside OpenACC, so I decided to move to cuSOLVER. In particular, I’m using cusolverDnDDgesv, which I call through a C++ wrapper. For this reason, I wrote the following:

#include <cusolverDn.h>
#include <cuda_runtime.h>
#include <stdio.h>

extern "C" {
    void cusolver_solve(double* A, double* B, int n, int nrhs, int* info) {
        cusolverDnHandle_t handle;
        int* devIpiv;      // GPU pivot array
        int* devInfo;      // GPU info array
        double* devA;      // Matrix A on GPU
        double* devB;      // Matrix B on GPU
        double* devX;      // Results on GPU
        void* dWorkspace;  // GPU workspace
        size_t workspace_size = 0; // Workspace dimension
        int lda = n, ldb = n, ldx = n; // Leading dimensions

        // GPU memory allocation
        cudaMalloc((void**)&devIpiv, n * sizeof(int));
        cudaMalloc((void**)&devInfo, sizeof(int));
        cudaMalloc((void**)&devA, n * n * sizeof(double));
        cudaMalloc((void**)&devB, n * nrhs * sizeof(double));
        cudaMalloc((void**)&devX, n * nrhs * sizeof(double));

        // Copy from CPU to GPU
        cudaMemcpy(devA, A, n * n * sizeof(double), cudaMemcpyHostToDevice);
        cudaMemcpy(devB, B, n * nrhs * sizeof(double), cudaMemcpyHostToDevice);

        // cuSOLVER handle creation (status checks omitted for brevity)
        cusolverDnCreate(&handle);

        // Query for workspace dimension
        cusolverDnDDgesv_bufferSize(handle, n, nrhs, devA, lda, devIpiv, devB, ldb, devX, ldx, NULL, &workspace_size);

        // workspace allocation
        cudaMalloc((void**)&dWorkspace, workspace_size);

        // Solve the system A X = B
        int iters = 0; // refinement-iteration count written back by the solver (safer than passing NULL)
        cusolverDnDDgesv(handle, n, nrhs, devA, lda, devIpiv, devB, ldb, devX, ldx, dWorkspace, workspace_size, &iters, devInfo);

        // Copy from GPU to CPU
        cudaMemcpy(B, devX, n * nrhs * sizeof(double), cudaMemcpyDeviceToHost);
        cudaMemcpy(info, devInfo, sizeof(int), cudaMemcpyDeviceToHost);

        // memory deallocation
        cudaFree(devIpiv);
        cudaFree(devInfo);
        cudaFree(devA);
        cudaFree(devB);
        cudaFree(devX);
        cudaFree(dWorkspace);

        // handle destruction
        cusolverDnDestroy(handle);
    }
}

that I compile with:

nvcc -c solver_wrapper.cpp -o solver_wrapper.o
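
In the final program linking I add -lcudart and -lcusolver; the full link line through the MPI wrapper looks roughly like this (sketched from memory; the object list, the output name, and the -c++libs flag, which pulls in the C++ runtime needed by the nvcc-built object, may differ in the actual build):

mpif90 -acc Radiation_Eddington_PEER4S.o solver_wrapper.o -lcudart -lcusolver -c++libs -o cfd_code   # illustrative names/flags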

After that, I modified the Fortran code in this way:

SUBROUTINE Rad_Eddington

USE openacc
USE iso_c_binding
IMPLICIT NONE

INTERFACE
       subroutine cusolver_solve(A, B, n, nrhs, info) bind(C, name="cusolver_solve")
        !$acc routine
        use iso_c_binding
        implicit none
        real(c_double), intent(inout) :: A(*)
        real(c_double), intent(inout) :: B(*)
        integer(c_int), intent(in), value :: n, nrhs   ! pass by value to match the C prototype (int n, int nrhs)
        integer(c_int), intent(out) :: info
       end subroutine cusolver_solve
END INTERFACE

... REST OF THE CODE ...

call cusolver_solve(Mat, f, 4, 1, info)

... REST OF THE CODE ...

The problem I’m dealing with appears during compilation: when I compile the code, I obtain:

nvlink error : Undefined reference to 'cusolver_solve' in 'Radiation_Eddington_PEER4S.o'

I already encountered this issue in the past, but I was never able to solve it. Any suggestions?

Thanks a lot for your help and time!

Hi Matteo,

The error occurs because, while you have the “acc routine” directive in the interface block, the actual C code isn’t a device routine. You’d need the routine defined with a “device” attribute.

However, you can’t call “cudaMalloc” or cuSolver routines from device code, so the real solution is to not call “cusolver_solve” from an OpenACC compute region.
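
For example, keep the interface but drop the “!$acc routine” line, and make the call from host code, outside of any compute region. A rough sketch (the “update” directives are only needed if Mat and f also live in a device data region; the array names are from your post):

INTERFACE
       subroutine cusolver_solve(A, B, n, nrhs, info) bind(C, name="cusolver_solve")
        use iso_c_binding
        implicit none
        real(c_double), intent(inout) :: A(*)
        real(c_double), intent(inout) :: B(*)
        integer(c_int), intent(in), value :: n, nrhs
        integer(c_int), intent(out) :: info
       end subroutine cusolver_solve
END INTERFACE

! ... OpenACC compute regions fill Mat and f on the device ...

!$acc update host(Mat, f)                ! bring the device copies back to the host
call cusolver_solve(Mat, f, 4, 1, info)  ! plain host call; the wrapper does its own cudaMemcpy
!$acc update device(f)                   ! push the solution back if later kernels read it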

-Mat

Hi Mat,

thanks for the help!! I didn’t know that cuSOLVER routines can’t be called from an OpenACC compute region. Do you know of any alternative to LAPACK’s DGESV that can be called inside OpenACC?

Thanks again!

All the CUDA Math libraries can only be called from the host. Note that this limitation applies to all GPU programming models, not just OpenACC.

Possibly there’s a third-party library or CUDA project that implements these, but I’m not aware of any.
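
That said, since your system is a fixed 4x4, one practical alternative is to hand-write the solve as a sequential device routine so it can run inside a compute region. A sketch using Gaussian elimination with partial pivoting (hand-written code, not a library call; the name solve4x4 and the fixed size are illustrative, and the routine should live in a module so the directive is visible at call sites):

subroutine solve4x4(A, b)
  !$acc routine seq
  implicit none
  real(8), intent(inout) :: A(4,4)  ! system matrix, destroyed during elimination
  real(8), intent(inout) :: b(4)    ! right-hand side on entry, solution on exit
  integer :: i, j, k, p
  real(8) :: t, m

  do k = 1, 4
     ! partial pivoting: pick the largest remaining entry in column k
     p = k
     do i = k+1, 4
        if (abs(A(i,k)) > abs(A(p,k))) p = i
     end do
     if (p /= k) then
        do j = 1, 4
           t = A(k,j); A(k,j) = A(p,j); A(p,j) = t
        end do
        t = b(k); b(k) = b(p); b(p) = t
     end if
     ! eliminate entries below the pivot
     do i = k+1, 4
        m = A(i,k) / A(k,k)
        do j = k, 4
           A(i,j) = A(i,j) - m*A(k,j)
        end do
        b(i) = b(i) - m*b(k)
     end do
  end do

  ! back substitution
  do i = 4, 1, -1
     t = b(i)
     do j = i+1, 4
        t = t - A(i,j)*b(j)
     end do
     b(i) = t / A(i,i)
  end do
end subroutine solve4x4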

