Hello everyone,
I’m working on accelerating a CFD code, written in MPI Fortran 90, with OpenACC. I’m using nvhpc-24.1 and I’m compiling the code with the mpif90 wrapper. In a subroutine, I use the DGESV routine from LAPACK to solve a linear system of equations. In particular, the call is the following:
call DGESV(4, 1, Mat, 4, WORK, f, 4, INFO)
where Mat is the system matrix and f is the right-hand side, which is overwritten with the solution on output. Unfortunately, this LAPACK routine does not work with OpenACC, so I decided to move to cuSOLVER. In particular, I’m using cusolverDnDDgesv, which I call through a C++ wrapper. For this reason, I wrote the following:
#include <cusolverDn.h>
#include <cuda_runtime.h>
#include <stdio.h>

extern "C" {

void cusolver_solve(double* A, double* B, int n, int nrhs, int* info) {
    cusolverDnHandle_t handle;
    int* devIpiv;                  // GPU pivot array
    int* devInfo;                  // GPU info scalar
    double* devA;                  // Matrix A on GPU
    double* devB;                  // Right-hand side B on GPU
    double* devX;                  // Solution X on GPU
    void* dWorkspace;              // GPU workspace
    size_t workspace_size = 0;     // Workspace size in bytes
    int niter = 0;                 // Refinement iteration count (host output of cusolverDnDDgesv)
    int lda = n, ldb = n, ldx = n; // Leading dimensions

    // GPU memory allocation
    cudaMalloc((void**)&devIpiv, n * sizeof(int));
    cudaMalloc((void**)&devInfo, sizeof(int));
    cudaMalloc((void**)&devA, n * n * sizeof(double));
    cudaMalloc((void**)&devB, n * nrhs * sizeof(double));
    cudaMalloc((void**)&devX, n * nrhs * sizeof(double));

    // Copy input data from CPU to GPU
    cudaMemcpy(devA, A, n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(devB, B, n * nrhs * sizeof(double), cudaMemcpyHostToDevice);

    // cuSOLVER handle creation
    cusolverDnCreate(&handle);

    // Query the required workspace size
    cusolverDnDDgesv_bufferSize(handle, n, nrhs, devA, lda, devIpiv, devB, ldb, devX, ldx, NULL, &workspace_size);

    // Workspace allocation
    cudaMalloc((void**)&dWorkspace, workspace_size);

    // Solve the system A x = B (niter receives the iteration count on the host)
    cusolverDnDDgesv(handle, n, nrhs, devA, lda, devIpiv, devB, ldb, devX, ldx, dWorkspace, workspace_size, &niter, devInfo);

    // Copy the solution and the info flag back to the CPU
    cudaMemcpy(B, devX, n * nrhs * sizeof(double), cudaMemcpyDeviceToHost);
    cudaMemcpy(info, devInfo, sizeof(int), cudaMemcpyDeviceToHost);

    // Memory deallocation
    cudaFree(devIpiv);
    cudaFree(devInfo);
    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devX);
    cudaFree(dWorkspace);

    // Handle destruction
    cusolverDnDestroy(handle);
}

} // extern "C"
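(For readability I left the error checking out of the snippet above; what I have in mind is a sketch along these lines, where CHECK_CUDA and CHECK_CUSOLVER are just my own hypothetical helper macros and the sketch assumes the same includes as the file above:)

// Hypothetical helper macros to wrap the CUDA / cuSOLVER calls (sketch only)
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
        }                                                             \
    } while (0)

#define CHECK_CUSOLVER(call)                                          \
    do {                                                              \
        cusolverStatus_t st_ = (call);                                \
        if (st_ != CUSOLVER_STATUS_SUCCESS) {                         \
            fprintf(stderr, "cuSOLVER error %d at %s:%d\n",           \
                    (int)st_, __FILE__, __LINE__);                    \
        }                                                             \
    } while (0)

// Usage, e.g.:
// CHECK_CUSOLVER(cusolverDnCreate(&handle));
// CHECK_CUDA(cudaMemcpy(devA, A, n * n * sizeof(double), cudaMemcpyHostToDevice));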
that I compile with:
nvcc -c solver_wrapper.cpp -o solver_wrapper.o
In the final linking of the program I add -lcudart -lcusolver.
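For completeness, the link command looks roughly like this (the object list is shortened, and main.o and my_code are just placeholder names):
mpif90 -acc -cuda main.o Radiation_Eddington_PEER4S.o solver_wrapper.o -lcusolver -lcudart -o my_code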
After that, I modified the Fortran code in this way:
SUBROUTINE Rad_Eddington

   USE openacc
   USE iso_c_binding

   IMPLICIT NONE

   INTERFACE
      subroutine cusolver_solve(A, B, n, nrhs, info) bind(C, name="cusolver_solve")
         !$acc routine
         use iso_c_binding
         implicit none
         real(c_double), intent(inout) :: A(*)
         real(c_double), intent(inout) :: B(*)
         integer(c_int), intent(in), value :: n, nrhs
         integer(c_int), intent(out) :: info
      end subroutine cusolver_solve
   END INTERFACE

   ... REST OF THE CODE ...

   call cusolver_solve(Mat, f, 4, 1, info)

   ... REST OF THE CODE ...
The problem I’m dealing with appears at link time. When I build the executable I obtain:
nvlink error : Undefined reference to 'cusolver_solve' in 'Radiation_Eddington_PEER4S.o'
I already encountered this issue in the past but I was never able to solve it. Any suggestion?
Thanks a lot for your help and time!