CUDA 11.7 undefined reference to cusolverDnDtrtri

I installed NVIDIA HPC SDK 22.5 with CUDA 10.2, 11.0, and 11.7. My CUDA Fortran code works with CUDA 10.2 and 11.0, but not 11.7. Below is a minimal reproducer:

$ cat test.f90
program test

  use cublas
  use cusolverdn

  implicit none

  type(cusolverDnHandle) :: handle
  integer :: n,info,lwork
  real(8), device :: mat(2,2),work(2)
  integer, device :: info_d

  info = cusolverDnDtrtri(handle,CUBLAS_FILL_MODE_UPPER,&
       & CUBLAS_DIAG_NON_UNIT,n,mat,n,work,lwork,info_d)

end program

With CUDA 11.7:

$ nvfortran -cuda -gpu=cc80,cuda11.7 -cudalib=cusolver -o test.x test.f90
/usr/bin/ld: /tmp/nvfortran1BKsFbyQtIVw.o: in function `MAIN_':
test.f90:13: undefined reference to `cusolverDnDtrtri'
pgacclnk: child process exit status 1: /usr/bin/ld

With CUDA 11.0:

$ nvfortran -cuda -gpu=cc80,cuda11.0 -cudalib=cusolver -o test.x test.f90
# no error
$ cd /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/math_libs/11.7/targets/x86_64-linux/lib
$ nm libcusolver_static.a | grep cusolverDnDtrtri
0000000000000100 T cusolverDnDtrtriHost
                 U cusolverDnDtrtri
                 U cusolverDnDtrtri_bufferSize
0000000000000620 T cusolverDnDtrtri
0000000000000020 T cusolverDnDtrtri_bufferSize
                 U cusolverDnDtrtri
                 U cusolverDnDtrtri_bufferSize
$ cd /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/math_libs/11.0/targets/x86_64-linux/lib
$ nm libcusolver_static.a | grep cusolverDnDtrtri
0000000000000100 T cusolverDnDtrtriHost
00000000000001f0 T _Z16cusolverDnDtrtriP17cusolverDnContext16cublasFillMode_t16cublasDiagType_tlPdlS3_mS3_mPi
0000000000000120 T _Z27cusolverDnDtrtri_bufferSizeP17cusolverDnContext16cublasFillMode_t16cublasDiagType_tlPdlPmS4_

So cusolverDnDtrtri and cusolverDnDtrtri_bufferSize disappeared in CUDA 11.7?

Thanks,
Victor

It looks like these functions were removed from cuSOLVER in CUDA 11.4. Unfortunately, the cusolver module we ship doesn’t yet have Fortran interfaces for the replacement functions, cusolverDnXtrtri_buffersize and cusolverDnXtrtri. I have opened an internal issue, FS#32180, to get those interfaces into our next release.

Thanks a lot for the answer!

May I ask one more question: On the C side I’ve been relying on CUSOLVER_VERSION in cusolverDn.h to figure out the version of cuSOLVER. I wonder how this can be done for Fortran?

And just for your information, I was checking the NVIDIA Fortran CUDA Interfaces section of the NVIDIA HPC SDK documentation, where cusolverDnDtrtri has not been removed. Would be great to synchronize the HPC SDK documentation with cuSOLVER.

We are a little out of sync because we use the same Fortran module for all CUDA versions we support in a given release; for 22.5, for instance, those were 11.7, 11.0, and 10.2. We do try to update the documentation when the modules change, but we don’t remove anything until we no longer ship a CUDA version that supports the feature. As you point out, that makes the versioning difficult.

Yes, I can see how much work is needed to support 3 versions of CUDA and maintain the docs. Thanks again for the explanation!

Hi Brent,

I’m testing NVIDIA HPC SDK 22.9, which has Fortran interfaces for the functions I need (cusolverDnXtrtri_buffersize and cusolverDnXtrtri). I’m following this example of cusolverDnXgetrf_buffersize you posted last year:

My test code looks like this:

$ cat test.f90

program test_cusolver_workspace_size

   use cudafor
   use cusolverdn

   implicit none

   integer(8), parameter :: nn = 10
   integer :: ierr
   integer(8) :: lwork
   integer(8) :: lwork_d
   real(8), device, allocatable :: mat_d(:,:)
   type(cusolverDnHandle) :: cusolver_h

   ierr = cusolverDnCreate(cusolver_h)
   if(ierr /= 0) print *, "cusolverDnCreate error:", ierr

   ierr = cusolverDnXtrtri_buffersize(cusolver_h, CUBLAS_FILL_MODE_UPPER, CUBLAS_DIAG_NON_UNIT, &
        & nn, cudaDataType(CUSOLVER_R_64F), mat_d, nn, lwork_d, lwork)
   if(ierr /= 0) print *, "cusolverDnXtrtri_buffersize error:", ierr

   print *, "cusolverDnXtrtri_buffersize:", lwork_d, lwork

   ierr = cusolverDnDestroy(cusolver_h)
   if(ierr /= 0) print *, "cusolverDnDestroy error:", ierr

end program

But I’m getting error code 9 from cusolverDnXtrtri_buffersize:

$ nvfortran --version

nvfortran 22.9-0 64-bit target on x86-64 Linux -tp zen2 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

$ nvfortran -cuda -gpu=cc80,cuda11.7 -cudalib=cusolver -o test.x test.f90

$ ./test.x

 cusolverDnXtrtri_buffersize error:            9
 cusolverDnXtrtri_buffersize:          140720438772200                  4199965

Do you see any problem with my test code? Looks like error code 9 means “not supported”, but how do I know what’s not supported?

Thanks in advance,

Victor

It looks okay. I wonder if it is because your matrix is not allocated. The init routine might look at the matrix address for alignment or whether it is managed or not.

I tried allocating the matrix with different sizes, but still got error 9 from cusolverDnXtrtri_buffersize.

I also tested cusolverDnXgetrf_buffersize. I got meaningful buffer sizes both with and without allocating the matrix.

Ahh, I see it. It should be cudaDataType(CUDA_R_64F), not cudaDataType(CUSOLVER_R_64F)! I’m surprised that wasn’t caught by implicit none.
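For reference, here is a sketch of the test program from above with that one constant changed. It assumes the same cudafor and cusolverdn modules and the HPC SDK 22.9 interfaces used in the original test. The allocate/deallocate lines are an addition that follows the earlier suggestion about the unallocated matrix; they may not be strictly required.

```fortran
program test_cusolver_workspace_size

   use cudafor
   use cusolverdn

   implicit none

   integer(8), parameter :: nn = 10
   integer :: ierr
   integer(8) :: lwork      ! host workspace size, in bytes
   integer(8) :: lwork_d    ! device workspace size, in bytes
   real(8), device, allocatable :: mat_d(:,:)
   type(cusolverDnHandle) :: cusolver_h

   ! Addition (assumption): allocate the matrix so a valid device address is passed.
   allocate(mat_d(nn,nn))

   ierr = cusolverDnCreate(cusolver_h)
   if(ierr /= 0) print *, "cusolverDnCreate error:", ierr

   ! The data type argument wraps CUDA_R_64F (a cudaDataType value),
   ! not CUSOLVER_R_64F (a cusolverPrecType_t value).
   ierr = cusolverDnXtrtri_buffersize(cusolver_h, CUBLAS_FILL_MODE_UPPER, CUBLAS_DIAG_NON_UNIT, &
        & nn, cudaDataType(CUDA_R_64F), mat_d, nn, lwork_d, lwork)
   if(ierr /= 0) print *, "cusolverDnXtrtri_buffersize error:", ierr

   print *, "cusolverDnXtrtri_buffersize:", lwork_d, lwork

   ierr = cusolverDnDestroy(cusolver_h)
   if(ierr /= 0) print *, "cusolverDnDestroy error:", ierr

   deallocate(mat_d)

end program
```

It builds with the same command line as before: nvfortran -cuda -gpu=cc80,cuda11.7 -cudalib=cusolver -o test.x test.f90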

Ah yes now it works. Thanks a lot!

I don’t know where I copied CUSOLVER_R_64F from. Apparently both CUDA_R_64F and CUSOLVER_R_64F exist, but they have different values.

$ cat test.f90

program test
  use cudafor, only : CUDA_R_64F
  use cusolverdn, only : CUSOLVER_R_64F
  print *, "CUDA_R_64F:", CUDA_R_64F
  print *, "CUSOLVER_R_64F:", CUSOLVER_R_64F
end program

$ nvfortran -cuda -gpu=cc80,cuda11.7 -cudalib=cusolver -o test.x test.f90
$ ./test.x

 CUDA_R_64F:            1
 CUSOLVER_R_64F:         1203

My tests seem to work correctly using CUDA_R_64F. Thank you again!

I’m glad it works. Your error with CUSOLVER_R_64F seems to be an unfortunate by-product of the lack of enum type checking in Fortran. CUSOLVER_R_64F is defined in the module, but it corresponds to the cusolverPrecType_t C enum type, not the cudaDataType that the new cusolver routines expect.
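To illustrate why neither implicit none nor the compiler flags this, here is a small standalone sketch in standard Fortran. Every name in it is hypothetical (fake_enums, fake_data_type, fake_buffersize); it only mimics the situation and assumes the wrapper type behaves like a derived type with a single integer component, which may not match how the real module defines cudaDataType.

```fortran
! Standalone illustration; all names are hypothetical stand-ins.
module fake_enums
  implicit none
  ! Two unrelated C enums, exposed to Fortran as plain integer constants.
  integer, parameter :: FAKE_CUDA_R_64F     = 1
  integer, parameter :: FAKE_CUSOLVER_R_64F = 1203
  ! Wrapper type, modeled on how cudaDataType(...) is used in the test code.
  type :: fake_data_type
    integer :: val
  end type
contains
  ! Stand-in for a library routine: 0 on success, 9 for "not supported".
  integer function fake_buffersize(dt) result(stat)
    type(fake_data_type), intent(in) :: dt
    stat = merge(0, 9, dt%val == FAKE_CUDA_R_64F)
  end function
end module

program enum_demo
  use fake_enums
  implicit none
  ! Both calls compile: each constant is declared, so implicit none is
  ! satisfied, and the structure constructor accepts any default integer.
  print *, "with FAKE_CUDA_R_64F:    ", fake_buffersize(fake_data_type(FAKE_CUDA_R_64F))
  print *, "with FAKE_CUSOLVER_R_64F:", fake_buffersize(fake_data_type(FAKE_CUSOLVER_R_64F))
end program
```

The first call returns 0 and the second returns 9. The mistake only shows up at run time because both constants are ordinary integers by the time they reach the constructor.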