I installed NVIDIA HPC SDK 22.5 with CUDA 10.2, 11.0, and 11.7. My CUDA Fortran code works with CUDA 10.2 and 11.0, but not 11.7. Below is a minimal reproducer:
$ cat test.f90
program test
  use cublas
  use cusolverdn
  implicit none
  type(cusolverDnHandle) :: handle
  integer :: n,info,lwork
  real(8), device :: mat(2,2),work(2)
  integer, device :: info_d
  info = cusolverDnDtrtri(handle,CUBLAS_FILL_MODE_UPPER,&
       & CUBLAS_DIAG_NON_UNIT,n,mat,n,work,lwork,info_d)
end program
With CUDA 11.7:
$ nvfortran -cuda -gpu=cc80,cuda11.7 -cudalib=cusolver -o test.x test.f90
/usr/bin/ld: /tmp/nvfortran1BKsFbyQtIVw.o: in function `MAIN_':
test.f90:13: undefined reference to `cusolverDnDtrtri'
pgacclnk: child process exit status 1: /usr/bin/ld
$ cd /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/math_libs/11.7/targets/x86_64-linux/lib
$ nm libcusolver_static.a | grep cusolverDnDtrtri
0000000000000100 T cusolverDnDtrtriHost
U cusolverDnDtrtri
U cusolverDnDtrtri_bufferSize
0000000000000620 T cusolverDnDtrtri
0000000000000020 T cusolverDnDtrtri_bufferSize
U cusolverDnDtrtri
U cusolverDnDtrtri_bufferSize
$ cd /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/math_libs/11.0/targets/x86_64-linux/lib
$ nm libcusolver_static.a | grep cusolverDnDtrtri
0000000000000100 T cusolverDnDtrtriHost
00000000000001f0 T _Z16cusolverDnDtrtriP17cusolverDnContext16cublasFillMode_t16cublasDiagType_tlPdlS3_mS3_mPi
0000000000000120 T _Z27cusolverDnDtrtri_bufferSizeP17cusolverDnContext16cublasFillMode_t16cublasDiagType_tlPdlPmS4_
So cusolverDnDtrtri and cusolverDnDtrtri_bufferSize disappeared in CUDA 11.7?
It looks like these functions were removed from cusolver in 11.4. Unfortunately, the cusolver module we ship does not yet have Fortran interfaces for the replacement functions, cusolverDnXtrtri_buffersize and cusolverDnXtrtri. I have opened an internal issue, FS#32180, to get those interfaces into our next release.
May I ask one more question: On the C side I’ve been relying on CUSOLVER_VERSION in cusolverDn.h to figure out the version of cuSOLVER. I wonder how this can be done for Fortran?
And just for your information, I was checking the NVIDIA Fortran CUDA Interfaces section of the NVIDIA HPC SDK documentation, where cusolverDnDtrtri is still listed. It would be great to synchronize the HPC SDK documentation with cuSOLVER.
We are a little out of sync because we use the same Fortran module for all CUDA versions we support in a given release, which for 22.5, for instance, was 11.7, 11.0, and 10.2. We do try to update the documentation when the modules change, but we don’t remove anything until we no longer ship a CUDA version that supports the feature. As you point out, that makes the versioning difficult.
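On the question of querying the cuSOLVER version from Fortran: one possible approach, sketched here as an assumption rather than an officially documented route, is to bind directly to the C function cusolverGetVersion through ISO_C_BINDING. The Fortran name query_cusolver_version is hypothetical, and the sketch deliberately does not use the cusolverdn module, so it does not depend on whether that module provides its own interface.

```fortran
program cusolver_version
  use iso_c_binding, only : c_int
  implicit none
  interface
    ! Hand-written binding to the C function
    !   cusolverStatus_t cusolverGetVersion(int *version);
    ! The Fortran-side name query_cusolver_version is hypothetical.
    integer(c_int) function query_cusolver_version(version) &
        bind(C, name="cusolverGetVersion")
      import :: c_int
      integer(c_int), intent(out) :: version
    end function
  end interface
  integer(c_int) :: istat, version

  istat = query_cusolver_version(version)
  if (istat /= 0) then   ! 0 corresponds to CUSOLVER_STATUS_SUCCESS
    print *, "cusolverGetVersion failed, status =", istat
  else
    print *, "cuSOLVER version:", version
  end if
end program
```

This should build with the same flags used elsewhere in this thread (nvfortran -cuda -cudalib=cusolver). Note that it is a run-time query, unlike the CUSOLVER_VERSION macro in C, so it cannot by itself prevent an undefined reference at link time. Selecting between cusolverDnDtrtri and cusolverDnXtrtri at compile time would still need a preprocessor macro supplied by the build (for example, a .F90 file compiled with a user-chosen -D flag).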
I’m testing NVIDIA HPC SDK 22.9, which has Fortran interfaces for the functions I need (cusolverDnXtrtri_buffersize and cusolverDnXtrtri). I’m following this example of cusolverDnXgetrf_buffersize you posted last year:
It looks okay. I wonder if it is because your matrix is not allocated. The init routine might inspect the matrix address to check its alignment or whether the memory is managed.
I don’t know where I copied CUSOLVER_R_64F from. Apparently both CUDA_R_64F and CUSOLVER_R_64F exist, but they have different values.
$ cat test.f90
program test
  use cudafor, only : CUDA_R_64F
  use cusolverdn, only : CUSOLVER_R_64F
  print *, "CUDA_R_64F:", CUDA_R_64F
  print *, "CUSOLVER_R_64F:", CUSOLVER_R_64F
end program
$ nvfortran -cuda -gpu=cc80,cuda11.7 -cudalib=cusolver -o test.x test.f90
$ ./test.x
CUDA_R_64F: 1
CUSOLVER_R_64F: 1203
My tests seem to work correctly using CUDA_R_64F. Thank you again!
I’m glad it works. Your error with CUSOLVER_R_64F seems to be an unfortunate by-product of the lack of enum type checking in Fortran. CUSOLVER_R_64F is defined in the module, but it corresponds to the cusolverPrecType_t C enum type, not the cudaDataType type that the new cusolver routines expect.
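To illustrate the type-checking point, here is a minimal standalone sketch that does not touch the real modules. It assumes the constants behave like plain integer parameters, which is consistent with the behaviour described above. The demo_ names are hypothetical stand-ins, and the values are the ones printed by the test program earlier in this thread.

```fortran
module demo_enums
  implicit none
  ! Hypothetical stand-ins for the two constants discussed above.
  integer, parameter :: demo_CUDA_R_64F     = 1     ! cudaDataType value
  integer, parameter :: demo_CUSOLVER_R_64F = 1203  ! cusolverPrecType_t value
contains
  ! Hypothetical routine that, like the generic cusolver API, expects a
  ! data-type code but can only declare it as an integer.
  subroutine demo_routine(dataType)
    integer, intent(in) :: dataType
    if (dataType == demo_CUDA_R_64F) then
      print *, "accepted data type", dataType
    else
      print *, "unsupported data type", dataType
    end if
  end subroutine
end module

program enum_demo
  use demo_enums
  implicit none
  call demo_routine(demo_CUDA_R_64F)      ! the intended constant
  call demo_routine(demo_CUSOLVER_R_64F)  ! also compiles: both are integers
end program
```

Both calls compile without any diagnostic, so the mix-up only surfaces at run time, as it did with CUSOLVER_R_64F. In C, the two constants belong to distinct enum types, which gives compilers and readers a better chance of spotting the mismatch.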