Hello! I’m using WSL2 and trying to call CUDA Fortran from R. I have the multi-CUDA version of the NVIDIA HPC SDK installed, with CUDA 11.7 and 11.0.
Here is the CUDA compilation:
nvfortran -c -fPIC cuda1a.cuf -o cuda1a.o -ta=tesla:nordc -V22.7 -Mcudalib=cublas -lcuda
nvfortran -shared -fPIC cuda1a.o -o cuda1a.so -ta=tesla:nordc -V22.7 -Mcudalib=cublas -lcuda
And here is the R output:
dyn.load("cuda1a.so")
Error in dyn.load("cuda1a.so") :
unable to load shared object '/home/ehodgess/cuda1a.so':
/opt/nvidia/hpc_sdk/Linux_x86_64/22.7/compilers/lib/libcudaforwrapblas117.so: undefined symbol: cublasSgemvBatched
I also tried:
nvfortran -c -fPIC cuda1a.cuf -o cuda1a.o -ta=tesla:nordc -Mcuda=cuda11.0 -Mcudalib=cublas -lcuda
nvfortran -shared -fPIC cuda1a.o -o cuda1a.so -ta=tesla:nordc -Mcuda=cuda11.0 -Mcudalib=cublas -lcuda
And the R output is:
> dyn.load("cuda1a.so")
> .Fortran("t2",as.integer(50),as.integer(1:50),as.single(0.0))
0: ALLOCATE: 200 bytes requested; status = 100(no CUDA-capable device is detected)
Finally, here is the CUDA Fortran subroutine:
module mytests
contains
  attributes (global) subroutine test1(a)
    integer, device :: a(*)
    !real, device :: a(*)
    i = threadIdx%x
    a(i) = i + 2
    !a(i) = a(i) + 2.0*i
    return
  end subroutine test1
end module mytests

subroutine t2(n,h,xt)
!DEC$ ATTRIBUTES DLLEXPORT :: t2
  use cudafor
  use mytests
  integer, allocatable, device :: iarr(:)
  !real, allocatable, device :: iarr(:)
  integer n,h(n)
  !integer n
  !real :: h(n)
  real :: xt,x1,x2
  type(dim3) :: grid, tBlock
  istat = cudaSetDevice(0)
  allocate(iarr(n))
  !h = 0; iarr = h
  x1=0.0;x2=0.0
  tBlock = dim3(512,1,1)
  grid = dim3(ceiling(real(N)/tBlock%x),1,1)
  call cpu_time(x1)
  call test1<<<grid,tBlock>>> (iarr)
  h = iarr
  call cpu_time(x2)
  deallocate(iarr)
  xt = x2-x1
end subroutine t2
Any suggestions would be much appreciated.
Thanks,
Erin