Hi everyone,
I have run into a strange issue with separate compilation of CUDA Fortran code. Here is a minimal working example. I want to call a subroutine “test()” from the main program, but the two are compiled separately: the subroutine “test” (in file type.cuf) is compiled as a dynamic library (type.so) and then linked to the main program “cudatest” (in file main.f90). I need separate compilation for other reasons. For the test, the same kind of CUF kernel block appears in both the main program and the subroutine. Both compilations succeed without any errors or warnings. Strangely, the kernel in the main program runs normally and gives the right result, but the same kernel in the subroutine (dynamic library) fails with the following error:
cudaLaunchKernel returned status 98: invalid device function
After googling the keywords, the error seems to indicate that the CUF kernel has not been compiled correctly as device code. But I don’t know the reason or how to fix it. I would appreciate any useful information.
Here is some information that may help diagnose the problem. I tested with HPC SDK 21.9 (CUDA 11.4) on a Tesla V100. Running nvidia-smi gives:
NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4
I also tested with HPC SDK 22.7 (CUDA 11.7) on a clean, new Tesla A100 machine. With this CUDA version (11.7), a different error occurs at the same place in the code:
cudaLaunchKernel returned status 500: named symbol not found
Similarly, running nvidia-smi gives:
NVIDIA-SMI 515.57 Driver Version: 515.57 CUDA Version: 11.7
However, the same code compiles correctly and runs normally on the same Tesla V100 machine with the free PGI Community Edition 2019-1910 and CUDA 10.0.130. I have done it this way without any problem for the past two years. Since the PGI compiler has been integrated into the NVIDIA HPC SDK, I recently decided to switch to the HPC SDK for long-term benefit and convenience. That is when I ran into this problem. I have struggled with it for several days and have no idea so far.
The minimal working code consists of two source files, main.f90 and type.cuf. The first is compiled into an executable and the second is compiled as a dynamic library. The source files and the corresponding Makefiles are given below.
The file main.f90 reads:
program cudatest
use cudafor
use cudatype
implicit none
integer :: ii, istat
real(kind=8) :: mat1(100)
real(kind=8), device :: mat1_d(100)
istat = cudaSetDevice(0)
! test cuf kernel in main body
!$cuf kernel do (1) <<<*,*>>>
do ii = 1, 100
mat1_d(ii) = 1.d0
enddo
mat1 = mat1_d
print *,'Test in main: sum(mat1)=',sum(mat1)
! test cuf kernel from linking dynamic library
call test()
end program
Makefile for main.f90
EXE=gputest
FCMPI=pgf90
FILES=main.o
MODS=$(wildcard *.mod)
UNAME_S=$(shell uname -n)
RM=rm -fv
INCLUDE=-I./tmp
cublas=/public/home/sypeng/soft/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4
LIBS = ./tmp/type.so
LIBS += -L${cublas}/lib64 -lcublas -lblas
FCFLAGS=-fPIC -O3 -traceback -g -Mpreprocess -Mcuda -gpu=cc70 -Mcudalib=cublas $(INCLUDE)
.SUFFIXES: .o .f .f90 .cuf
all: ${EXE}
${EXE}: ${FILES} ${MODS}
	${FCMPI} -o $@ ${FILES} ${LIBS} ${FCFLAGS}
main.o:
	${FCMPI} ${FCFLAGS} -c main.f90
%.mod: %.f90
	@echo "Some modules are out of date. Do clean and then recompile"
	${RM} $@ ${EXE}
.PHONY: clean
clean:
	${RM} *.o
	${RM} *.mod
	${RM} ${EXE}
./tmp/type.cuf reads,
module cudatype
implicit none
private
public :: test
contains
subroutine test()
integer :: ii
integer, device :: mat(100)
integer :: mat_h(100)
!$cuf kernel do (1) <<<*,*>>>
do ii = 1, 100
mat(ii) = 1
enddo
mat_h = mat
print *,'Test in dynamic lib: sum(mat2)=',sum(mat_h)
end subroutine
end module
The related Makefile reads,
FCMPI=/public/home/sypeng/soft/hpc_sdk/Linux_x86_64/21.9/compilers/bin/pgf90
FCFLAGS=-fPIC -O3 -traceback -g -Mpreprocess -Mcuda -gpu=cc70 -Mcudalib=cublas
#-fortranlibs
FILES=type.o
MODS=$(wildcard *.mod)
UNAME_S=$(shell uname -n)
RM=rm -fv
cublas=/public/home/sypeng/soft/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4
LIBS = -L${cublas}/lib64 -lcublas -lblas
.SUFFIXES: .o .f .f90 .cuf
all: ${FILES} ${MODS}
	${FCMPI} -fPIC -shared -Mcuda -Mcudalib=cublas -o type.so ${FILES} ${LIBS}
type.o:
	${FCMPI} ${FCFLAGS} -c type.cuf
%.mod: %.f90
	@echo "Some modules are out of date. Do clean and then recompile"
	${RM} $@ ${EXE}
.PHONY: clean
clean:
	${RM} *.o
	${RM} *.mod
	${RM} *.so
	${RM} ${EXE}
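For clarity, the build order implied by the two Makefiles can be sketched as a small shell script. This is only a sketch under assumptions not stated above: both files are saved under the name Makefile (the library one in ./tmp, the main one in the top directory), and the helper names run and build_and_run as well as the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Build-and-run sketch for the two Makefiles above.
# Assumptions (not from the post): each Makefile is saved as "Makefile",
# the library one in ./tmp and the main one in the top directory.
# Set DRY_RUN=1 to print the commands instead of executing them.

# Hypothetical helper: run a command, or only echo it in dry-run mode.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: library first, then the executable, then the test run.
build_and_run() {
    run make -C tmp || return 1   # compiles type.cuf and links tmp/type.so
    run make || return 1          # compiles main.f90 and links gputest against ./tmp/type.so
    run ./gputest                 # started from the top directory
}
```

After sourcing the script, calling build_and_run from the top directory performs the three steps in order; with DRY_RUN=1 it only prints them. Since the link line refers to the library by the relative path ./tmp/type.so, I start the executable from the top directory.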
When compiling the dynamic library and the main program, I also tried the following flags: -Wl,-export-dynamic and -fortranlibs. Neither of them helped.
Question: how can I make the test code run correctly with HPC SDK 21.9 or newer? For my own reasons, the subroutine module (type.cuf) must be compiled as a dynamic library and then linked to the main program (main.f90).