Separate compilation of CUDA Fortran code with a dynamic library

pengshiyuj · August 21, 2022, 2:54pm

Hi everyone,

I have run into a strange issue with separate compilation of CUDA Fortran code. Here is a minimal working example: I want to call a subroutine “test()” from the main program, but the two are compiled separately. The subroutine “test” (in file type.cuf) is compiled into a dynamic library (type.so), which is then linked to the main program “cudatest” (in file main.f90). I need separate compilation for other reasons. For the test, the main program and the subroutine each contain an equivalent CUF kernel block. Both compilation steps succeed without errors or even warnings. Strangely, the kernel in the main program runs normally and gives the right result, but the kernel in the subroutine (dynamic library) fails with the following error:

cudaLaunchKernel returned status 98: invalid device function

After searching for the keywords, it seems the error indicates that the CUF kernel was not compiled into device code correctly. But I don’t know the reason or how to fix it. I would appreciate any useful information.

Here is some further information that may help diagnose the problem. I tested with HPC SDK 21.9 (CUDA 11.4) on a Tesla V100. Running nvidia-smi gives:

NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4

I also tested with HPC SDK 22.7 (CUDA 11.7) on a clean, new Tesla A100 machine. With this CUDA version (11.7), a different error occurs at the same place in the code:

cudaLaunchKernel returned status 500: named symbol not found

Similarly, running nvidia-smi on that machine gives:

NVIDIA-SMI 515.57 Driver Version: 515.57 CUDA Version: 11.7

However, the same code compiles correctly and runs normally on the same Tesla V100 machine with the free PGI Community Edition (2019-1910) and CUDA 10.0.130. I have worked this way without any problem for the past two years. Since the PGI compilers have been integrated into the NVIDIA HPC SDK, I recently decided to switch to the HPC SDK for long-term benefit and convenience, and that is when I ran into this problem. I have struggled with it for several days and have no idea so far.

The minimal working example consists of two source files, main.f90 and type.cuf. The first is compiled into an executable and the second into a dynamic library. The source files and their Makefiles follow.
The file main.f90 reads:

program cudatest
    use cudafor
    use cudatype
    implicit none
    integer      ::  ii, istat
    real(kind=8) :: mat1(100)
    real(kind=8), device :: mat1_d(100)

    istat = cudaSetDevice(0)

    ! test cuf kernel in main body 
    !$cuf kernel do (1) <<<*,*>>>
    do ii = 1, 100
        mat1_d(ii) = 1.d0
    enddo
    mat1 = mat1_d
    print *,'Test in main: sum(mat1)=',sum(mat1)

    ! test cuf kernel from linking dynamic library 
    call test()
end program

Makefile for main.f90

EXE=gputest
FCMPI=pgf90

FILES=main.o
MODS=$(wildcard *.mod)
UNAME_S=$(shell uname -n)
RM=rm -fv

INCLUDE=-I./tmp

cublas=/public/home/sypeng/soft/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4
LIBS = ./tmp/type.so
LIBS += -L${cublas}/lib64 -lcublas -lblas

FCFLAGS=-fPIC -O3 -traceback -g -Mpreprocess -Mcuda -gpu=cc70 -Mcudalib=cublas $(INCLUDE)

.SUFFIXES: .o .f .f90 .cuf

all: ${EXE}

${EXE}: ${FILES} ${MODS}
	${FCMPI} -o $@ ${FILES} ${LIBS} ${FCFLAGS}

main.o:
	${FCMPI} ${FCFLAGS} -c main.f90

%.mod: %.f90
	@echo "Some modules are out of date. Do clean and then recompile"
	${RM} $@ ${EXE}

.PHONY: clean

clean:
	${RM} *.o
	${RM} *.mod
	${RM} ${EXE}

The file ./tmp/type.cuf reads:

module cudatype
    implicit none
    private
    public  :: test

    contains
        subroutine test()
            integer :: ii
            integer, device :: mat(100)
            integer :: mat_h(100)
            !$cuf kernel do (1) <<<*,*>>>
            do ii = 1, 100
                mat(ii) = 1
            enddo
            mat_h = mat
            print *,'Test in dynamic lib: sum(mat2)=',sum(mat_h)
        end subroutine 
end module

The related Makefile reads,

FCMPI=/public/home/sypeng/soft/hpc_sdk/Linux_x86_64/21.9/compilers/bin/pgf90

FCFLAGS=-fPIC -O3 -traceback -g -Mpreprocess -Mcuda -gpu=cc70 -Mcudalib=cublas
#-fortranlibs 
FILES=type.o

MODS=$(wildcard *.mod)

UNAME_S=$(shell uname -n)
RM=rm -fv

cublas=/public/home/sypeng/soft/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4
LIBS = -L${cublas}/lib64 -lcublas -lblas

.SUFFIXES: .o .f .f90 .cuf

all:${FILES} ${MODS}
	${FCMPI} -fPIC -shared -Mcuda -Mcudalib=cublas -o type.so ${FILES} ${LIBS}

type.o:
	${FCMPI} ${FCFLAGS} -c type.cuf

%.mod: %.f90
	@echo "Some modules are out of date. Do clean and then recompile"
	${RM} $@ ${EXE}

.PHONY: clean

clean:
	${RM} *.o
	${RM} *.mod
	${RM} *.so
	${RM} ${EXE}

When compiling the dynamic library and the main program, I also tried the following flags: -Wl,-export-dynamic and -fortranlibs. Neither of them helped.

Question: How can I make the test code run normally with HPC SDK 21.9 or newer? For my own reasons, the subroutine module (type.cuf) must be compiled as a dynamic library and then linked to the main program (main.f90).

MatColgrove · August 22, 2022, 4:18pm

Hi pengshiyuj,

Looks like you just need to update to use the newer “-cuda” and “-cudalib” flags as opposed to the deprecated “-Mcuda”/“-Mcudalib” flags.

% pgf90 -V

pgf90 (aka nvfortran) 21.9-0 64-bit target on x86-64 Linux -tp zen
PGI Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
% make -f makefile.lib clean all
rm -fv *.o
rm -fv *.mod
rm -fv *.so
removed 'type.so'
rm -fv
pgf90 -fPIC -O3 -g -Mpreprocess -cuda -gpu=ccall -cudalib=cublas -c type.cuf
pgf90 -fPIC -shared -cuda -cudalib=cublas -o type.so type.o -L/public/home/sypeng/soft/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4/lib64 -lcublas -lblas
% make
pgf90 -fPIC -O3 -traceback -g -Mpreprocess -cuda -gpu=ccall -cudalib=cublas -I. -c main.f90
pgf90 -o gputest main.o ./type.so -lblas -fPIC -O3 -traceback -g -Mpreprocess -cuda -gpu=ccall -cudalib=cublas -I.
% ./gputest
 Test in main: sum(mat1)=    100.0000000000000
 Test in dynamic lib: sum(mat2)=          100
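
For reference, the change behind the log above can be sketched as a Makefile fragment for the library. This is reconstructed from the command lines in the log and is an assumption, not the exact file that was used. The variable names (FCMPI, FCFLAGS, FILES, MODS, LIBS) follow the original Makefile, and -gpu=ccall is taken from the log, whereas the original post used -gpu=cc70.

```makefile
# Sketch only: flags reconstructed from the commands shown in the build log.
# -Mcuda becomes -cuda, and -Mcudalib=cublas becomes -cudalib=cublas.
FCFLAGS=-fPIC -O3 -g -Mpreprocess -cuda -gpu=ccall -cudalib=cublas

# Link step for the shared library, with the same two substitutions.
# Recipe lines must begin with a tab character.
all: ${FILES} ${MODS}
	${FCMPI} -fPIC -shared -cuda -cudalib=cublas -o type.so ${FILES} ${LIBS}

# Compile step is unchanged apart from picking up the new FCFLAGS.
type.o:
	${FCMPI} ${FCFLAGS} -c type.cuf
```

The Makefile for main.f90 would presumably take the same two substitutions in its FCFLAGS, matching the compile and link lines for gputest in the log.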

Hope this helps,
Mat

pengshiyuj · August 23, 2022, 1:06am

Hi Mat,

It really works for me. Thanks very much!
