Compiling Python wrappers with F2PY and CUDA Fortran

Yes, this looks to be the issue in that “-Mcuda” isn’t being added to the link when creating the shared object. If you can figure out how to add “-Mcuda” to the link flags, that would be ideal.

I did find a post on StackOverflow which suggests that you can set the environment variable “LDFLAGS=-Mcuda” to set the f2py linker flags, so you may want to try it. Setting NPY_DISTUTILS_APPEND_FLAGS=1 looks necessary as well so LDFLAGS doesn’t overwrite the other linker flags.

If NPY_DISTUTILS_APPEND_FLAGS isn’t functional in your version of f2py (it appears to be NumPy-version specific), then you might need to set LDFLAGS to “-shared -fpic -Mcuda” so the flags needed to build the shared object aren’t lost.
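For example, assuming your CUDA Fortran source is in test.f90 and you’re selecting the PGI compilers via f2py’s “pg” fcompiler (the file and module names here are just placeholders, adjust to your project), something like:

  export NPY_DISTUTILS_APPEND_FLAGS=1
  export LDFLAGS="-Mcuda"
  f2py -c --fcompiler=pg --f90flags="-Mcuda" -m test test.f90

The “--f90flags” option passes “-Mcuda” to the compile stage, while LDFLAGS covers the link.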

I could try manually specifying all the required libraries that -Mcuda provides (specifying this from f2py does not work for the link stage)… Do you know which libraries -Mcuda proxies for?

You can, but it’s a little more complex than just adding the libraries. First, the “-Mcuda” flag tells the compiler to also run a device code link step when creating the shared object. If you add the libraries by hand, you’ll also need to compile the code with “-Mcuda=nordc” so the device link isn’t required. Though without RDC enabled, some CUDA Fortran features are disabled, such as the ability to call device routines not in the same module or to access device module variables outside of the module in which they’re defined.
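For example, to compile the objects without RDC (the file name here is just a placeholder):

  pgfortran -c -fpic -Mcuda=nordc test.f90

Keep in mind that the RDC restrictions above apply to anything compiled this way.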

Second, the “-Mcuda” flag can use different CUDA versions, selecting the one to use based on the NVIDIA driver version, on whether “CUDA_HOME” is set, or on whether the user has selected a particular CUDA version via “-Mcuda=cudaX.y”. The included libraries can differ depending on the CUDA version being used.
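For example, to pin the build to CUDA 10.1 rather than letting the driver version decide (sub-options to “-Mcuda” can be combined with commas):

  pgfortran -c -fpic -Mcuda=nordc,cuda10.1 test.f90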

Finally, the libraries can change from release to release, so the exact libraries used are release dependent.

The best way to determine what flags to add is to run the command “pgfortran -dryrun -Mcuda=nordc -shared test.o -o libtest.so”. “-dryrun” shows you the commands the compiler driver would execute without actually running them. “-v” (verbose) also shows the driver commands, but does run them.

Here’s the ld command with 19.10 using my local install:

/usr/bin/ld /usr/lib/x86_64-linux-gnu/crti.o /proj/pgi/linux86-64-llvm/19.10/lib/trace_init.o /home/sw/thirdparty/gcc/gcc-9.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/9.2.0/crtbeginS.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /proj/pgi/linux86-64-llvm/19.10/lib/pgi.ld -L/proj/pgi/linux86-64-llvm/19.10/lib -L/usr/lib64 -L/home/sw/thirdparty/gcc/gcc-9.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/9.2.0 test2.o -rpath /proj/pgi/linux86-64-llvm/19.10/lib -rpath /proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -rpath /home/sw/thirdparty/gcc/gcc-9.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../lib64 -o libtest.so -shared /proj/pgi/linux86-64-llvm/19.10/lib/pgiloc.ld -L/home/sw/thirdparty/gcc/gcc-9.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../lib64 -lcudafor101 -lcudafor -lcudaforblas101 /proj/pgi/linux86-64-llvm/19.10/lib/cuda_init_register_end.o -L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -lcudadevrt -lcudart -lcudafor2 -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgatm -lpgkomp -lomp -as-needed -lomptarget -no-as-needed -lpthread --start-group -lpgmath -lpgc --end-group -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s /home/sw/thirdparty/gcc/gcc-9.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/9.2.0/crtendS.o /usr/lib/x86_64-linux-gnu/crtn.o

You can then compare this to another dryrun without “-Mcuda=nordc” to see the added library paths, libraries, and objects.
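One way to do the comparison is to capture both dryrun outputs and diff them (redirecting both stdout and stderr to be safe since driver output often goes to stderr; the log names are just placeholders):

  pgfortran -dryrun -Mcuda=nordc -shared test.o -o libtest.so > with_cuda.log 2>&1
  pgfortran -dryrun -shared test.o -o libtest.so > without_cuda.log 2>&1
  diff without_cuda.log with_cuda.log

The lines unique to the first log are the library paths, libraries, and objects that “-Mcuda” brings in.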

So in my case, where CUDA 10.1 is being used, I’d want to add: “-L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -lcudafor101 -lcudafor -lcudaforblas101 /proj/pgi/linux86-64-llvm/19.10/lib/cuda_init_register_end.o -lcudadevrt -lcudart -lcudafor2”
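Putting it together for f2py, something like the following using my paths (I haven’t tested this from f2py itself, so treat it as a starting point; your paths and versions will differ):

  export NPY_DISTUTILS_APPEND_FLAGS=1
  export LDFLAGS="-L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -lcudafor101 -lcudafor -lcudaforblas101 /proj/pgi/linux86-64-llvm/19.10/lib/cuda_init_register_end.o -lcudadevrt -lcudart -lcudafor2"
  f2py -c --fcompiler=pg --f90flags="-Mcuda=nordc,cuda10.1" -m test test.f90

Remember to re-run the dryrun comparison if you move to a different compiler release or CUDA version since the library list can change.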