Hi Loriano,
This is an interesting one, though it seems like a very specific case. The problem seems to be limited to using fixed-size module arrays in a “declare” directive. My best guess is that there is some type initialization issue when the SO is loaded by Python that doesn’t occur when it is loaded by ld. While I don’t know if it’s relevant, strace shows mprotect being called after the SO is loaded from Python, so possibly this added memory protection is causing the issue. This is purely a guess, so I’ve created an issue report, TPR #34208, and will let engineering determine the root cause and whether it’s something we can fix.
The good news is that I have two workarounds for you.
The first is to remove the “declare” directives and instead use an “enter data” directive in the init routine. This delays creation of the device arrays until runtime, rather than having it happen when the library is loaded.
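As a rough illustration of this first workaround, here is a minimal free-form sketch. The array names t and fm and the routine name gfinit come from the compiler output in this post; the module name, the sizes, and the fill values are made up for the example and are not taken from the actual source, which is fixed-form.

```fortran
! Sketch only: module name, sizes, and values are hypothetical.
module funfm_sketch
  implicit none
  integer, parameter :: n = 8, m = 4          ! illustrative sizes
  double precision :: t(n), fm(n, m)          ! fixed-size module arrays
  ! Note: no "!$acc declare create(t, fm)" here, so nothing is
  ! created on the device when the shared object is loaded.
end module funfm_sketch

subroutine gfinit()
  use funfm_sketch
  implicit none
  ! Create the device copies at run time, when the init routine runs.
  !$acc enter data create(t, fm)
  t  = 0.0d0                                  ! illustrative host values
  fm = 1.0d0
  ! Copy the host values into the device copies.
  !$acc update device(t, fm)
end subroutine gfinit
```

Because the arrays are now unstructured device data rather than declared data, a matching “exit data delete” in a cleanup routine would be the usual way to release them, assuming the library has such a routine.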
The second is to make the arrays allocatable instead of fixed size and add an “allocate” in the init routine. The “declare” directive is still used. Again, creation of the device arrays is delayed, this time until the arrays are allocated.
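The second workaround might look roughly like the free-form sketch below. Again, t, fm, and gfinit are the names seen in the compiler output, while the module name, the dummy arguments n and m, and the fill values are assumptions made for illustration only.

```fortran
! Sketch only: module name, arguments, and values are hypothetical.
module funfm_sketch
  implicit none
  ! Allocatable instead of fixed size; the declare directive is kept.
  double precision, allocatable :: t(:), fm(:, :)
  !$acc declare create(t, fm)
end module funfm_sketch

subroutine gfinit(n, m)
  use funfm_sketch
  implicit none
  integer, intent(in) :: n, m                 ! illustrative sizes
  ! For allocatable arrays in a declare create clause, the device
  ! copy is created when the array is allocated, not at load time.
  if (.not. allocated(t))  allocate(t(n))
  if (.not. allocated(fm)) allocate(fm(n, m))
  t  = 0.0d0                                  ! illustrative host values
  fm = 1.0d0
  ! Copy the host values into the device copies.
  !$acc update device(t, fm)
end subroutine gfinit
```

With this form no “enter data” is needed, which matches the Case 2 compiler output in this post, where only the “update device” line is reported for gfinit.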
Here’s the modified code with the workarounds:
pycudaf.tar (450 KB)
Case 1:
% make clean
% make EXTRA="-DUSE_CASE1"
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE1 -o funfm.o -c funfm.F
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE1 -o gfinit.o -c gfinit.F
gfinit:
23, Generating enter data create(t(:),fm(:,:))
25, Generating update device(t(:),fm(:,:))
nvcc -D_FILE_OFFSET_BITS=64 -O3 --compiler-options '-fopenmp' --compiler-options '-fPIC' -o c_wrapper.o -c c_wrapper.c
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE1 -o bertha_wrapper.o -c bertha_wrapper.F
nvfortran -shared -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -fopenmp funfm.o gfinit.o c_wrapper.o bertha_wrapper.o -o bertha_wrapper.so
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE1 -o main.o -c main.F
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -fopenmp main.o -o testb bertha_wrapper.so
% python3 ./pybertha.py
0.0
1e-05
Case 2:
% make clean
rm -f *.o *.mod *__genmod.f90 bertha_wrapper.so testb
% make EXTRA="-DUSE_CASE2"
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE2 -o funfm.o -c funfm.F
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE2 -o gfinit.o -c gfinit.F
gfinit:
25, Generating update device(t(:),fm(:,:))
nvcc -D_FILE_OFFSET_BITS=64 -O3 --compiler-options '-fopenmp' --compiler-options '-fPIC' -o c_wrapper.o -c c_wrapper.c
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE2 -o bertha_wrapper.o -c bertha_wrapper.F
nvfortran -shared -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -fopenmp funfm.o gfinit.o c_wrapper.o bertha_wrapper.o -o bertha_wrapper.so
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -r8 -Minform=warn -Mextend -O3 -cudalib=cublas -fopenmp -fpic -DUSECUDANV -DUSE_CASE2 -o main.o -c main.F
nvfortran -acc=gpu -Minfo=accel -cuda -cudalib=cublas,cusolver -fopenmp main.o -o testb bertha_wrapper.so
% python3 ./pybertha.py
0.0
1e-05
-Mat