Python C extension compiled with nvc++ and OpenMP offloading cannot run on the GPU


I’m trying to build a C extension for Python using OpenMP offloading. Compilation succeeds, but when I use the extension from Python with OMP_TARGET_OFFLOAD=MANDATORY set, I get the following error message:

(venv) bash-4.2$ python
2D (dependend) :
Fatal error: Could not run target region on device 0, execution terminated.

The output during compilation looks promising:

nvc++ -Iinclude -I/software/rome/SciPy-bundle/2021.05-foss-2021a/lib/python3.9/site-packages/numpy/core/include -I/home/h0/seth295c/cmi/venv/include -I/sw/installed/Python/3.9.5-GCCcore-10.3.0/include/python3.9 -c src/gamma_c.cpp -o build/temp.linux-x86_64-3.9/src/gamma_c.o -g -O3 -shared -std=c++17 -Minfo=mp -mp=gpu -mp -target=gpu
cmi::gamma_c_2d_naive(_object *, _object *):
    97, #omp target teams distribute parallel for
        97, Generating "nvkernel__ZN3cmi16gamma_c_2d_naiveEP7_objectS1__F1L97_1" GPU
            Generating Tesla and Multicore code
            Generating reduction(+:res,.res22168p)
            Loop parallelized across teams and threads(128), schedule(static)
cmi::gamma_c_2d(_object *, _object *):
    228, #omp parallel
    240, #omp parallel
        240, Generating reduction(+:res)
cmi::gamma_c_2d_independence(_object *, _object *):
    338, #omp parallel
    350, #omp parallel
        350, Generating reduction(+:res)
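
One way to narrow this down is a small diagnostic that asks the OpenMP runtime how many offload devices it sees and whether a trivial target region really runs on a device. The sketch below is only an illustration: the helper names (visible_offload_devices, target_region_runs_on_device, report_offload_status) are hypothetical and not part of the extension shown above. It only uses the standard OpenMP calls omp_get_num_devices() and omp_is_initial_device(), and falls back to "no device" when built without OpenMP.

```cpp
// Hypothetical diagnostic helpers (names are illustrative, not from gamma_c.cpp).
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices the OpenMP runtime reports.
// Returns 0 when the file is compiled without OpenMP support.
int visible_offload_devices() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// Runs a trivial target region and reports whether it executed on a device.
// If the region falls back to the host (or OpenMP is disabled), returns false.
bool target_region_runs_on_device() {
    int on_host = 1;  // stays 1 unless the region executes on a device
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host == 0;
}

// Prints both results to stderr so they show up in the Python session.
void report_offload_status() {
    std::fprintf(stderr, "offload devices visible: %d\n",
                 visible_offload_devices());
    std::fprintf(stderr, "trivial target region ran on device: %s\n",
                 target_region_runs_on_device() ? "yes" : "no");
}
```

Calling report_offload_status() once from a standalone executable and once from the extension's module initialisation, both built with the same flags, would show whether the problem is specific to loading the code as a shared object from Python. Note that with OMP_TARGET_OFFLOAD=MANDATORY the trivial target region may itself terminate the process if no device is usable, so the first line of output is the more informative one in that case.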