Compiler issue using CUDA in OpenMP GPU offloading code

Hi,

We have an OpenMP GPU offloading code (compiled using nvfortran/nvc) that calls CUDA functions (compiled using nvcc). It works fine with smaller cases, but when we increase the array sizes, we get the following errors from the CUDA program:

in function `__cudaUnregisterBinaryUtil()':
tmpxft_00003429_00000000-6_kernels.cudafe1.cpp:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.bss'
obj/kernels.o: in function `init()':
tmpxft_00003429_00000000-6_kernels.cudafe1.cpp:(.text+0x1b): relocation truncated to fit: R_X86_64_PC32 against `.bss'
tmpxft_00003429_00000000-6_kernels.cudafe1.cpp:(.text+0x31): relocation truncated to fit: R_X86_64_PC32 against `.bss'
tmpxft_00003429_00000000-6_kernels.cudafe1.cpp:(.text+0x45): relocation truncated to fit: R_X86_64_PC32 against `.bss'

Flags such as

nvcc -shared -Xcompiler -fPIC ...

do not seem to fix the issue.

Is there a flag similar to “-mcmodel=medium” available for nvcc that allows the .bss section to extend beyond 2GB?

Thanks. /Jing

Hi Jing,

The error looks to be coming from the host code, so you’d want to try passing the “-mcmodel=medium” option to the host compiler via the -Xcompiler option.
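For illustration, the suggestion above would look something like the lines below. The file names and the -mp=gpu offloading flag are assumptions based on a typical nvfortran/nvcc build, not the poster's actual build commands:

```shell
# Host OpenMP offloading code: pass -mcmodel=medium directly to nvfortran/nvc
nvfortran -mp=gpu -mcmodel=medium -c main.f90 -o obj/main.o

# CUDA code: nvcc forwards -mcmodel=medium to its host compiler via -Xcompiler
nvcc -Xcompiler -mcmodel=medium -c kernels.cu -o obj/kernels.o

# Link with the same memory model on the host side
nvfortran -mp=gpu -mcmodel=medium obj/main.o obj/kernels.o -o app -cuda
```

Note that all objects linked together generally need to agree on the memory model, since the relocation is resolved at link time.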

-Mat

Hi Mat,

The error looks to be coming from the host code, so you’d want to try passing the “-mcmodel=medium” option to the host compiler via the -Xcompiler option.

The flag “-mcmodel=medium” has been added to the nvfortran/nvc compile lines for the OpenMP host code, and “-Xcompiler -mcmodel=medium” has been added to nvcc for the CUDA code. Without calling the CUDA kernels, the OpenMP offloading code works for large cases. Do any other flags need to be added?

Thanks. /Jing

Hi Jing,

I can’t find anything in the nvcc documentation related to setting it to use the medium memory model. nvcc is developed by a different team within NVIDIA so I don’t know its internals, but my best guess, given the filename, is that the error is occurring in a compiler-generated routine that handles unregistration of the device binary.

We can move this question over to the CUDA forums, where they may have better ideas, but as a workaround, I’m wondering if you can try using nvc++ instead of nvcc? While not official, we have added some support for CUDA C++ in nvc++. No guarantee that it will work since full support is still under development, but it might be worth a try.

Also, is the array in question a fixed size? If so, can you instead make it allocatable?
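To illustrate the suggestion of making the array allocatable: a large fixed-size array in the CUDA source lands in .bss and grows with its dimension, which is what overflows the 32-bit PC-relative relocations; replacing it with a heap allocation leaves only a pointer in static storage. The names and size below are hypothetical, not from the poster's code:

```c
#include <stdlib.h>

/* Illustrative size only; in the real code the array is large enough that a
 * static definition would exceed the 2 GB limit of the small code model. */
#define N (1L << 20)

/* static double big[N];      <-- fixed size: placed in .bss, grows with N */

static double *big = NULL;    /* allocatable alternative: only the pointer is static */

void init(void) {
    /* Data lives on the heap, so .bss stays small regardless of N. */
    big = (double *)malloc(N * sizeof(double));
}
```

The same idea applies on the Fortran side by declaring the arrays `allocatable` and allocating them at startup instead of declaring them with fixed bounds.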

-Mat

Hi Mat,

We can move this question over to the CUDA forums, where they may have better ideas, but as a workaround, I’m wondering if you can try using nvc++ instead of nvcc? While not official, we have added some support for CUDA C++ in nvc++. No guarantee that it will work since full support is still under development, but it might be worth a try.

nvc++ works properly, though the performance seems to be a bit worse (I am running some benchmarking tests).

Also, is the array in question a fixed size? If so, can you instead make it allocatable?

Yes, all arrays are fixed size. That can be changed in the mini-app, but it would perhaps be difficult in the full code.

Thanks. /Jing