At runtime: "Fatal error: Registered function 'nvkernel_xyz_foo_16_' not found in the CUBIN, error 1"

Hi, I got this very weird error on a program that is using OpenMP for offloading to GPU:

Fatal error: Registered function 'nvkernel_xyz_foo_16_' not found in the CUBIN, error 1

I must say that the actual code is quite large and complex (dozens of GPU kernels), and every attempt to create a small reproducer has been unsuccessful so far.

I call the error “weird” because most of the kernels are generated correctly (according to the compiler feedback: 123, Generating “nvkernel_xyz_foo_16” GPU kernel). However, a few are missing. I can verify that by inspecting the generated binary with the cuobjdump utility.

As the error suggests, how is it possible that a kernel is generated, and even registered, but not found in the final binary?

I’ve only seen this once before, when a user was missing a “declare target” directive around a routine declaration, so the device version of the routine didn’t get created. So it might not be the kernel itself, but rather a routine it’s calling, or possibly a global variable such as a Fortran module variable.
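For reference, here is a minimal sketch of what “declare target” looks like for both a module variable and a routine called from an offloaded loop. All names here (`params`, `scale`, `scaled`, `demo`) are made up for illustration and are not taken from your code; I’m assuming a build along the lines of `nvfortran -mp=gpu -Minfo=mp`.

```fortran
module params
  implicit none
  ! Hypothetical module variable that device code reads.
  real :: scale = 2.0
  ! Ask the compiler to create a device copy of the module variable.
  !$omp declare target(scale)
contains
  function scaled(x) result(y)
    ! Ask the compiler to generate a device version of this routine.
    !$omp declare target
    real, intent(in) :: x
    real :: y
    y = scale * x
  end function scaled
end module params

program demo
  use params
  implicit none
  integer :: i
  real :: a(8)

  ! The offloaded loop calls scaled(), which reads the module variable.
  !$omp target teams distribute parallel do map(from: a)
  do i = 1, 8
    a(i) = scaled(real(i))
  end do

  print *, a
end program demo
```

If either directive is absent in a case like this, the device version of the routine or variable may not get created, so it’s worth checking everything the failing kernels reference, not just the kernels themselves.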

-Mat

Thanks for answering, Mat. My bad, I forgot to mention that I’m offloading a Fortran application.

But the problem still persists. I’ve double-checked the missing kernels, and they are pretty similar to the others that are included in the cubin file. So there is nothing special about the omitted kernels.

I’m trying again to create a small reproducible example, though, this time focusing on the module global variables.

That would be great. The symbol name, i.e. the “16_” suffix, implies that it’s a module variable that’s missing.

Note that I’m on vacation for the holiday break, so I may not respond until next year.