Objtool: 'naked' return found in MITIGATION_RETHUNK build with pre-compiled blobs on kernel 6.19

The pre-compiled binary blobs nv-kernel.o_binary and nv-modeset-kernel.o_binary shipped with driver 590.48.01 fail objtool validation on kernel 6.19 when CONFIG_MITIGATION_RETHUNK is enabled. The blobs were compiled without
-mfunction-return=thunk-extern and -mharden-sls=all, causing hundreds of errors during module linking.

Environment

  • Driver version: 590.48.01
  • OS: openSUSE Tumbleweed
  • Kernel: 6.19.0-lowlatency-sunlight1
  • GPU: NVIDIA GeForce RTX 3060 Laptop
  • Architecture: x86_64
  • Kernel config: CONFIG_MITIGATION_RETHUNK=y
  • NVIDIA Proprietary driver + MIT/GPL driver
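
To confirm that a given kernel really has the option from the list above enabled, something like the following could be used. This is a minimal sketch: `rethunk_status` is a hypothetical helper name, and the config path in the usage comment is an assumption (some distributions only expose `/proc/config.gz`).

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel config file enables
# CONFIG_MITIGATION_RETHUNK. Takes the path to a plain-text config file.
rethunk_status() {
    config_file=$1
    # -x matches the whole line, so "# CONFIG_MITIGATION_RETHUNK is not set"
    # is not counted as enabled.
    if grep -qx 'CONFIG_MITIGATION_RETHUNK=y' "$config_file"; then
        echo enabled
    else
        echo disabled
    fi
}

# Usage (path is an assumption and varies by distribution):
#   rethunk_status "/boot/config-$(uname -r)"
```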

Build errors

The build fails at the composite module link step when objtool runs on nvidia.o and nvidia-modeset.o:

nvidia.o: error: objtool: sessionAddDependant_IMPL+0x86: 'naked' return found in MITIGATION_RETHUNK build
nvidia.o: error: objtool: serverAllocShareWithHalspecParent+0xbb: 'naked' return found in MITIGATION_RETHUNK build
nvidia.o: error: objtool: kgspGetBinArchiveGspRmBoot_TU102+0xb: 'naked' return found in MITIGATION_RETHUNK build
nvidia.o: error: objtool: libosLogInit+0xbe: 'naked' return found in MITIGATION_RETHUNK build

nvidia-modeset.o: error: objtool: nvstatusToString+0x3a: 'naked' return found in MITIGATION_RETHUNK build
nvidia-modeset.o: error: objtool: f32_div+0x8d: 'naked' return found in MITIGATION_RETHUNK build

make[4]: *** [scripts/Makefile.build:509: nvidia.o] Error 1
make[4]: *** [scripts/Makefile.build:509: nvidia-modeset.o] Error 1
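
Since the full log contains hundreds of these lines, a per-object summary can be easier to read. The following is a rough sketch: `count_objtool_errors` is a hypothetical helper, and it assumes the log lines have the same `<object>.o: error: objtool:` prefix as the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count objtool errors per object file in a build log.
# Prints one "<object> <count>" line per object, sorted by object name.
count_objtool_errors() {
    log_file=$1
    grep -E '^[^ :]+\.o: error: objtool: ' "$log_file" \
        | cut -d: -f1 \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Usage:
#   count_objtool_errors make.log
```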

Root cause

PR #1015 added -mfunction-return=thunk-extern and -mharden-sls=all to src/nvidia/Makefile and src/nvidia-modeset/Makefile, but the pre-compiled blobs shipped in the installer package were not rebuilt with these flags. The blobs contain plain ret instructions instead of jumps to __x86_return_thunk, and objtool with --rethunk --werror rejects them.
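
One rough way to check a blob for plain returns is to count ret instructions in its disassembly. This is only a sketch: `count_naked_rets` is a hypothetical helper, it assumes objdump from binutils is installed, and it matches on the textual disassembly (both the `ret` and `retq` spellings), so treat the number as an indication rather than a substitute for objtool.

```shell
#!/bin/sh
# Hypothetical helper: read objdump disassembly on stdin and print the number
# of lines whose instruction is a plain ret/retq.
count_naked_rets() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function usable under "set -e"; the count is still printed.
    grep -cE '[[:space:]]retq?[[:space:]]*$' || true
}

# Usage (assumes the blob is in the current directory):
#   objdump -d nv-kernel.o_binary | count_naked_rets
```

A blob built with -mfunction-return=thunk-extern would be expected to show a count at or near zero, since the compiler emits jumps to the return thunk in place of these instructions.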

Workaround

Adding the following to kernel-open/Kbuild and kernel/Kbuild skips objtool on the composite module objects containing the blobs, by replacing the objtool command with the no-op true for those two targets:

$(obj)/nvidia.o: private objtool := true
$(obj)/nvidia-modeset.o: private objtool := true

The driver loads and works correctly after this, but produces a runtime warning:

Unpatched return thunk in use. This should not happen!
WARNING: arch/x86/kernel/cpu/bugs.c:3737

Expected fix

The nv-kernel.o_binary and nv-modeset-kernel.o_binary blobs should be recompiled with -mfunction-return=thunk-extern and -mharden-sls=all so they pass objtool validation on kernels with CONFIG_MITIGATION_RETHUNK enabled,
without requiring the workaround above.

Thanks for the explanation and work-around. Modules built and loaded fine (with the warning) in 6.19.

The (initial) issue is still present with the 595.54.04 BETA driver release. Thanks to the OP for reporting it.

Edit: I referenced this thread in the 595 release thread.

Thank you for the report. Tracked internally on Bug #4452776. Currently under Engineering investigation.

Thanks for being that quick about reacting. I responded with the make.log in the other thread, but I’ll also post it here for consistency.

make.log (1.6 MB)

FWIW, in Gentoo we pass CONFIG_WERROR= and CONFIG_OBJTOOL_WERROR= to make when building out-of-tree modules (including nvidia), which overrides the kernel’s setting if it is enabled (usually not the case for us, but it can be, given that users often configure their own kernels). As such, it builds fine even on 6.19.x with CONFIG_MITIGATION_RETHUNK=y.
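
As a sketch of what passing those empty values looks like with the generic kbuild out-of-tree invocation: `build_without_werror` is a hypothetical wrapper, both directory arguments are assumptions, and the driver's own Makefile may expect different arguments than plain `M=`. The `MAKE` override exists only so the command can be previewed without building.

```shell
#!/bin/sh
# Hypothetical wrapper: build an out-of-tree module while overriding the
# kernel's CONFIG_WERROR and CONFIG_OBJTOOL_WERROR with empty values.
#   $1 = kernel build directory, $2 = module source directory
build_without_werror() {
    kdir=$1
    moddir=$2
    # Command-line variable assignments take precedence over the values
    # kbuild reads from the kernel configuration.
    "${MAKE:-make}" -C "$kdir" M="$moddir" \
        CONFIG_WERROR= CONFIG_OBJTOOL_WERROR= modules
}

# Usage (paths are assumptions):
#   build_without_werror "/lib/modules/$(uname -r)/build" "$PWD"
# Preview the command without running a build:
#   MAKE=echo build_without_werror /usr/src/linux "$PWD"
```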

WERROR is more interesting for developers or, at most, distro packagers, not for users who just want their GPU to work with dkms or similar (Edit: I’d personally discourage distros from shipping their kernel with it enabled, and at most test with it. Note that even without these options, some serious warnings that may cause runtime problems will still be fatal).

The millions of warnings about the return thunk are still a big annoyance in the build logs though, and users may want MITIGATION_RETHUNK to actually be applied to nvidia rather than ignored (Edit: so I still hope NVIDIA can sort this out), but they’re not fatal this way.

Sadly, although the new 595.58.03 driver lists

Fixed kernel module build issue with Linux kernel v6.19.

the DKMS module build issue still persists. The make.log is attached.

make.log (1.6 MB)

The issue got resolved with the 595.71.05 driver release.

Thanks to everyone involved in the solution and in collecting the information. :-)