NVIDIA modules 470.239.06 build failure with gcc-14 due to conftest.sh

I don’t know whether a fix already exists, but to fix the build failure of 470.239.06 when it is built with GCC 14, I created the following three patches:

  • This patch is required starting with GCC 14:
Subject: [PATCH 1/3] Fix conftest to ignore implicit-function-declaration and
 strict-prototypes warnings

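conftest relies on the fact that a call to a function without a prototype still
builds, while an invalid call to a function (missing function parameters) makes
the build fail. GCC 14 turns implicit function declarations into errors by
default, which breaks this assumption.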
---
 conftest.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/conftest.sh b/conftest.sh
index ea2676e..6e6da83 100755
--- a/conftest.sh
+++ b/conftest.sh
@@ -101,7 +101,8 @@ test_header_presence() {
 build_cflags() {
     BASE_CFLAGS="-O2 -D__KERNEL__ \
 -DKBUILD_BASENAME=\"#conftest$$\" -DKBUILD_MODNAME=\"#conftest$$\" \
--nostdinc -isystem $ISYSTEM"
+-nostdinc -isystem $ISYSTEM \
+-Wno-implicit-function-declaration -Wno-strict-prototypes"
 
     if [ "$OUTPUT" != "$SOURCES" ]; then
         OUTPUT_CFLAGS="-I$OUTPUT/include2 -I$OUTPUT/include"
-- 
2.45.0
  • This patch is unrelated to the new GCC version, but the problem was discovered because of it.
    Since Linux commit 8c97023cf0518, the kernel Makefile builds with -fshort-wchar, so whenever a conftest test file includes include/linux/efi.h, the conftest build fails.
Subject: [PATCH 2/3] Fix conftest to use a short wchar_t

Fix build error about ``const efi_char16_t *v = L"SecureBoot"``
when including include/linux/efi.h
---
 conftest.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/conftest.sh b/conftest.sh
index 6e6da83..678b79c 100755
--- a/conftest.sh
+++ b/conftest.sh
@@ -101,7 +101,7 @@ test_header_presence() {
 build_cflags() {
     BASE_CFLAGS="-O2 -D__KERNEL__ \
 -DKBUILD_BASENAME=\"#conftest$$\" -DKBUILD_MODNAME=\"#conftest$$\" \
--nostdinc -isystem $ISYSTEM \
+-nostdinc -isystem $ISYSTEM -fshort-wchar \
 -Wno-implicit-function-declaration -Wno-strict-prototypes"
 
     if [ "$OUTPUT" != "$SOURCES" ]; then
-- 
2.45.0
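For context, a minimal user-space sketch of why the flag matters, assuming GCC or Clang (both predefine `__SIZEOF_WCHAR_T__`) and a local stand-in for the kernel's `efi_char16_t`, which is a 16-bit unsigned type. The elements of a wide string literal are `wchar_t`, which is 4 bytes on x86_64 Linux unless `-fshort-wchar` is passed, so `L"SecureBoot"` only converts to `const efi_char16_t *` when the flag is set. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's efi_char16_t (a u16); assumption for this sketch. */
typedef uint16_t efi_char16_t;
static_assert(sizeof(efi_char16_t) == 2, "efi_char16_t must be 16 bits");

/* Size in bytes of one element of a wide string literal, i.e. sizeof(wchar_t). */
size_t wide_unit_size(void)
{
    return sizeof(L"SecureBoot"[0]);
}

/* Nonzero when wide literals use 16-bit units, as <linux/efi.h> expects. */
int wide_units_match_efi_char16(void)
{
    return wide_unit_size() == sizeof(efi_char16_t);
}

#if __SIZEOF_WCHAR_T__ == 2
/* Only compiles cleanly when wchar_t is 16 bits, e.g. when built with -fshort-wchar. */
const efi_char16_t *const secure_boot_name = L"SecureBoot";
#endif
```

Compiling this once with plain `gcc -c` and once with `gcc -fshort-wchar -c` shows the difference: only the second build defines `secure_boot_name`, mirroring what efi.h needs inside a conftest test.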
Subject: [PATCH 3/3] Fix conftest to use nv_drm_gem_vmap() which has the
 secondary map argument
---
 conftest.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/conftest.sh b/conftest.sh
index 678b79c..606f13d 100755
--- a/conftest.sh
+++ b/conftest.sh
@@ -4593,8 +4593,13 @@ compile_test() {
             #
             CODE="
             #include <drm/drm_gem.h>
+            #if defined(NV_LINUX_IOSYS_MAP_H_PRESENT)
+            typedef struct iosys_map nv_sysio_map_t;
+            #else
+            typedef struct dma_buf_map nv_sysio_map_t;
+            #endif
             int conftest_drm_gem_object_vmap_has_map_arg(
-                    struct drm_gem_object *obj, struct dma_buf_map *map) {
+                    struct drm_gem_object *obj, nv_sysio_map_t *map) {
                 return obj->funcs->vmap(obj, map);
             }"
 
-- 
2.45.0
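To make the intent of the typedef clearer, here is a user-space sketch of the same probe. The structs are greatly simplified stand-ins for the kernel types (an assumption; the real ones live in `<drm/drm_gem.h>` and `<linux/iosys-map.h>`), and `NV_LINUX_IOSYS_MAP_H_PRESENT` is hard-coded where conftest.sh would normally generate it. Newer kernels renamed `struct dma_buf_map` to `struct iosys_map`, so with the old hard-coded parameter type the call through `->vmap` is an incompatible pointer type, and the probe wrongly concludes that the map argument is absent. `fake_vmap` is a hypothetical callback.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (assumption, not the real layouts). */
struct iosys_map   { void *vaddr; int is_iomem; };
struct dma_buf_map { void *vaddr; int is_iomem; };

struct drm_gem_object;

/* Models a newer kernel, where ->vmap takes a struct iosys_map. */
struct drm_gem_object_funcs {
    int (*vmap)(struct drm_gem_object *obj, struct iosys_map *map);
};

struct drm_gem_object {
    const struct drm_gem_object_funcs *funcs;
};

/* Normally emitted by conftest.sh when <linux/iosys-map.h> exists. */
#define NV_LINUX_IOSYS_MAP_H_PRESENT 1

#if defined(NV_LINUX_IOSYS_MAP_H_PRESENT)
typedef struct iosys_map nv_sysio_map_t;
#else
typedef struct dma_buf_map nv_sysio_map_t;
#endif

/* Same body as the conftest probe: it only compiles if the types line up. */
int conftest_drm_gem_object_vmap_has_map_arg(
        struct drm_gem_object *obj, nv_sysio_map_t *map) {
    return obj->funcs->vmap(obj, map);
}

/* Hypothetical driver callback with the two-argument signature. */
static int fake_vmap(struct drm_gem_object *obj, struct iosys_map *map)
{
    assert(obj != NULL && map != NULL);
    map->vaddr = NULL;
    map->is_iomem = 0;
    return 0;
}

const struct drm_gem_object_funcs fake_funcs = { .vmap = fake_vmap };
```

Removing the `#define` makes `nv_sysio_map_t` fall back to `struct dma_buf_map`, and GCC 14 then rejects the call with an incompatible-pointer-type error, which is the false negative the patch avoids.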
  • I also think that the NV_DMA_IS_DIRECT_PRESENT conftest is broken: dma_is_direct(), which was added in Linux 5.0, was removed in Linux 5.9.

I have what seems to be the same GCC 14.1.0 issue as yours, @benjarobin_nvidia, but when trying to build 550.54.14:

/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-drm/nvidia-drm-gem.c:115:16: error: initialization of ‘int (*)(struct drm_gem_object *, struct iosys_map *)’ from incompatible pointer type ‘void * (*)(struct drm_gem_object *)’ [-Wincompatible-pointer-types]
  115 |     .vmap    = nv_drm_gem_prime_vmap,
      |                ^~~~~~~~~~~~~~~~~~~~~
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-drm/nvidia-drm-gem.c:115:16: note: (near initialization for ‘nv_drm_gem_funcs.vmap’)
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-drm/nvidia-drm-gem.c:116:16: error: initialization of ‘void (*)(struct drm_gem_object *, struct iosys_map *)’ from incompatible pointer type ‘void (*)(struct drm_gem_object *, void *)’ [-Wincompatible-pointer-types]
  116 |     .vunmap  = nv_drm_gem_prime_vunmap,
      |                ^~~~~~~~~~~~~~~~~~~~~~~
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-drm/nvidia-drm-gem.c:116:16: note: (near initialization for ‘nv_drm_gem_funcs.vunmap’)
make[3]: *** [scripts/Makefile.build:244: /tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-drm/nvidia-drm-gem.o] Error 1
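Reading the log, the driver put the legacy one-argument `nv_drm_gem_prime_vmap()` into `.vmap`, which suggests that the conftest check meant to select the two-argument variant came out negative on this kernel. The sketch below shows the general shape of such a compile-time selection, under stated assumptions: the types are simplified stand-ins, `NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG` is the macro name one would expect conftest.sh to derive from the `drm_gem_object_vmap_has_map_arg` test, and `NV_GEM_VMAP` and the function bodies are hypothetical. Since GCC 14, `-Wincompatible-pointer-types` is an error by default, which is why the mismatch stops the build instead of only warning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (assumption). */
struct iosys_map { void *vaddr; };
struct drm_gem_object;

/* Newer kernels: ->vmap fills in a map and returns 0 or a negative errno. */
struct drm_gem_object_funcs {
    int (*vmap)(struct drm_gem_object *obj, struct iosys_map *map);
};

struct drm_gem_object {
    const struct drm_gem_object_funcs *funcs;
    void *backing; /* hypothetical: pretend kernel mapping of the buffer */
};

/* Would be generated by conftest.sh; hard-coded here for the sketch. */
#define NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG 1

#if defined(NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG)
/* Two-argument form matching struct drm_gem_object_funcs above. */
static int nv_drm_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
{
    assert(map != NULL);
    if (obj->backing == NULL)
        return -ENOMEM;
    map->vaddr = obj->backing;
    return 0;
}
#define NV_GEM_VMAP nv_drm_gem_vmap
#else
/* Legacy form: putting this in .vmap below reproduces the error in the log. */
static void *nv_drm_gem_prime_vmap(struct drm_gem_object *obj)
{
    return obj->backing;
}
#define NV_GEM_VMAP nv_drm_gem_prime_vmap
#endif

const struct drm_gem_object_funcs nv_drm_gem_funcs = {
    .vmap = NV_GEM_VMAP,
};
```

If the `#define` is removed, the `#else` branch is compiled and GCC 14 reports the same kind of `-Wincompatible-pointer-types` error as in the log above.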

Are there patches similar to yours that fix this for 550.54.14? Your first patch seems to be already included in 550.54.14, and your second and third patches apply, but they don’t fix the problem; I still get:

/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-uvm/uvm_push.h:401:13: error: storage class specified for parameter ‘uvm_push_inline_data_begin’
  401 | static void uvm_push_inline_data_begin(uvm_push_t *push, uvm_push_inline_data_t *data)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-uvm/uvm_push.h:402:1: error: expected ‘;’, ‘,’ or ‘)’ before ‘{’ token
  402 | {
      | ^
make[3]: *** [scripts/Makefile.build:244: /tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia-uvm/uvm_gpu_semaphore.o] Error 1

and tons of these:

/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_rax+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_rbx+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_rcx+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_rdx+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_rsi+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_rdi+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_rbp+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r8+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r9+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r10+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r11+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r12+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r13+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r14+0x0: indirect jump found in MITIGATION_RETPOLINE build
/tmp/SBo/NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.o: warning: objtool: __x86_indirect_thunk_r15+0x0: indirect jump found in MITIGATION_RETPOLINE build

Is there a way to disable / not compile the drm module? I really don’t need it.

It’s solely an nvidia-drm issue. I prepended NV_EXCLUDE_KERNEL_MODULES=nvidia-drm to the make command, and everything else built successfully!

Hi @benjarobin_nvidia
The fix is already available in the latest 550 and 535 releases.