NVIDIA modules build failure with upcoming gcc-14 and recent kernels due to misfiring conftest.sh test (heads up)

Unsure if you’re already aware, but I was investigating a build failure with 545.29.06 and the upcoming gcc-14, which enables -Werror=incompatible-pointer-types by default (regardless of CFLAGS). The build fails with:

kernel/nvidia-drm/nvidia-drm-gem.c:115:16: error: initialization of 'int (*)(struct drm_gem_object *, struct iosys_map *)' from incompatible pointer type 'void * (*)(struct drm_gem_object *)' [-Wincompatible-pointer-types]
  115 |     .vmap    = nv_drm_gem_prime_vmap,
      |                ^~~~~~~~~~~~~~~~~~~~~

On closer look the issue is not there, but rather in this conftest.sh test:

            CODE="
            #include <drm/drm_gem.h>
            int conftest_drm_gem_object_vmap_has_map_arg(
                    struct drm_gem_object *obj, struct dma_buf_map *map) {
                return obj->funcs->vmap(obj, map);
            }"

            compile_check_conftest "$CODE" "NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG" "" "types"

…resulting in (if you remove the >/dev/null 2>&1 silencing):

conftest4394.c: In function 'conftest_drm_gem_object_vmap_has_map_arg':
conftest4394.c:24:46: error: passing argument 2 of 'obj->funcs->vmap' from incompatible pointer type [-Wincompatible-pointer-types]
   24 |                 return obj->funcs->vmap(obj, map);
      |                                              ^~~
      |                                              |
      |                                              struct dma_buf_map *
conftest4394.c:24:46: note: expected 'struct iosys_map *' but argument is of type 'struct dma_buf_map *'

…and in kernel/nvidia-drm/nvidia-drm-gem.c there’s a comment which explains what’s happening:

/*
 * The 'dma_buf_map' structure is renamed to 'iosys_map' by the commit
 * 7938f4218168 ("dma-buf-map: Rename to iosys-map").
 */
#if defined(NV_LINUX_IOSYS_MAP_H_PRESENT)
typedef struct iosys_map nv_sysio_map_t;
#else
typedef struct dma_buf_map nv_sysio_map_t;
#endif

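conftest.sh does not know about this rename, so under gcc-14 the test fails to compile, NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG ends up undefined, and the driver falls back to the old single-argument vmap signature, hence the error in nvidia-drm-gem.c. I’ve built this against the 6.6.11 Linux kernel, but it should be reproducible with any >5.18 kernel that has this rename.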
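
To illustrate why a failed probe turns into the error at nvidia-drm-gem.c:115, here is a minimal sketch of a feature macro selecting between the two vmap signatures. It uses stand-in types rather than the real kernel headers, and demo_vmap, demo_funcs, demo_map_roundtrip and the backing field are made-up names. I’m assuming the driver gates its vmap implementation on NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG in roughly this way; the actual driver code may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; NOT the real definitions. */
struct iosys_map { void *vaddr; };
struct drm_gem_object;

struct drm_gem_object_funcs {
    /* Post-rename kernels expect this two-argument hook. */
    int (*vmap)(struct drm_gem_object *obj, struct iosys_map *map);
};

struct drm_gem_object {
    const struct drm_gem_object_funcs *funcs;
    void *backing; /* hypothetical field, for this demo only */
};

/*
 * In the real build conftest.sh decides whether this is defined; here it
 * is set by hand. Comment it out and gcc-14 rejects the initializer of
 * demo_funcs with -Wincompatible-pointer-types, much like the report.
 */
#define NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG

#if defined(NV_DRM_GEM_OBJECT_VMAP_HAS_MAP_ARG)
static int demo_vmap(struct drm_gem_object *obj, struct iosys_map *map)
{
    map->vaddr = obj->backing;
    return 0;
}
#else
/* Old-style hook: wrong type for the funcs table above. */
static void *demo_vmap(struct drm_gem_object *obj)
{
    return obj->backing;
}
#endif

static const struct drm_gem_object_funcs demo_funcs = {
    .vmap = demo_vmap,
};

/* Call the hook through the table, the way a caller would. */
static int demo_map_roundtrip(void)
{
    int payload = 42;
    struct drm_gem_object obj = { .funcs = &demo_funcs, .backing = &payload };
    struct iosys_map map = { NULL };
    int ret = obj.funcs->vmap(&obj, &map);

    assert(ret == 0);
    return map.vaddr == (void *)&payload;
}
```

The point is only that the wrong answer from the probe picks the wrong branch, and the compiler complains far away from the actual cause.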

Still some time before gcc-14 releases and spreads to distros, but would be good to get this fixed before it becomes a problem. This affects other driver branches too (at least 535.146.02, have not tried older ones).

There could potentially be other tests that need reviewing with gcc-14 (considering the >/dev/null 2>&1 silencing), albeit it’s the only one that resulted in a straight up build failure for me.

As a quick fix, I’ve done this locally (fortunately conftest.sh seems to know about this define by the time the test runs):

--- a/kernel/conftest.sh
+++ b/kernel/conftest.sh
@@ -5071,6 +5071,11 @@
             CODE="
             #include <drm/drm_gem.h>
+            #if defined(NV_LINUX_IOSYS_MAP_H_PRESENT)
+            typedef struct iosys_map nv_sysio_map_t;
+            #else
+            typedef struct dma_buf_map nv_sysio_map_t;
+            #endif
             int conftest_drm_gem_object_vmap_has_map_arg(
-                    struct drm_gem_object *obj, struct dma_buf_map *map) {
+                    struct drm_gem_object *obj, nv_sysio_map_t *map) {
                 return obj->funcs->vmap(obj, map);
             }"

Well, I tried others so I can fix them in Gentoo’s packaging. 525.147.05, 535.43.22(vulkan), 535.146.02, 545.29.06 are fine with gcc14 if patched with the above (open source variant is fine if patched too).
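
For anyone who wants to see the typedef trick from the patch in isolation, here is a standalone sketch. The struct bodies are stand-ins (not the kernel’s), NV_LINUX_IOSYS_MAP_H_PRESENT is set by hand rather than by conftest.sh, and stub_vmap and probe_smoke_test are hypothetical helpers added only so the probe can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Set by hand for this sketch; comment out to model a pre-rename kernel. */
#define NV_LINUX_IOSYS_MAP_H_PRESENT

#if defined(NV_LINUX_IOSYS_MAP_H_PRESENT)
struct iosys_map { void *vaddr; };        /* stand-in, post-rename name */
typedef struct iosys_map nv_sysio_map_t;
#else
struct dma_buf_map { void *vaddr; };      /* stand-in, pre-rename name */
typedef struct dma_buf_map nv_sysio_map_t;
#endif

struct drm_gem_object;

struct drm_gem_object_funcs {
    /* Whichever name the "kernel" uses, the alias resolves to it. */
    int (*vmap)(struct drm_gem_object *obj, nv_sysio_map_t *map);
};

struct drm_gem_object {
    const struct drm_gem_object_funcs *funcs;
};

/* Same shape as the patched probe: the argument type follows the rename. */
int conftest_drm_gem_object_vmap_has_map_arg(
        struct drm_gem_object *obj, nv_sysio_map_t *map)
{
    return obj->funcs->vmap(obj, map);
}

/* Hypothetical stub so the probe has something to call. */
static int stub_vmap(struct drm_gem_object *obj, nv_sysio_map_t *map)
{
    (void)obj;
    map->vaddr = NULL;
    return 7;
}

static int probe_smoke_test(void)
{
    struct drm_gem_object_funcs funcs = { .vmap = stub_vmap };
    struct drm_gem_object obj = { .funcs = &funcs };
    nv_sysio_map_t map;
    int ret = conftest_drm_gem_object_vmap_has_map_arg(&obj, &map);

    assert(map.vaddr == NULL);
    return ret;
}
```

Either setting of the define builds cleanly, which is what the probe needs: it should only fail when vmap really lacks the map argument, not because the struct was renamed.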

I have not dug into it, but 470.223.02 seems to be in a sorrier state (gcc14 also defaults to -Werror=implicit-function-declaration). I tried 390.157 too (obviously broken), but that one is unsupported anyway. So for the legacy branches I just settled for a lame -Wno-error=... (both incompatible-pointer-types and implicit-function-declaration) for now, given I don’t care too much there.

This affects the 550 branch too, but the fix above does not help.

Are you sure the patch was properly applied? 550.40.07 still fails for me without the patch, but the patch still sorts it out.

Note that if using the open source variant you need to patch kernel-open/conftest.sh instead.

If not that, is it really the same error? It’s possible there are different issues with gcc14 that I just haven’t found yet due to my kernel configuration.

Oh right, by “fix above” did you mean the patch or passing -Wno-error=incompatible-pointer-types? The latter won’t work if you just pass it in CFLAGS (conftest won’t pick it up from there, last I checked).

Both things I tried. Patch applies, but C is not my strong suit (I started on Rust back when v1.05 came out). It has the same error.

Thanks for the report and for the diagnosis. I’ve filed NVIDIA internal bug 4478534 for this, and hopefully we can resolve this soon. Thanks again.

The above patch works for me on Fedora Rawhide with kernel 6.8.0-0.rc6.20240229git805d849d7c3c.51.fc41.x86_64 and 550.54.14. For some reason Wayland doesn’t work, but Xorg is running great! I’m using the Open Kernel Modules and the NVIDIA installer files.

For the record, this is fixed in 550.67 – thanks.
