Patches for 340.108 and 5.6-rc - need help with driver init!

I realise this driver is now EOL so there’s likely to be very little interest, but I just wondered if anyone has had any luck getting 340.108 to work with kernel 5.6-rc1.

I’ve got it building, having fixed the various issues thrown up by 5.6-rc1[1], but the drm_legacy_pci_init/drm_legacy_pci_exit functions are now hidden in 5.6-rc1, and a naive replacement with pci_register_driver/pci_unregister_driver results in the following failure on a Revo3700 (ION2):

[   16.321446] nvidia: loading out-of-tree module taints kernel.
[   16.321484] nvidia: module license 'NVIDIA' taints kernel.
[   16.321487] Disabling lock debugging due to kernel taint
[   16.374053] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   16.375356] Error: Driver 'nvidia' is already registered, aborting...
[   16.375362] NVRM: DRM init failed

Full dmesg: http://ix.io/2bj

If I revert the kernel commit[2] and make the drm_legacy_pci_init/exit functions visible then the modified 340.108 driver loads, and is working perfectly. However this isn’t really a long-term solution, so I need a helping hand with the driver initialisation when using pci_register_driver/pci_unregister_driver…
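
A possible reading of the “already registered” line, offered as an assumption rather than something confirmed in this thread: as far as I know, the driver core refuses to register a second driver with the same name on the same bus and returns -EBUSY, so calling pci_register_driver on the same “nvidia” struct pci_driver from both the DRM init path and the regular module init would trip that check. The sketch below is a userspace model of that behaviour only; toy_bus and toy_driver_register are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these names exist in the kernel. */
#define TOY_MAX_DRIVERS 8

struct toy_bus {
    const char *names[TOY_MAX_DRIVERS]; /* names of registered drivers */
    size_t count;
};

/*
 * Register a driver name on the bus.
 * Returns 0 on success, -EBUSY if the name is already registered,
 * -ENOMEM if the toy table is full.
 */
static int toy_driver_register(struct toy_bus *bus, const char *name)
{
    assert(bus != NULL && name != NULL);

    for (size_t i = 0; i < bus->count; i++) {
        if (strcmp(bus->names[i], name) == 0)
            return -EBUSY; /* same name seen before: refuse */
    }
    if (bus->count == TOY_MAX_DRIVERS)
        return -ENOMEM;

    bus->names[bus->count++] = name;
    return 0;
}
```

In this model, registering “nvidia” twice fails on the second call, while a differently named driver still registers. If the assumption holds, the fix is to make sure only one code path registers the PCI driver, which is what the drm_legacy_pci_init path effectively does for legacy drivers by scanning devices manually instead.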

In the same PR[1] there are 5.6-rc1 patches for 440.59 which is building, but I have no hardware to test the modified driver (see “xf86-video-nvidia: fix 5.6-rc1 build”) - mentioning it here in case it helps anyone else.

  1. https://github.com/LibreELEC/LibreELEC.tv/pull/4199 (commit “xf86-video-nvidia-legacy: fix 5.6-rc1 build”)
  2. https://github.com/torvalds/linux/commit/1be9d5f069964108125592af92304da76c5865bf#diff-01a0fbd8037627d5d55b23bae3faca39

If those two functions are the only ones missing, then the official way seems to be to build the kernel with DRM_LEGACY enabled – if I’m seeing the code right. Another, highly hack-y, option that could maybe work, would be to simply replicate the code from the kernel into the NVIDIA driver. The functions actually used in drm_legacy_pci_{init,exit} seem to be exported, still, even without DRM_LEGACY present.

#if defined(NV_DRM_LEGACY_PCI_INIT_PRESENT)
#define nv_drm_pci_init drm_legacy_pci_init
#define nv_drm_pci_exit drm_legacy_pci_exit
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5, 6, 0)
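int nv_drm_pci_init(struct drm_driver *driver, struct pci_driver *pdriver)
{
	/* IMPLEMENT ME: local copy of drm_legacy_pci_init() */
	return -ENOSYS;
}

void nv_drm_pci_exit(struct drm_driver *driver, struct pci_driver *pdriver)
{
	/* IMPLEMENT ME: local copy of drm_legacy_pci_exit() */
}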
#else
#define nv_drm_pci_init drm_pci_init
#define nv_drm_pci_exit drm_pci_exit
#endif

Though, I really doubt it’d be that easy, given that a fair share of the kernel code relies on IS_ENABLED(CONFIG_DRM_LEGACY).
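
To make the three-way selection in the snippet above concrete, here is a small userspace sketch of the same decision. It assumes the traditional packing used by the kernel’s KERNEL_VERSION macro (major << 16, minor << 8, sublevel), redefined under a toy name so it builds outside the kernel. toy_pick_init_path and the enum are hypothetical names, and legacy_symbol_present stands in for the conftest result NV_DRM_LEGACY_PCI_INIT_PRESENT.

```c
#include <assert.h>

/* Same packing as the kernel's traditional KERNEL_VERSION macro,
 * under a toy name so this compiles in userspace. */
#define TOY_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Sanity check at compile time: 5.6.0 sorts after any 5.5.x. */
static_assert(TOY_KERNEL_VERSION(5, 6, 0) > TOY_KERNEL_VERSION(5, 5, 255),
              "version codes must order by major, then minor, then sublevel");

enum toy_init_path {
    TOY_USE_LEGACY_SYMBOL, /* drm_legacy_pci_init is visible to modules */
    TOY_USE_LOCAL_COPY,    /* 5.6+ without the symbol: carry our own copy */
    TOY_USE_OLD_SYMBOL     /* older kernels: plain drm_pci_init */
};

/* Mirrors the #if / #elif / #else ladder as a runtime function. */
static enum toy_init_path toy_pick_init_path(int legacy_symbol_present,
                                             unsigned int version_code)
{
    if (legacy_symbol_present)
        return TOY_USE_LEGACY_SYMBOL;
    if (version_code >= TOY_KERNEL_VERSION(5, 6, 0))
        return TOY_USE_LOCAL_COPY;
    return TOY_USE_OLD_SYMBOL;
}
```

As far as I know, release candidates carry sublevel 0 (the -rc1 suffix lives in the extra version string), so a 5.6-rc1 kernel already satisfies the >= KERNEL_VERSION(5, 6, 0) test and would take the local-copy branch.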

Thanks, yes - I did try building with CONFIG_DRM_LEGACY[1] which then resulted in a kernel build failure[2], which may be temporary at this stage in the rc cycle.

Assuming it’s possible to enable CONFIG_DRM_LEGACY then it should be possible to patch in the references to drm_legacy.h in the 340.108 driver (conftest, nv-drm.h etc.).

However as a solution this is only going to work for those users that are self-building the kernel, unless most distributions build the kernel with CONFIG_DRM_LEGACY=y?

  1. http://ix.io/2bli
  2. http://ix.io/2blh

I noticed there’s an error in my build log, near the top, as it processes the firmware we build into the kernel:

Firmware blobs root directory (EXTRA_FIRMWARE_DIR) [/lib/firmware] (NEW) 
Error in reading or end of file.

  Enable the firmware sysfs fallback mechanism (FW_LOADER_USER_HELPER) [N/y/?] n
  Enable compressed firmware support (FW_LOADER_COMPRESS) [N/y/?] n
  Enable firmware caching during suspend (FW_CACHE) [Y/n/?] y
#

This could explain the inability to find the /lib/firmware/amdgpu/navi10_ta.bin firmware which causes the build to fail (this firmware is in /external-firmware/amdgpu/navi10_ta.bin, as per our config).

I suspect there are a few bugs in 5.6-rc1 (no surprise!), possibly related to the new CONFIG_BOOT_CONFIG[1] functionality (which is currently enabled by default - will try building with it disabled).

  1. https://cateee.net/lkddb/web-lkddb/BOOT_CONFIG.html

Yeah, it’s no wonder an -rc1 has got the odd bug in it. My local build went fine, although I’ve not actually booted with it yet.

As for configs, Arch, at least, doesn’t have CONFIG_DRM_LEGACY enabled, no; and at this point, most others probably don’t either, it being deprecated and marked as “DANGEROUS” and all – although, I’ve not exactly checked.

Also, sorry if you’ve tried it already, but I took a stab at compiling the driver with the following hack:

diff --git a/kernel/nv-drm.c b/kernel/nv-drm.c
index 0d1cdbf..2e4b867 100644
--- a/kernel/nv-drm.c
+++ b/kernel/nv-drm.c
@@ -50,6 +50,60 @@
 #if defined(NV_DRM_LEGACY_PCI_INIT_PRESENT)
 #define nv_drm_pci_init drm_legacy_pci_init
 #define nv_drm_pci_exit drm_legacy_pci_exit
+#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5, 6, 0)
+int nv_drm_pci_init(struct drm_driver *driver, struct pci_driver *pdriver)
+{
+	struct pci_dev *pdev = NULL;
+	const struct pci_device_id *pid;
+	int i;
+
+	DRM_DEBUG("\n");
+
+	if (WARN_ON(!(driver->driver_features & DRIVER_LEGACY)))
+		return -EINVAL;
+
+	/* If not using KMS, fall back to stealth mode manual scanning. */
+	INIT_LIST_HEAD(&driver->legacy_dev_list);
+	for (i = 0; pdriver->id_table[i].vendor != 0; i++) {
+		pid = &pdriver->id_table[i];
+
+		/* Loop around setting up a DRM device for each PCI device
+		 * matching our ID and device class.  If we had the internal
+		 * function that pci_get_subsys and pci_get_class used, we'd
+		 * be able to just pass pid in instead of doing a two-stage
+		 * thing.
+		 */
+		pdev = NULL;
+		while ((pdev =
+			pci_get_subsys(pid->vendor, pid->device, pid->subvendor,
+				       pid->subdevice, pdev)) != NULL) {
+			if ((pdev->class & pid->class_mask) != pid->class)
+				continue;
+
+			/* stealth mode requires a manual probe */
+			pci_dev_get(pdev);
+			drm_get_pci_dev(pdev, pid, driver);
+		}
+	}
+	return 0;
+}
+
+void nv_drm_pci_exit(struct drm_driver *driver, struct pci_driver *pdriver)
+{
+	struct drm_device *dev, *tmp;
+	DRM_DEBUG("\n");
+
+	if (!(driver->driver_features & DRIVER_LEGACY)) {
+		WARN_ON(1);
+	} else {
+		list_for_each_entry_safe(dev, tmp, &driver->legacy_dev_list,
+					 legacy_dev_list) {
+			list_del(&dev->legacy_dev_list);
+			drm_put_dev(dev);
+		}
+	}
+	DRM_INFO("Module unloaded\n");
+}
 #else
 #define nv_drm_pci_init drm_pci_init
 #define nv_drm_pci_exit drm_pci_exit

It’s code taken directly from the drm_legacy_pci_{init,exit} kernel functions. And it does compile, but I can’t really load it to test.
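
One detail of nv_drm_pci_exit worth spelling out is the list_for_each_entry_safe loop: each entry is unlinked (and handed to drm_put_dev) while the list is still being walked, so the iterator has to remember the next entry before the current one goes away. The sketch below shows that idea with a toy userspace list; toy_node and the toy_list_* helpers are hypothetical names loosely modelled on the kernel’s list_head, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy circular doubly linked list; the head is a sentinel node. */
struct toy_node {
    struct toy_node *prev, *next;
    int id;
};

static void toy_list_init(struct toy_node *head)
{
    head->prev = head->next = head;
}

/* Insert a new node right after the head; returns 0, or -1 if malloc fails. */
static int toy_list_add(struct toy_node *head, int id)
{
    struct toy_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->id = id;
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
    return 0;
}

/* Unlink and free every entry; returns how many were released. */
static size_t toy_list_drain(struct toy_node *head)
{
    size_t released = 0;
    struct toy_node *cur, *tmp;

    assert(head != NULL);
    /* tmp is loaded before cur is freed: reading cur->next after
     * free(cur) would be a use-after-free, which is what the "_safe"
     * iterator variant avoids. */
    for (cur = head->next, tmp = cur->next; cur != head;
         cur = tmp, tmp = cur->next) {
        cur->prev->next = cur->next;
        cur->next->prev = cur->prev;
        free(cur);
        released++;
    }
    return released;
}
```

Draining an empty list releases nothing, and after draining a populated list the head points back at itself, which is the same end state the exit function leaves driver->legacy_dev_list in.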

Thanks for your patch - I hadn’t got around to that. It is a bit ugly, but if it works I can live with that. :)

I’ll try your patch in the next hour or 2 and let you know how it goes.

@Isaak.Aleksandrov your 340.108 drm legacy patch is working perfectly! It means the driver is not dependent on the kernel config, which is great. I’ll add it to our PR, many thanks! Nice to keep the old hardware going a bit longer…

I’ll also pinch your 440.59 patch from gitlab if you don’t mind - it looks much more comprehensive than my attempt!

It’s good that it worked, even if it’s, as you said, a bit on the fugly side. There’s also the issue of it potentially going out of sync with the kernel code as that gets updated, but yeah… There could very well have been a better way of solving it, but at least it works for now.

And sure, go ahead and use the 440.59 bits. Glad it could be of use.

I’ve determined what the problem is with 5.6-rc1 and our config, and it’s not a 5.6-rc1 issue.

A colleague had added a new PR which introduced some, but not all, of the config options needed to support the change. Since we run “make oldconfig” during our build, the kernel began prompting for the missing options even though stdin is not available in a batch build, hence the message “Error in reading or end of file.”, which refers to stdin.

So, entirely our fault and nothing to do with 5.6-rc1! :)

It’s good that you were able to track the issue down.

Just wanted to add, about the hack I posted: I seem to have made a copy&paste mistake. There’s a DRM_INFO("Module unloaded\n"); line missing [1] in the nv_drm_pci_exit function. Not exactly something that’ll impact functionality, but I figured I’d point it out.

It should be fine otherwise, but do take a look, just in case. It bugged me a bit, so I went ahead and updated my previous comment.

  1. https://github.com/torvalds/linux/blob/v5.6-rc1/drivers/gpu/drm/drm_pci.c#L356

Thanks, I’ve updated the patch in my PR.

I’ve also had someone test your 440.59 patch with 5.6-rc1 and that appears to be working fine too.

Additional patches [1,2] for 340.108 and 5.7-rc1 (on top of existing 5.6.y patch, [3]).

  1. https://github.com/LibreELEC/LibreELEC.tv/blob/eea52f8af0e4f1dec6f53f8671a397d9a8ae29e6/packages/x11/driver/xf86-video-nvidia-legacy/patches/xf86-video-nvidia-legacy-0002-fix-5.7-rc1.patch
  2. https://github.com/LibreELEC/LibreELEC.tv/blob/eea52f8af0e4f1dec6f53f8671a397d9a8ae29e6/packages/x11/driver/xf86-video-nvidia-legacy/patches/xf86-video-nvidia-legacy-0003-fix-5.7-rc1-reinstate-legacy-support.patch
  3. https://github.com/LibreELEC/LibreELEC.tv/blob/master/packages/x11/driver/xf86-video-nvidia-legacy/patches/xf86-video-nvidia-legacy-0001-fix-5.6-rc1.patch

Thank you very much for these 3 patches, milhouse. They alone work perfectly for Linux kernel 5.7!