Nvidia driver installation

Hello,

We are currently working with an old Linux kernel on an embedded system. Specifically, it is version 3.9.0-xxx for 32-bit ARM.

I have tried installing graphics drivers for a Quadro FX 380 on this kernel, using archived versions from 319.x to 352.x. Some drivers build but cause problems such as hangs, and others do not build at all. The most promising lead I have at this point is version 331.113.

It builds fine, but when I run modprobe I get the unresolved symbol init_mm (err 0) error. However, the init_mm symbol is present according to /proc/kallsyms. Is there any way I can fix this?
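One possible explanation for a symbol that is listed in /proc/kallsyms yet unresolved at module load is that the symbol exists in the kernel but is not exported to modules. The sketch below is only a rough check under stated assumptions: it assumes the kernel was built with CONFIG_KALLSYMS_ALL, so that the __ksymtab_<name> entry created by an export also shows up in /proc/kallsyms. The function name symbol_status is hypothetical.

```python
def symbol_status(kallsyms_text, name):
    """Return (present, exported) for `name`, given the text of /proc/kallsyms.

    present  -> a line for the symbol itself exists
    exported -> a __ksymtab_<name> entry exists (assumes CONFIG_KALLSYMS_ALL)
    """
    present = False
    exported = False
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        symbol = fields[2]
        if symbol == name:
            present = True
        elif symbol == "__ksymtab_" + name:
            exported = True
    return present, exported
```

On the board this could be called as symbol_status(open("/proc/kallsyms").read(), "init_mm"). A result of (True, False) would suggest the symbol is defined but not exported.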

I am willing to try other things, including slightly newer kernel versions (only slightly, owing to our hardware dependencies). Please advise!

Thank you!

What CPU architecture are you working with? And if known, what is your X11 ABI (ABI should be in Xorg.0.log or similar even if actual graphics load failed).
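As a rough illustration of pulling the ABI out of the log, the sketch below scans the log text for "ABI class: ..., version N.M" lines. The regular expression reflects how such lines commonly appear in Xorg.0.log and is an assumption, as is the hypothetical function name xorg_abi_versions.

```python
import re

# Assumed line shape: "ABI class: X.Org Video Driver, version 15.0"
ABI_RE = re.compile(r"ABI class:\s*(.+?),\s*version\s+(\d+)\.(\d+)")


def xorg_abi_versions(log_text):
    """Map each ABI class named in an Xorg log to its (major, minor) version."""
    versions = {}
    for match in ABI_RE.finditer(log_text):
        abi_class, major, minor = match.groups()
        versions[abi_class.strip()] = (int(major), int(minor))
    return versions
```

The log path varies by system; /var/log/Xorg.0.log is a common location, but that is also an assumption here.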

Hello,

Thanks for the reply! I was able to get past that point by exporting the symbol with the EXPORT_SYMBOL macro when building the kernel. The issue now is that modprobe causes the device to hang.

We have a 32-bit dual-core ARM Cortex-A9 [Zynq Z7045 FPGA] on a custom board with a PCIe x4 slot and 1 GB of RAM. We access it through a terminal, and there is no monitor attached to the GPU itself. Ideally, we would like to use it with CUDA.

The X11 log says the ABI class for X.Org video drivers is version 15.0, and it also reports that nvidia_drv.so and the Nvidia GLX module load successfully. But when I run lspci after the kernel takes over, the PCIe GPU’s BARs are disabled. I am not sure why that happens.
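To make the BAR observation easier to track across reboots or kernel changes, a small sketch like the following could list which devices have regions that lspci marks as [disabled]. It assumes the usual lspci -v layout (an unindented device header followed by indented "Memory at" or "I/O ports at" lines); the function name disabled_bars is hypothetical.

```python
def disabled_bars(lspci_text):
    """Return {device_header: [bar_line, ...]} for BARs marked [disabled]."""
    result = {}
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            device = line.strip()  # unindented line starts a new device
            continue
        stripped = line.strip()
        is_bar = "Memory at" in stripped or "I/O ports at" in stripped
        if device and is_bar and "[disabled]" in stripped:
            result.setdefault(device, []).append(stripped)
    return result
```

Expansion ROM lines are deliberately ignored here, since those are often shown as disabled even on a working device.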

Any advice on how to debug a hanging kernel would be greatly appreciated!

Thank you!

I’m at a disadvantage right now: my desktop machine is dead and waiting on parts (which drastically limits my ability to view my older A9 hardware). What I’m thinking, though, is that ABI versions are closely tied to any given nVidia driver. Somewhere in the documentation for each driver there should be a listing of supported ABIs. The drivers you are using would require compatibility with both the CPU architecture and the X11 ABI…is there any cross reference in the docs for the driver version you are using as to which ABI it works with? Second, just to verify, are you cross-compiling, and if so, which cross-compiler are you using (e.g., linaro EABIHF, and so on)? This should validate whether ABI, compiler, and source are all in sync.
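One way to sanity-check the compiler side of this for user-space pieces is to compare the float ABI recorded in ARM binaries. The sketch below classifies the output of readelf -A, assuming that hard-float objects carry a "Tag_ABI_VFP_args: VFP registers" attribute and that soft or softfp objects omit it. That reading of the attribute, and the function name arm_float_abi, are assumptions rather than anything confirmed in this thread.

```python
def arm_float_abi(readelf_attrs_text):
    """Classify the float ABI from the text printed by `readelf -A <file>`.

    Returns "hard" when Tag_ABI_VFP_args reports VFP registers,
    otherwise "soft-or-softfp" (the attribute is assumed absent in that case).
    """
    for line in readelf_attrs_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Tag_ABI_VFP_args":
            return "hard" if "VFP registers" in value else "soft-or-softfp"
    return "soft-or-softfp"
```

Running this on a binary built by the toolchain and on a user-space library shipped with the driver, then comparing the two results, would show whether they agree on the float ABI.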

Thank you for the quick reply!

I checked the driver’s documentation, and everything we use, including the X.Org version, is listed as supported.

We use Xilinx’s own arm-xilinx-linux-gnueabi- compiler to build the kernel, while the Nvidia driver is compiled on the board itself. Everything looks to be in sync.
Could it be that 1 GB of DDR3 memory is too little for the kernel and GPU?

Running out of memory could always lead to strange catastrophic behavior, but an undefined init_mm symbol has to be something very different. If there were a failure of init_mm, then you might be able to look at a return code or how it failed and figure it out…but outright undefined makes me wonder if something in the kernel config is missing, or if the precompiled code is doing something such as using different arguments to init_mm (the symbol would be present but would not have the correct signature).

Or perhaps the result would change depending on whether your config provides init_mm as a module or integrated in the kernel (it seems like something which probably requires direct compile into the kernel).

This link gives a long list of places where init_mm appears in a 3.9 kernel:
http://lxr.free-electrons.com/ident?v=3.9;i=init_mm

Basically I’d consider attaching kgdboc or a JTAG debugger and finding the exact definition of init_mm the driver expects, then comparing it to the definitions found at the above URL, paying close attention to whether a kernel configuration option can change the existence or layout of init_mm. Despite getting past the symbol issue with EXPORT_SYMBOL, I’d hesitate to call this a fix unless I had traced the code on both sides of that symbol and confirmed they match in every detail. If they do, then kgdboc or a JTAG debugger should get you closer to finding where it hangs. As a poor man’s solution, you could add printk calls around the code that uses init_mm and see whether anything else gets through it successfully prior to the hang, or else at least infer where the hang occurs.

Hello,

Thanks very much for the detailed answer! I did see that “init_mm” is used in very many places in the kernel source and, being new to kernel compilation and the like, I assumed all that was needed was “EXPORT_SYMBOL” to make the symbol available to inserted modules. Let me try what you suggested.

But it would be very helpful to understand the Nvidia and CUDA driver architecture in general. We have a hardware platform similar to the CARMA kit with no display attached, so how do these modules and drivers (nvidia_drv.so, nvidia.ko) interact? Will Linux 32-bit ARM drivers compiled for integrated graphics work for PCIe-based GPUs? How was the CARMA kit set up?
Also, what settings would be necessary to use a GPU for CUDA only and not for video? Is there a way I can reduce the memory requested by the driver for the Nvidia card during the boot process [u-boot]? I hope you can shed some light on these.

Thank you!

I’m not familiar with CARMA.

Normally you would not attempt to install CUDA until the nVidia video driver is installed. CUDA cannot work without it.

The trouble here is that I’m not sure how the video driver install would work if there is no monitor attached. In theory, you don’t need a monitor in order to do CUDA…but the install of the video driver might get confused without one. Is there any way to attach a monitor, at least temporarily?

The drivers won’t care if the video is integrated or PCI or PCIe. However, this will change configuration, so changing the layout could invalidate config. If the 32-bit ARM drivers work with that video card, it shouldn’t care if the card is on one 32-bit ARM system or another…but again, configuration might change. Also, the ABI of the X server must match…the nVidia drivers are tied to this, and this is a moving target over time.

I’m not sure about memory use. I’d suggest first getting this working with video connected, and then removing the video or configuring it as disabled after CUDA is installed. There are probably some kernel arguments which can be used related to this, but those arguments might change depending on the video driver…thus installing the video driver first would be very helpful.

So once again, do you have the possibility of adding a monitor via something other than 15-pin VGA? (EDID over VGA can be unreliable, since some cables and adapters do not carry the DDC lines.)