[SOLVED] Randomly: VM: nv_alloc_contig_pages: DMA address not in addressable range...

I recently updated the nvidia driver (x86_64) from an old 331.89 to the new 343.36 / 346.35 on Linux 3.14.30.

The problem is that when I start X, it randomly fails because the nvidia driver fails to initialize. This does not happen often, only a few times. There is no need to reboot the machine; simply starting X again works. I can log out of X and log in again, and it may or may not fail; there is no pattern.

Note that with the old driver (331.89) I did not experience any issue when starting X; it always worked, without any error at runtime.

NVRM: VM: nv_alloc_contig_pages: DMA address not in addressable range of device 0000:01:00 (0x12190f000, 0x0-0xffffffff)
NVRM: RmInitAdapter failed! (0x24:0x1e:1230)
NVRM: rm_init_adapter failed for device bearing minor number 0
NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
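For anyone reading the first NVRM line: the first number in parentheses is the DMA address that was allocated, and the range after it is what the device can address. Here the address needs more than 32 bits, while the reported range ends at 0xffffffff. Below is a minimal Python sketch that pulls those numbers out of the message and checks them. The regular expression and the helper name `parse_dma_error` are my own assumptions, derived only from the single log line quoted above; this is not an official log format.

```python
import re

# Pattern assumed from the one NVRM line quoted in this thread;
# other driver versions may format the message differently.
PATTERN = re.compile(
    r"DMA address not in addressable range of device (\S+) "
    r"\((0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\)"
)

def parse_dma_error(line):
    """Hypothetical helper: return address/range details, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    addr, low, high = (int(m.group(i), 16) for i in (2, 3, 4))
    return {
        "device": m.group(1),
        "address": addr,
        "low": low,
        "high": high,
        "in_range": low <= addr <= high,   # False means the allocation is unusable
        "bits_needed": addr.bit_length(),  # bits required to represent the address
    }

sample = (
    "NVRM: VM: nv_alloc_contig_pages: DMA address not in addressable range "
    "of device 0000:01:00 (0x12190f000, 0x0-0xffffffff)"
)
info = parse_dma_error(sample)
print(info)
```

On the line from this report, `in_range` comes out False and `bits_needed` is 33, i.e. the page was allocated above the 4 GiB boundary while the driver reported a 32-bit addressable range.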

nvidia-bug-report.log.gz (78.9 KB)

This has also been happening here regularly for months, on two systems with different driver versions (340.x, 343.x and now 346.35). I don’t know about older versions.
Both systems are running the 3.14.x kernel series from kernel.org, currently 3.14.28.
nvidia-bug-report.log.gz (86.1 KB)

This issue is still present with the current driver 346.59, xorg-server 1.17.1 and Linux 4.0.0.

nvidia-bug-report.log.gz (72.3 KB)

Just to say that this still happens regularly here on the same systems.
xorg-server-1.16.4, nvidia-drivers-349.16, vanilla kernel 3.14.43

Sometimes it does not happen for weeks, and then suddenly it happens ten times in two days.
It is frustrating because it can happen at any session logout or system start, and when you manage systems with real end users, you’re not always there to restart Xorg when it crashes.

The problem is still present in 352.21, running Linux 4.0.0 and Xorg-server 1.17.2

There is one change related to DMA in this release, but it has nothing to do with this issue.

Indeed, I have enabled this CONFIG_DEBUG_DMA_API option in my config.

GOOD! The problem is fixed in 352.30 (it is not mentioned in the “release highlights”). Looking at the kernel module source, you can see that the code related to this message has been removed, and some other DMA-related changes have been made. Nice.

I ran many Xorg start/stop cycles without any issue.

Thanks for the info, will try it ASAP!

Thanks for your feedback.