Problem with resume from suspend (Ubuntu 16.04, GT 940MX)

Sure, how does one do this?

Run the preinstalled /usr/bin/nvidia-bug-report.sh script. It will collect information about your system and create the file ‘nvidia-bug-report.log.gz’ in the current directory.

Running on nouveau, do a suspend/resume cycle

sh -c "./NVIDIA-Linux-x86_64-396.45.run -x"

in the created directory NVIDIA-Linux-x86_64-396.45 you will find the nvidia-bug-report.sh script.

  • Post the output of
xrandr --listproviders
  • Enable PRIME offload:
xrandr --setprovideroffloadsink nouveau Intel
  • Post the output of
DRI_PRIME=1 glxinfo | grep "OpenGL vendor string"
  • in the previously extracted directory, run
sudo ./nvidia-bug-report.sh
  • Attach the created nvidia-bug-report.log.gz to your post
    Hovering the mouse over an existing post will reveal a paperclip icon.

Running on nvidia, install and start an ssh server (Ubuntu: sudo apt install openssh-server)

  • suspend/resume
  • ssh into the now frozen box from another system
  • remove the previously created nvidia-bug-report.log.gz
sudo nvidia-bug-report.sh
  • Attach the newly created nvidia-bug-report.log.gz to your post.
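
The ssh steps above can also be driven from the second machine. This is only a sketch under assumptions: the function name `collect_bug_report` and the target `user@laptop` are hypothetical, and it assumes the old report sits in the remote user's home directory.

```shell
# Sketch: run from a second machine while the laptop is frozen after resume.
# "$1" is a hypothetical ssh target such as user@laptop.
collect_bug_report() {
    target="$1"
    # -t allocates a terminal so sudo can ask for a password.
    # Remove the stale report first, then generate a fresh one.
    ssh -t "$target" 'rm -f nvidia-bug-report.log.gz && sudo nvidia-bug-report.sh' || return 1
    # Copy the new report back so it can be attached to a post.
    scp "$target:nvidia-bug-report.log.gz" .
}
```

Usage would be `collect_bug_report user@laptop`, with the target replaced by whatever login works on your box.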

Only got one computer so difficult to get any info.

Dear Sandip,
With all due respect, I agree with tjknigge that this really has gone on too long (over a year just on this forum). Nvidia is a major player in the market, and so is Asus. I would have thought by now that you would have used some of your support budget to get one of the systems listed above (a few hundred dollars max) and do your own tests to discover the problem. We don’t mind helping out, but really, how long is this going to go on? It is a major waste of power to leave our laptops on 24/7, and it can also be a security issue when systems are left running because they cannot be put to sleep (having to close down and save all windows at the end of each day is not really practical for someone like me). We are really counting on you to go to bat for us with those people higher up and get the resources from your testing teams to work on this properly internally. This is really making me wonder whether I want to buy nvidia again when I purchase my next laptop from asus.

Thanks for any help you can give us.
Sincerely,
Jeff
IT Tech and Web developer
acpidump.txt (545 KB)

jeff4ze58, can you please run
sudo acpidump >acpidump.txt
and attach that?

No, but given the fact that this issue is manifesting itself in the same way in many different HW/SW configurations as reported by dozens of users I find it rather likely that it does.

I have read carefully through all 6 pages again and it seems to me that, despite the concise description, TS is facing exactly the same issue as I am. So I beg to differ, there is absolutely a connection. It is imho plain silly to focus on a specific kind of (ASUS laptop) hardware just because this is what TS and some of the other reporters have. For what it’s worth, I do not believe at all that this issue is HW or BIOS-related. I think it is a software issue and I put my money on a race condition in the driver, which would mean that nothing useful is going to be found in the error logs that are being requested here over and over again. Unfortunately these kinds of errors are very hard to find and correct and require the involvement of core driver developers. User dsd_endless already nudged you in the right direction by posting a stack trace of the X server. Please just focus on reproducing this thing internally on whatever hardware and hit the debugging tools. It is my firm belief that you can make many people happy at once with one bug fix and help the environment while doing so.
Best regards,
Tim

Hi all, we reproduced this issue internally with ASUS X555UQ + Fedora 27 + nvidia prime: after resuming the system, a blank screen is observed. We are still in sync with ASUS to take the issue to the respective SBIOS folks. I think you should also report this issue to ASUS to get a solution.

Performed the following experiments for more clarity.
A.

  1. Boot to console mode.
  2. Uninstall nvidia driver using “nvidia-uninstall”.
  3. Check ‘lsmod’ to confirm that nvidia modules are not present.
  4. Reboot the system to console mode again.
  5. Then perform pm-suspend and resume
  6. Now, install Nvidia driver.
  7. Run nvidia-smi; it fails with the message below:
    “No devices were found”.

B.
Start experiment A and, after step #4, run the following to rescan the port to which the GPU is connected:
echo 1 > /sys/bus/pci/devices/0000:00:1c.0/rescan

Now perform steps 5 and 6.
This time ‘nvidia-smi’ runs successfully.
So this shows that there is something wrong with the PCH root port.
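
The rescan step in experiment B can be wrapped in a small helper. This is a minimal sketch, not a fix: the function name `rescan_pci_port` and the `SYSFS_ROOT` override are hypothetical, added only so the helper can be dry-run against a scratch directory. The root port address 0000:00:1c.0 is the one from this test system and may differ on other machines. Writing to the real sysfs node requires root.

```shell
# Sketch: ask the kernel to rescan one PCI root port.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
rescan_pci_port() {
    port="$1"    # e.g. 0000:00:1c.0
    node="${SYSFS_ROOT:-/sys}/bus/pci/devices/$port/rescan"
    if [ ! -e "$node" ]; then
        echo "no rescan node for $port" >&2
        return 1
    fi
    # Same write as the manual "echo 1 > .../rescan" above.
    echo 1 > "$node"
}
```

As root, `rescan_pci_port 0000:00:1c.0` would then stand in for the manual echo before repeating steps 5 and 6.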

Just to eliminate Nvidia from the picture, we performed some experiments with nouveau as well.

C.
We blacklisted NVIDIA driver and the GPU was driven by the nouveau driver.
References : Optimus and PRIME - ArchWiki
Steps followed :-

  1. Boot to console mode after uninstalling the nvidia driver.
  2. Configure the xorg.conf file for nouveau.
  3. Start X
  4. Configure glxgears to run using nouveau driver. ( ‘lsmod’ will show the usage count of nouveau >1, above listed reference links will help)
  5. Perform suspend/resume
  6. Again run glxgears, it fails.

These experiments support the conclusion that the PCH is not configured properly and causes problems even for glxgears running on nouveau.

Generix, acpidump duly attached in my post above. Hope that helps.
All the best,
Jeff

jeff4ze58, thank you.
Playing odd one out here, I don’t think you’re hit by the same issue being examined in this thread. All other systems share the same acpi method HGON to turn on the dGPU on resume, your acpi tables are completely different and relying on Windows 10 infrastructure. Please check if a 4.17 kernel fixes your resume issue.

Hi Generix,
Thanks for the info, but Windows 10 has never, ever run on this machine. Can you explain what you mean by a “windows 10 infrastructure”, as this is really surprising and confusing to me? I could try to upgrade my kernel, but I am not experienced at upgrading kernels, so I will consider doing this once I can be sure it’s not going to brick my machine.

Thanks for info nonetheless,
Jeff

jeff4ze58,
Simply put, acpi is an interface that comes with the system bios offering functions which let the OS control the hardware like turning off/on the gpu. This is mostly designed as Windows (10, 8, 7, Vista) wants it. acpidump extracts those functions.

https://devtalk.nvidia.com/default/topic/1036501/linux/why-nvidia-driver-messes-up-linux-distro-boot-process-/post/5266035/#5266035

Hello,

I have the same problem here: after a resume the graphics card doesn’t work, although the laptop can still be accessed by ssh. Before installing the nvidia drivers it worked fine.

Also, after installing the nvidia drivers the screen brightness buttons (fn-F6, fn-F7) don’t work.

Bumblebee seems to work fine.

The laptop is a toshiba satellite pro with a GM108M [GeForce 930M] (rev a2).

The OS is openSUSE Leap 15 with the default kernel 4.12.14; it has also been tested with 4.17.12.
The nvidia drivers tested were 390.77 and 396.51, the latter currently installed.

The following message appears in dmesg:

[ 467.190349] ACPI Warning: _SB.PCI0.RP09.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20180313/nsarguments-66)
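
To pull lines like this out of a long kernel log, a simple filter is enough. This is a sketch only: the function name `acpi_messages` is hypothetical, and the pattern covers just the common "ACPI Warning", "ACPI Error" and "ACPI BIOS Error" prefixes.

```shell
# Sketch: keep only ACPI warnings and errors from a kernel log read on stdin.
acpi_messages() {
    grep -E 'ACPI (Warning|Error|BIOS Error)'
}
```

It would typically be used as `dmesg | acpi_messages`, once before suspend and once after resume, to compare the two.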

I have attached two nvidia-bug-report.log.gz files: one obtained after a resume, without a screen, and the other (nvidia-bug-report_before_resume.log.gz) obtained after a reboot and before any resume, while I was writing this message and everything was still fine.

The most relevant info, IMHO:


xset -q:

xset: unable to open display “”


nvidia-settings -q all:

Unable to init server: No se pudo conectar: Conexión rehusada
No protocol specified
No protocol specified

ERROR: Unable to find display on any available system

No protocol specified
No protocol specified

ERROR: Unable to find display on any available system


xrandr --verbose:

No protocol specified
No protocol specified
Can’t open display :0


Running window manager properties:

Unable to detect window manager properties


So it seems the display has disappeared.

This seems a quite common problem in hybrid systems, obscured because almost everyone uses only the integrated graphics in linux.

Any ideas?

Thanks,

Inteltank

nvidia-bug-report.log.gz (64.5 KB)
nvidia-bug-report_before_resume.log.gz (86.3 KB)

rlarrosa, your notebook is incorrectly set up. You’re not even using the nvidia driver, you’re running on fbdev. Please remove the ‘nomodeset’ kernel parameter and follow these hints to properly configure PRIME:
https://devtalk.nvidia.com/default/topic/1022670/linux/official-driver-384-59-with-geforce-1050m-doesn-t-work-on-opensuse-tumbleweed-kde/post/5203910/#5203910
If you need further help, please open a new thread.

A correction: I actually found out that the system did initially come with Windows 8 (as you, Generix, presumed from the acpidump), but Ubuntu 14.04 was installed immediately instead and has run on it from the beginning until now. My apologies.
Jeff

Thanks, I have followed those instructions, but the screen is off, with no graphic output.

It seems to me the problem is that I only have one VGA controller, so I cannot use that configuration, as seen with:

lspci | egrep -i "3d|vga"

00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
02:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 930M] (rev a2)

I have set the busid and everything else as stated in your link. I have attached the nvidia bug report taken while the screen was black, although I have tested more xorg.conf configurations to try to make it work.
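
As a side note on the busid: lspci prints the address in hexadecimal as bus:device.function, while the xorg.conf BusID option is generally written as PCI:bus:device:function with decimal numbers. A minimal sketch of the conversion follows; the function name `lspci_to_busid` is hypothetical and it expects the short address form without a PCI domain prefix.

```shell
# Sketch: convert an lspci address (hex, e.g. 02:00.0) to an xorg.conf BusID.
lspci_to_busid() {
    addr="$1"
    bus="${addr%%:*}"     # part before the colon
    rest="${addr#*:}"     # device.function
    dev="${rest%%.*}"     # part before the dot
    fn="${rest#*.}"       # part after the dot
    # Arithmetic expansion turns the hex fields into decimal.
    echo "PCI:$((0x$bus)):$((0x$dev)):$((0x$fn))"
}
```

For the 930M listed above, `lspci_to_busid 02:00.0` gives `PCI:2:0:0`. The difference only becomes visible for addresses above 09, where hex and decimal diverge.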

That is why I installed bumblebee, and later the only problem was with the acpi interface, and that is why I thought the problem was the same reported in this thread.

Thanks
nvidia-bug-report_blank_screen.log.gz (93.4 KB)

The xorg.conf you used is fine and everything is coming up all right; you’re just missing the last step of adding the two xrandr commands which enable output to your login manager.
Please open a new thread if you need further help, don’t hijack this one.
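
For readers landing here: the two xrandr commands in question are usually the pair shown below, as described in the PRIME chapter of the NVIDIA driver README. This is a sketch under assumptions. The wrapper name `enable_prime_output` is hypothetical, and the provider names `modesetting` and `NVIDIA-0` should be checked against the output of `xrandr --listproviders` on the machine in question. Where to hook the commands in depends on the login manager.

```shell
# Sketch: make the NVIDIA GPU the output source for the Intel-driven panel.
# Provider names are assumptions; verify them with: xrandr --listproviders
enable_prime_output() {
    xrandr --setprovideroutputsource modesetting NVIDIA-0 || return 1
    xrandr --auto
}
```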

We have investigated this issue and found that it is a system BIOS bug.

Can you inform the vendor, please?

Here is a suggestion: how about you inform the vendor (Asus) about a flaw in your graphics card? Asus have a notorious disregard for their linux customers and will obviously ignore any customer request.
I am happy with the way my laptop performs using the intel driver and will probably never buy an nvidia product again.