Developing with CUDA on a Lenovo P50 with RHEL

Hello, I have a Lenovo P50 laptop with a Quadro M1000M and onboard Intel HD 530 graphics (it isn’t labelled as integrated, but I’m certain it is, since the BIOS lets me pick between hybrid and discrete mode).

I want to continue developing CUDA applications on this newfound machine that I plan to use as my desktop machine as well (so I’ll want my lovely GNOME desktop to work still).

Can I develop with CUDA while maintaining my UI with this device? I’ve tried the following and I can never satisfy these three points at once.

  1. CUDA toolkit: 7 or better
  2. Nvidia driver: trying with 352.xx currently
  3. My desktop must still be available for presentations, Eclipse, mail, browsing, etc.

I’ve read around plenty and seen Bumblebee mentioned. I’ve also had fun blacklisting nouveau, modifying the GRUB configuration, using dracut, and playing with my x11.conf. No matter what I’ve tried so far, the best I can do is one of the following.

  1. Desktop environment but deviceQuery returns no devices
  2. Desktop environment and nvidia-smi shows devices a few times then never again if I reboot (always unavailable)
  3. nvcc working great, nvidia-smi showing my GPU fine, and deviceQuery working too, but no desktop environment (what looks to be a kernel panic when I try to get to my desktop, halting completely after reading a slice)

But never a combination of all three…
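Since so much of the above hinges on whether nouveau or the nvidia module actually ends up loaded after a reboot, here is a minimal sketch of a check for that. The function name `gpu_module_status` is made up for illustration, and it assumes bash and the usual `lsmod` layout (module name in the first column).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA or RHEL tooling).
# Reads `lsmod`-style text on stdin and reports whether each of the two
# competing GPU kernel modules appears in it.
gpu_module_status() {
    local modules name
    # Keep only the first column (the module name) of every input line.
    modules=$(awk '{ print $1 }')
    for name in nouveau nvidia; do
        # -x matches whole lines only, so "nvidia_uvm" does not count as "nvidia".
        if grep -qx "$name" <<<"$modules"; then
            echo "$name: loaded"
        else
            echo "$name: not loaded"
        fi
    done
}
```

Running `lsmod | gpu_module_status` after each reboot should show whether the nouveau blacklist took effect and whether the nvidia module came up.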

I’m convinced I don’t need my x11.conf, but having looked through my /var/log/X… files, I do notice a complaint about no screen being available. Any guidance would be much appreciated, to save me and anybody else time, whether that means heading down the CUDA-on-its-own path or the way of Bumblebee.

I’m currently “desktopless” with the CUDA toolkit and driver available, and I’ve changed the BIOS setting to “discrete” GPU mode, which gives me a crash at the login prompt when I try to access my desktop. I’m using the CUDA 7.5 .run file, which includes the toolkit, driver, and samples, and that all installs fine. I’ve seen my fair share of the white “Oh dear” screen too…

Cheers in advance. I have root, time and patience, so I can provide plenty of files upon request. Surely this isn’t a rare use case.

nvidia-smi output shows I’m using the 352.39 driver.

uname -a tells me I’m using the kernel 3.10.0-514.6.1.el7.x86_64

rpm -i cuda-repo-rhel7-7-5-local-7.5-10.x86_64.rpm shows it’s already installed

The .run file installer works fine with no errors.

Here’s the last thing I ever see. I can’t switch to other X sessions and must reboot manually, which suggests a crash behind the scenes.

https://www.dropbox.com/s/7e7vfab86i3usv0/IMG_20170227_134820373.jpg?dl=0

No errors reported in Xorg.0.log either (and it is the latest file in the log directory). I see no issues in my /var/log/boot.log, or with dmesg.
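For anyone wanting to repeat that log check, this is roughly how I’d pull the error and warning lines out of an Xorg log. It is only a sketch: `xorg_log_problems` is a made-up name, and it assumes the usual Xorg layout where errors are tagged `(EE)` and warnings `(WW)` at the start of a line, optionally after a bracketed timestamp.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption, not an Xorg or NVIDIA tool).
# Prints the error "(EE)" and warning "(WW)" lines of an Xorg log.
# The tag must open the line, optionally after a "[   12.345] " timestamp,
# so the marker legend in the log header is not reported as a problem.
xorg_log_problems() {
    local log_file="${1:-/var/log/Xorg.0.log}"
    # grep exits non-zero when nothing matches, i.e. when the log is clean.
    grep -E '^(\[ *[0-9]+\.[0-9]+\] )?\((EE|WW)\)' "$log_file"
}
```

Calling `xorg_log_problems` with no argument reads /var/log/Xorg.0.log; pass another path to check an older log such as Xorg.0.log.old.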

I can make and run the samples fine. deviceQuery reports that the driver and runtime versions are both 7.5 and picks up my Quadro M1000M great, with 512 CUDA cores available.

If I then remove my x11 conf file (apparently not needed), I get the lovely “Oh no! Something has gone wrong” message and I’m forced to reboot. Then if I look in Xorg.0.log, I see that the nvidia driver loads fine and that glx is about to be loaded, but then I see this, which is probably the problem:

Failed to initialise GLX extension (Compatible NVIDIA X driver not found).

An x11 conf generated by nvidia-xconfig doesn’t lead to any progress, nor does adding various -- options.

I don’t think this is an Optimus laptop, as in the BIOS I can only switch between hybrid and discrete. I can’t select Optimus (it’s not there).

When I switch to discrete graphics, I get to the login screen but no input is recognised. I know the system hasn’t massively crashed because the clock at the top still works and updates.

If I enter recovery mode and run startx, I see the NVIDIA splash screen very briefly. When I check my Xorg.0.log, I see that ACPI failed to connect to the ACPI event daemon (which looks to be nothing to worry about, based on searching around). Finally, in the same log I see:

NVIDIA: Freed GPU:0, then the PCI bus D, then deleting GPU-0.

If I run nvidia-bug-report.sh and look in the log, I do see “Screen(s) found but none have a usable configuration”.

I fixed the keyboard login problem as well with a yum install xorg-*, accepting and downloading all packages. Now I’m back to my desktop, and both deviceQuery and nvidia-smi work!