GeForce GT330M + nvidia 331.38 kernel oops and black screen (linux x86_64)

Hi,

I have been struggling with this issue for the last two nights:

I own a MacBook Pro with a GT330M NVIDIA graphics card, on which I set up a dual boot of Mac OS X and Debian Linux. Everything worked flawlessly until I had to upgrade Mac OS X to Mavericks …

Upgrading Mac OS X forced me to upgrade my kernel to a more recent version (newer than 3.3), or else the new EFI firmware would cause the kernel to stall at boot. I did that, and then, obviously, I had to reinstall the nvidia driver … which I thought would be as straightforward as it has been for me over the last 8 years (on other systems as well).

I downloaded and installed the 331.38 x86_64 version, and I ran into a few issues :

  • First, I was unable to build with CUDA support (not a big deal for me, so I simply deactivated it).
  • Second, after installation and an Xorg restart, I get a kernel oops and a black screen (my machine still responds on the network, so luckily I could debug it from another machine via ssh).

I tried several kernel versions :

  • 3.4.77 : after installation, the graphics card is not detected by the nvidia driver => no luck
  • 3.12.1 : kernel oops, black screen, can’t make it work
  • 3.12.8 : kernel oops, black screen, can’t make it work either

nouveau is blacklisted, of course (that was the first thing I checked).
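For anyone who wants to double-check the same thing, here is a minimal Python sketch of that verification. It assumes the usual Linux locations (/proc/modules for loaded modules, /etc/modprobe.d/*.conf for blacklist entries); the helper names are made up for illustration and are not part of any nvidia tool.

```python
import re
from pathlib import Path

def loaded_modules(proc_modules_text):
    """Return the module names found in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}

def is_blacklisted(conf_text, module):
    """True if the text has an uncommented 'blacklist <module>' line."""
    pattern = re.compile(r"^\s*blacklist\s+" + re.escape(module) + r"\s*$")
    return any(pattern.match(line) for line in conf_text.splitlines())

def check_system(module="nouveau"):
    """Print whether the module is loaded and where it is blacklisted."""
    proc = Path("/proc/modules")
    if proc.exists():
        print(module, "loaded:", module in loaded_modules(proc.read_text()))
    hits = []
    for conf in sorted(Path("/etc/modprobe.d").glob("*.conf")):
        try:
            if is_blacklisted(conf.read_text(errors="replace"), module):
                hits.append(str(conf))
        except OSError:
            continue  # unreadable file, skip it
    print("blacklist entries found in:", hits or "none")

if __name__ == "__main__":
    check_system()
```

A blacklist entry only stops automatic loading; if the module is also packed into the initramfs it can still come up early, so checking the loaded list is the more direct test.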

Here is the stack trace printed in my message log:

[  626.564388] Oops: 0000 [#1] SMP
[  626.564391] Modules linked in: nvidia(PO) pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) xt_multiport iptable_filter ip_tables x_tables nfnetlink_log nfnetlink parport_pc ppdev lp parport xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 binfmt_misc deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 serpent_generic xts blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common des_generic cbc cmac xcbc rmd160 sha512_generic sha256_generic hmac crypto_null af_key xfrm_algo fuse nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc clip atm nls_utf8 nls_cp437 vfat fat dm_crypt loop firewire_sbp2 snd_hda_codec_hdmi arc4 joydev brcmsmac uvcvideo cordic videobuf2_vmalloc efi_pstore brcmutil videobuf2_memops videobuf2_core b43 btusb bcm5974 videodev bluetooth media mac80211 apple_gmux cfg80211 ssb mmc_core snd_hda_codec_cirrus rfkill rng_core pcmcia pcmcia_core
kernel: iTCO_wdt intel_powerclamp coretemp iTCO_vendor_support evdev applesmc input_polldev kvm_intel kvm pcspkr i915 snd_hda_intel efivars snd_hda_codec snd_hwdep i2c_i801 snd_pcm snd_page_alloc drm_kms_helper snd_seq snd_seq_device snd_timer intel_ips drm snd lpc_ich mfd_core video i2c_algo_bit i2c_core bcma soundcore apple_bl battery acpi_cpufreq button ac processor thermal_sys ext4 crc16 mbcache jbd2 hid_generic hid_appleir md_mod dm_mirror dm_region_hash dm_log dm_mod usb_storage hid_apple usbhid hid sg sd_mod crct10dif_generic sr_mod crc_t10dif cdrom crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ahci libahci libata scsi_mod aesni_intel ehci_pci firewire_ohci aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper uhci_hcd ehci_hcd tg3 ptp pps_core libphy firewire_core crc_itu_t usbcore usb_common [last unloaded: nvidia]
[  626.564502] CPU: 1 PID: 19624 Comm: Xorg Tainted: P           O 3.12-1-amd64 #1 Debian 3.12.6-2
[  626.564505] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS    MBP61.88Z.0057.B0C.1007261552 07/26/10
[  626.564507] task: ffff880262f66840 ti: ffff880263754000 task.ti: ffff880263754000
[  626.564509] RIP: 0010:[<ffffffffa177139d>]  [<ffffffffa177139d>] _nv006366rm+0x54/0xd3 [nvidia]
[  626.564648] RSP: 0018:ffff880263755a38  EFLAGS: 00010282
[  626.564650] RAX: 0000000000000000 RBX: ffff8801f14d4008 RCX: 0000000000000000
[  626.564651] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
[  626.564653] RBP: ffff880263777098 R08: 0000000000000002 R09: ffff88026510a5c8
[  626.564654] R10: ffff880036138030 R11: ffffffffffffffdc R12: 0000000000000000
[  626.564655] R13: 0000000000000000 R14: ffff8801f14d8008 R15: ffff880265cda008
[  626.564657] FS:  00007fe346b35880(0000) GS:ffff88026fc40000(0000) knlGS:0000000000000000
[  626.564659] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  626.564660] CR2: 0000000000000280 CR3: 0000000036832000 CR4: 00000000000007e0
[  626.564662] Stack:
[  626.564664]  ffff8801f0f16008 ffff8801f0f16008 0000000000000002 ffff8801f14d4008
[  626.564667]  ffff8801f14d8008 ffffffffa1756c85 ffff8801f14d4080 ffff8801f14d8008
[  626.564669]  ffff8801f14d4008 ffff880088abe008 ffff880263784008 ffffffffa17bb48e
[  626.564672] Call Trace:
[  626.564811]  [<ffffffffa1756c85>] ? _nv006908rm+0x7ec/0x903 [nvidia]
[  626.564955]  [<ffffffffa17bb48e>] ? _nv006904rm+0x15a/0x316 [nvidia]
[  626.565099]  [<ffffffffa17be142>] ? _nv006118rm+0x461/0x5dd [nvidia]
[  626.565246]  [<ffffffffa182b753>] ? _nv008671rm+0x4340/0x64a7 [nvidia]
[  626.565299]  [<ffffffffa14fa074>] ? _nv009042rm+0x25/0x4f [nvidia]
[  626.565351]  [<ffffffffa1a2619b>] ? _nv013363rm+0xa80/0xc03 [nvidia]
[  626.565402]  [<ffffffffa1a26f8e>] ? _nv000809rm+0x3e5/0x626 [nvidia]
[  626.565453]  [<ffffffffa1a1fd32>] ? rm_init_adapter+0x73/0xf6 [nvidia]
[  626.565501]  [<ffffffffa1a409c1>] ? nvidia_open+0x1f1/0x920 [nvidia]
[  626.565506]  [<ffffffff813486fa>] ? kobj_lookup+0x10a/0x170
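To make the trace above easier to compare between kernel versions, here is a small Python sketch that pulls out the faulting function (the RIP line), the faulting address (CR2) and the call chain. The regular expressions are my own assumption about the log layout shown above, not an official parser. A CR2 as small as 0x280, together with several zeroed registers, is at least consistent with a NULL pointer being dereferenced at a field offset inside the nvidia module, but that is an interpretation, not a diagnosis.

```python
import re

# Matches "[<addr>] ? symbol+0xoff/0xsize [module]" as printed in the oops.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)
CR2_RE = re.compile(r"\bCR2:\s*([0-9a-f]{16})")

def summarize_oops(text):
    """Return the RIP frame, the CR2 value and the call-trace frames."""
    rip, cr2, frames = None, None, []
    for line in text.splitlines():
        m = CR2_RE.search(line)
        if m:
            cr2 = int(m.group(1), 16)
        m = FRAME_RE.search(line)
        if not m:
            continue
        entry = {
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "module": m.group("module"),
            # '?' marks an address found on the stack that the unwinder
            # could not confirm as a real return address.
            "unverified": bool(m.group("guess")),
        }
        if "RIP:" in line:
            rip = entry
        else:
            frames.append(entry)
    return {"rip": rip, "cr2": cr2, "frames": frames}

# A few lines copied from the trace in this post.
SAMPLE = """\
[  626.564509] RIP: 0010:[<ffffffffa177139d>]  [<ffffffffa177139d>] _nv006366rm+0x54/0xd3 [nvidia]
[  626.564660] CR2: 0000000000000280 CR3: 0000000036832000 CR4: 00000000000007e0
[  626.564811]  [<ffffffffa1756c85>] ? _nv006908rm+0x7ec/0x903 [nvidia]
[  626.565453]  [<ffffffffa1a1fd32>] ? rm_init_adapter+0x73/0xf6 [nvidia]
[  626.565506]  [<ffffffff813486fa>] ? kobj_lookup+0x10a/0x170
"""

if __name__ == "__main__":
    summary = summarize_oops(SAMPLE)
    print("faulting function:", summary["rip"]["symbol"])
    print("faulting address: ", hex(summary["cr2"]))
    print("call chain:", " <- ".join(f["symbol"] for f in summary["frames"]))
```

Running the same function over the oops from each kernel (3.9.11, 3.12.x) shows quickly whether the crash lands in the same place every time.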

This is the first time in a very long while that I have no idea how to make it work … I googled almost every keyword I could think of, and read this forum as well, but I could not find anything close to what I am experiencing.

I had to revert temporarily to the nouveau driver, since I have no solution at the moment, and this is quite a big problem for me, as part of my job involves 3D work … For now, it will have to wait until next week, but I still have absolutely no lead here. If anybody could give me a pointer or a workaround to make it work, I would be more than happy.
nvidia-bug-report.log.gz (51.4 KB)

I continued my tests to find a working scenario, but in vain so far:

  • I tried to install kernel 3.9.11 (the latest version before 3.10, which seems to give nvidia trouble), but got the same result (oops + black screen).
  • I tried to apply the CK patch to linux 3.12.8, but got the same result.
  • I tried to install an older version of the nvidia driver (304.88), the same one that used to work, but got the same result (oops + black screen).
  • I tried to add the kernel parameter suggested in another thread, rcutree.rcu_idle_gp_delay=1, but then the kernel stalls and does not complete the boot (it stalls at the modprobe step, and all modprobes get killed on timeout).

Interestingly though, when I built the latest version of the nvidia driver under linux 3.9.11, the unified memory driver did build correctly, and the nvidia 325 version built too; both give the same oops + black screen scenario.

I am out of leads now, and I am completely stuck. Even the old version that I used (304.88, as shipped in the Debian packages) crashes with kernels newer than 3.2, it seems. And I cannot boot anything older than 3.3 because of the EFI firmware.

One thing I did not mention (is it really related?): I am using rEFInd to boot my system, and it boots it in graphical mode (1024x768, it seems). I really don’t think it has anything to do with the problem, but I thought I’d mention it.

At this point, I am left with no option, and I have to stick with the “nouveau” driver, which is slow and unstable (well at least it does not crash at boot up, which makes it less unstable than the official driver for me so far).

Any help would be greatly appreciated, I am completely stuck here …

Here is some more information: the fact that linux 3.2.0 does not boot properly on my MacBook seems to be somewhat related to the two graphics cards in the laptop.

There seems to be an i915 (Intel) graphics card in there as well, and I need to disable it using the “i915.modeset=0” kernel parameter. With this I can boot the kernel just fine, but if I try to start Xorg with the nvidia driver, it fails with the “no screen found” message, claiming that no NVIDIA board is detected (it does show up in lspci, though).
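To see both graphics devices and which kernel driver (if any) is bound to each, without relying on the lspci output alone, something like the following Python sketch can help. It assumes the standard sysfs layout under /sys/bus/pci/devices (vendor, class and driver entries); the function names are only illustrative.

```python
from pathlib import Path

# PCI vendor IDs: 0x10de is NVIDIA, 0x8086 is Intel.
VENDORS = {0x10DE: "NVIDIA", 0x8086: "Intel"}

def describe_gpu(vendor_hex, class_hex, driver):
    """Describe a PCI device if it is a display controller, else return None."""
    vendor = int(vendor_hex, 16)
    pci_class = int(class_hex, 16)
    if (pci_class >> 16) != 0x03:  # base class 0x03 = display controller
        return None
    name = VENDORS.get(vendor, hex(vendor))
    return f"{name} display controller, driver: {driver or 'none bound'}"

def scan(sysfs_root="/sys/bus/pci/devices"):
    """List display controllers found in sysfs with their bound driver."""
    root = Path(sysfs_root)
    results = []
    if not root.is_dir():
        return results
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            pci_class = (dev / "class").read_text().strip()
        except OSError:
            continue
        link = dev / "driver"  # symlink that exists only when a driver is bound
        driver = link.resolve().name if link.exists() else None
        line = describe_gpu(vendor, pci_class, driver)
        if line:
            results.append(f"{dev.name}: {line}")
    return results

if __name__ == "__main__":
    for line in scan():
        print(line)
```

If the NVIDIA device is listed with no driver bound after loading the nvidia module, the card is visible on the bus but the driver did not attach to it, which would match the “no screen found” symptom.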

What I understand about this problem is that, for some reason, if I disable the i915 driver, it disables the muxer as well, and the nvidia board can no longer be activated.

My problem is that I don’t care about hybrid graphics support (although I may look at it in the future); all I want is to use my nvidia board correctly on that laptop. I tried a few tweaks, but so far I have not been able to use the nvidia driver (and again, the nouveau driver is just too buggy: for instance, thunderbird crashes the whole system every time I start it, so this solution just cannot work :/ )

I tried booting with the “noefi” boot parameter, but it just breaks the same way …
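Since several boot parameters have been tried in this thread (rcutree.rcu_idle_gp_delay=1, i915.modeset=0, noefi), it is worth confirming which ones the running kernel actually received. Below is a minimal Python sketch that parses /proc/cmdline; the sample command line in it is hypothetical (the image path and root device are made up), and quoted values containing spaces are not handled.

```python
from pathlib import Path

def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params

# Hypothetical example line; only the last two parameters come from this post.
SAMPLE = "BOOT_IMAGE=/vmlinuz root=/dev/sda3 ro i915.modeset=0 noefi"

if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    text = proc.read_text() if proc.exists() else SAMPLE
    params = parse_cmdline(text)
    for name in ("i915.modeset", "noefi", "rcutree.rcu_idle_gp_delay"):
        print(name, "->", params.get(name, "not set"))
```

This catches the case where a boot manager entry was edited but the parameter never reached the kernel.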

Can somebody give me a clue on that? I am getting quite desperate here :(

Could somebody just point me to a working configuration (any configuration?!)

Does anybody have an idea? I have been struggling with this issue for a week now, with no result so far :(

I just attached the nvidia-bug-report files (for both the 3.12 and 3.9.11 kernels) to this post, in case this helps.
nvidia-bug-report.log.gz (51.4 KB)
nvidia-bug-report-3.9.11.log.gz (48.2 KB)