Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

@ddimi, What’s the most convenient way to achieve it in Arch? With or without dkms?

Security Bulletin: NVIDIA GPU Display Driver - January 2021 | NVIDIA does not list any unaffected version of drivers except for 390.x and 418.x as having the latest security fixes.

It’s unclear whether any of the security problems mentioned are present in 440.100, the latest unaffected driver version.

What should we use until there is a fix in 460.x?

AMD

also I can’t add new reply, so I have to add it in this reply:
I’ve tried installing the driver with “--no-unified-memory” and it hasn’t helped.
Manjaro, 5.10, 460.39
the torvalds quote

UPDATE
I really doubt it installed without unified memory last time. I have now fully removed the driver and installed it again, and this time it told me that I’m installing without UVM, no CUDA, so I’ll check it. Will write here in a few days.

UPDATE
I’ve checked it. no-unified-memory doesn’t help at all. nvidia, torvalds you

I said today would come and it has. I was faced with haters and Nvidia bootlickers and shills but today has come. I didn’t want to flame again but here I am again.

Today the Arch Linux repos have updated their LTS kernel to Linux 5. DKMS will no longer build and I have to choose between using the latest kernel and its security patches or actually using my computer.

make.log (102.7 KB)

I cannot use the latest kernel as I need my GPU to run ML tasks. I’m currently working with GPT-2 and training easily takes more than a day. Crashes happen once every few hours on a buggy driver. I’ve had it crash multiple times a minute.

If anyone knows how to get the latest kernel with a stable driver PLEASE LET ME KNOW. MY DEGREE DEPENDS ON THIS. As far as I’m aware, this is the (worst) situation I have imagined and it’s come to fruition. To my sm0l brain there are no fixes, as the data types in the headers don’t match, and thus neither do the underlying data structures and code. There is no fix and there cannot be a fix unless the code is refactored or patched. Aka a new driver.

I could not imagine any company, especially one the size of Nvidia, doing something this bad on such a scale. I cannot even begin to imagine how any of these devs and mods still even have their jobs. This is not a mere “oopsie, lol we don’t know what is happening xddd haha, sawweeee UwU” laugh-and-forget situation.

I NEED MY COMPUTER. I paid for a fast GPU because I need one for my work and now I cannot use it! So no, I cannot just sit quietly and not demand Nvidia give us drivers for devices we have paid good money for. My life and many others’ lives rest on the work done on their computers! My project cannot be hosted on a cloud service, as it would be hard for me to dev and debug this project with the data sets I’m using easily reaching gigs. And why should I have to pay out of my own pocket to work around this bug?

When you make the best GPUs on the planet and earn billions a year this is not an excuse! This is not an excuse no matter what the situation or company! What place can I go into work and mess up the lives of thousands if not potentially millions of people for over 5 months? Knowing the sky high salaries that Nvidia pay, I have to ask. Can I work here? I can also spam the exact same comment while I do nothing on a forum. Look I’m doing it now!

Since NVidia are having a hard time reproducing it, maybe you could share your code with them. That would hopefully move things along.

I really don’t think the lack of reports is the problem here. If you read the other posts, people have said they literally have the same log. I’ve made several bug reports already and so have others.

The problem here is not the lack of reports okay? It’s Nvidia, like it was on day 1 and like it is today.

Have you seen the Torvalds video? People only remember the “FU nvidia” bit. But the question was from someone in the audience who also shared Torvalds’ experience and was thus asking him about it. This is a universal experience. This situation is not out of the ordinary; it’s an emergent property of how Nvidia works and continues to work today.

It’s not about the lack of bug reports. It never has been. It’s Nvidia. Always has been. I’m tired of giving them the benefit of the doubt. And why should I? They have shown time and time again they don’t care and just want your money. And in case I and others haven’t mentioned it yet: IT’S BEEN 5 MONTHS.

Anyone who is still defending Nvidia at this stage needs to go get their head checked.

You appear to be confused. NVidia says the problem is that they can’t reproduce it. Additional bug reports containing the same information don’t make an issue any easier to reproduce, so your point about the number of bug reports is irrelevant. On the other hand, you appear to have something other people in this thread don’t have – a way to reproduce the problem reliably within a few hours. That might be useful to NVidia toward solving this problem. Assuming, of course, that your end goal is to have this problem solved.

Yes? I want to get on with my life and never go on this website again. Do you really think I’m so petty I’d put my own degree and life on hold so I can win an internet argument???

Nvidia is one of the largest semiconductor companies, with a massive wallet and some of the smartest and best-paid people in the industry. And you want to put the responsibility of debugging on the end user?

You are a bootlicker.

At this point Nvidia developers should have been working with what they have. Even when you cannot reproduce the bug you can still analyze the code and make preventive changes that could possibly prevent the reported crash. At the very least you would be adding debug info and logging, including at the point of the crash. They could contact a few reporters to ask for help reproducing the bug - I’m sure those affected would understand and could be helpful. I’m saying this as a software developer myself; I’ve been in situations like this.

In any case, staying silent for months and occasionally releasing drivers once in a few months as if nothing is wrong is not the normal way of tackling the problem. The bug will not go away this way.

@aplattner Where is the driver with the change that you mentioned almost a week ago? Why is it not released yet? Nvidia, stop sitting on fixed drivers and get your releases going.

You do appear to be more interested in sharing your emotions than actually contributing toward fixing the problem. You seem petty enough to make posts on the internet that contribute nothing, even though it seems you do have the means to contribute something useful.

You of course have no obligation to help solve the problem. But surely it is in your best interest to have this solved as soon as possible, so I don’t see why you’d refuse to do that if it doesn’t cause you any undue burden.

One thing I’d like you to consider is that you’re not the only person having this problem. By withholding potentially useful information, you’re not only putting your own degree and life on hold, but also the lives and livelihoods of other people.

And I tried to emphasise. I understand very much as well. It’s not a good situation for anyone.

But Nvidia simply doesn’t care. Their behaviour is disgusting. They haven’t released any public statements except the same spam comment about how sorry they are and how much it sucks for us all T.T

Drivers are still rolling out and none have been pulled or even have a warning on install.

They could contact a few reporters with an ask for help reproducing the bug

Read my comment here: Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference - #93 by anon52993935

"""
It does seem a bit weird that you are having trouble reproducing this. I have some experience in operating systems programming, though not on this level, but from what I can gather and how the patch has helped, it might be a stack/buffer overflow and corruption of some essential call stacks. I wonder if there is a safe and sanitary way to dump the memory as well as the register logs in order to assist in debugging. The current logs do provide registers but it’s not much help if you don’t know how it got there. It would be extremely invasive for the drivers to do so by default but I’d be happy to run a logging daemon for the purposes of squashing this bug. Again, not sure how great of an idea this is and how helpful it could be considering how invasive this is.
"""

I literally offered to take one for the team and install malware on my computer to help the cause. This was a direct reply to a mod. They definitely saw this.

NVIDIA DOESN’T CARE. END OF.

now try this attitude with nvidia instead of other users like you.

read my previous comments, I’ve tried to help. they just don’t care.

What “useful information” are you talking about, and how do you think we can contribute to solving this issue? We have already contributed our money to Nvidia, and everything we want is just a working driver. We have been getting new driver versions with the same critical bug for almost half a year! The bug is absolutely unpredictable, and everything we can do here is to provide our logs. What else do you expect from us?

Simply scroll up and read my previous posts again. There is no need for me to type them again.

Me and you probably can’t contribute, but yuannan might be able to.

Registered just to say you are absolutely right @yuannan. This is disgusting behaviour from NVIDIA. I will not buy from them again, if I can help it.

I would suggest you program your program to save intermediate states. And reload from those intermediate states after crashes. You should not have to do this, but yeah. Other people let us down.
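A minimal sketch of that save-and-resume idea, assuming a plain Python training loop. The names `save_checkpoint`, `load_checkpoint` and `train` are made up for illustration, and the "training step" is a placeholder; a real GPT-2 run would save model and optimizer state through whatever framework it uses.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temp file first, then rename it into place,
    so a crash in the middle of a write cannot corrupt the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def train(total_steps, path, every=100):
    """Hypothetical training loop that resumes from the last checkpoint."""
    state = load_checkpoint(path, {"step": 0, "weights": 0.0})
    while state["step"] < total_steps:
        state["weights"] += 0.5  # placeholder for one real training step
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

After a kernel panic and reboot, calling `train` again with the same path picks up from the last saved step, so at most `every` steps of work are lost.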

You might also consider downgrading your driver to 440.100 as suggested by @abelits. Try installing downgrade from the AUR. It might get you through it. Alternatively I believe pacman caches old packages and you may be able to downgrade through pacman, I am not sure how exactly. Otherwise you should set up your system on something other than Arch, something that will more easily let you select your driver version.

I hope you resolve your issue. I agree with you that these problems are unacceptable given the amount NVIDIA charges.
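For the pacman cache route, here is a rough sketch of how to see which older builds are still on disk. It assumes the default cache location `/var/cache/pacman/pkg` and the usual `name-pkgver-pkgrel-arch.pkg.tar.*` file naming; the helper name `cached_versions` is made up for this example.

```python
from pathlib import Path


def cached_versions(package, cache_dir="/var/cache/pacman/pkg"):
    """Return (version, path) pairs for cached builds of exactly `package`."""
    found = []
    for path in sorted(Path(cache_dir).glob(f"{package}-*.pkg.tar.*")):
        if path.name.endswith(".sig"):
            continue  # skip detached signature files
        stem = path.name.split(".pkg.tar.")[0]
        parts = stem.rsplit("-", 3)  # name, pkgver, pkgrel, arch
        # Comparing the name exactly keeps e.g. nvidia-utils out of an "nvidia" query.
        if len(parts) == 4 and parts[0] == package:
            found.append((f"{parts[1]}-{parts[2]}", path))
    return found
```

A file found this way can then be installed with `pacman -U <path>`. The driver packages (for example `nvidia` and `nvidia-utils`) generally need to be downgraded to the same version together, and the installed kernel has to be one the older driver still builds against.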

BTW I’ve found out this bug also appears on 450.80.02 driver version

Where? Or does your system show the bug across all nVidia driver versions?

No. If someone decides to drive drunk and blindfolded from the boot of their car, the manufacturer isn’t responsible for the resulting accident.
Where’s your bug report?
I’ll tear your system apart.

You could read your BIOS manual, the nVidia driver manual and your distro and kernel whitepapers.
You’re running Linux. A commercial enterprise-grade open-source product developed and presented to market entry point at no financial cost.

Change your perspective and you’ll see the issue differently.
* http://us.download.nvidia.com/XFree86/Linux-x86_64/460.39/README/index.html
* https://www.kernel.org/doc/html/v4.14/admin-guide/index.html

OH, the nvidia drivers are open source now? Geez, guess I missed something!

Follow your psychiatric doctor’s orders :p