S870 causes kernel panic: device query of S870 crashes kernel

nvidia-smi ships with the driver package. Knowing whether the results differ with the 2.0-beta driver is important.

OK, so nvidia-smi does rely on some of the things that CUDA installs, then? I was under the impression that it could be run without CUDA even being installed on the system.

I’ll install the CUDA 2.0-beta and re-run nvidia-smi to see if that changes anything, if that’s what you’re asking me to do.

I think netllama wants you to install the beta 174.55 NVIDIA driver associated with the CUDA 2.0 release. Here’s a link:

http://forums.nvidia.com/index.php?showtopic=65067

Ah, I understand now. I will try this this morning and see if it helps. Thanks.

On my CUDA 2.0 Linux x86_64 machine, nvidia-smi reports an error about a driver version mismatch.

Yes, I see the same problem. After installing the 174.55 driver package, when I run nvidia-smi, I get the following:

[jgreen@tesla1 ~]$ nvidia-smi
Error: API mismatch: the NVIDIA kernel module has version 174.55,
but this NVIDIA driver component has version 171.05. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
NVSMI: Failed to allocate an RM client. Failing…
Could not allocate resources!
[jgreen@tesla1 ~]$

Apparently the 174.55 installer for RHEL5 on x86_64 doesn’t properly uninstall the previous package? I ran the 174.55 installer twice, and once rebooted between installing and running nvidia-smi, but the result is always this same error.
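One way to check for leftovers from an earlier install is to list which driver versions are still present among the installed libraries. The sketch below is only an illustration and rests on assumptions not stated in this thread: that the driver's shared libraries carry the driver version as a filename suffix (for example `libcuda.so.174.55`), and that they start with one of the prefixes listed. The function name `versions_in` and the default prefixes are made up for this example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: driver libraries end in ".so.<driver version>", where the
# version has at least two dotted parts (so "libGL.so.1" is ignored).
_SUFFIX = re.compile(r"\.so\.(\d+\.\d+(?:\.\d+)*)$")


def versions_in(directories, prefixes=("libcuda", "libGL", "libnvidia")):
    """Group versioned library files by their version suffix.

    Returns a dict mapping a version string to the list of matching paths.
    Directories that do not exist are skipped.
    """
    found = defaultdict(list)
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for path in sorted(root.iterdir()):
            match = _SUFFIX.search(path.name)
            # str.startswith accepts a tuple of candidate prefixes.
            if match and path.name.startswith(prefixes):
                found[match.group(1)].append(str(path))
    return dict(found)
```

Calling `versions_in(["/usr/lib", "/usr/lib64"])` (paths assumed, adjust for the distribution) and seeing more than one version in the result would suggest that files from different driver releases coexist on the system.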

This is interesting. I downgraded to 171.06.01 after trying 174.55. I still had my BUG problems. Then, as a test, I installed 174.55 again. When I run nvidia-smi after installing 174.55, I get exactly the error quoted above.

I don’t think the problem is that the previous driver installation isn’t being uninstalled properly. I think that 174.55 itself contains a driver component from 171.05.

It detected 171.06.01 during install:

There appears to already be a driver installed on your system (version: 171.06.01). As part of installing this driver (version: 174.55), the existing driver will be uninstalled. Are you sure you want to continue? (‘no’ will abort installation)

But it errors out saying the driver component is 171.05:

[jgreen@tesla1 ~]$ nvidia-smi
Error: API mismatch: the NVIDIA kernel module has version 174.55,
but this NVIDIA driver component has version 171.05. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
NVSMI: Failed to allocate an RM client. Failing…
Could not allocate resources!

I amend my earlier comment about improper de-installation. This looks like there are disparate versions of components in the 174.55 bundle.
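The reasoning here can be written down as a small check: pull the two version numbers out of the error text and compare the component version against both the previously installed driver and the newly installed one. This is just a sketch of that comparison. The function names `parse_api_mismatch` and `classify_component` are hypothetical, and the message text and version numbers are the ones reported in this thread.

```python
import re

# Matches dotted versions such as 174.55 or 171.06.01 without swallowing
# the comma or full stop that follows them in the error message.
_VERSION = r"(\d+(?:\.\d+)*)"


def parse_api_mismatch(message):
    """Return (kernel_module_version, component_version), or None if the
    text does not contain both halves of the API mismatch error."""
    kernel = re.search(r"kernel\s+module\s+has\s+version\s+" + _VERSION, message)
    component = re.search(r"driver\s+component\s+has\s+version\s+" + _VERSION, message)
    if kernel is None or component is None:
        return None
    return kernel.group(1), component.group(1)


def classify_component(component, previous, installed):
    """Hypothetical helper: say which known driver version, if any,
    the mismatched component belongs to."""
    if component == installed:
        return "matches new driver"
    if component == previous:
        return "leftover from previous driver"
    return "neither previous nor new driver"


# Error text as reported by nvidia-smi in this thread.
message = """Error: API mismatch: the NVIDIA kernel module has version 174.55,
but this NVIDIA driver component has version 171.05. Please make
sure that the kernel module and all NVIDIA driver components
have the same version."""

kernel, component = parse_api_mismatch(message)
# 171.06.01 is the version the installer said it found and would uninstall.
verdict = classify_component(component, previous="171.06.01", installed=kernel)
print(kernel, component, verdict)
```

With the versions from this thread, the component matches neither the driver that was removed nor the one that was installed, which is what points away from a failed uninstall.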

I am getting pulled into something else for a little bit, so I probably won’t be back looking at this problem for about a week. I’ll check back in when I come back to it, since I still need to get my S870 working again.

Thanks.

–Joe