Debian hangs after upgrading to developer drivers 260.24

Hello,

I have a workstation with a Tyan S7025 motherboard, 2x Intel Xeon X5550, 24 GB RAM, 1x Nvidia Quadro FX 4800 and 3x Nvidia Tesla C1060. It runs Debian Squeeze 64-bit.

It was running fine with the developer drivers 195.36.15 (and user drivers 256.35 and previous) and CUDA 3.0.

However, after upgrading to developer drivers 260.24 I started to observe system hangs after using the computer for a few hours. I have also observed the same behavior with the normal drivers 256.53.

The computer hangs completely and becomes unresponsive to any keyboard key combination. The screen is frozen but not blank. I have to switch it off by pressing the power button for a few seconds.

I have two questions:

  1. Do I really need the developer drivers in order to use CUDA?

  2. Do I have to use a specific driver version for each CUDA version? That is, should I strictly use 195.36.15 with 3.0, 256.40 with 3.1, and 260.24 with 3.2?

Best regards,

Miro
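When comparing driver versions as in the post above, it helps to confirm which driver the kernel is actually running. The following is a minimal sketch in Python, assuming the NVIDIA kernel module is loaded and exposes the usual /proc/driver/nvidia/version file. The function names are made up for illustration, and the expected line layout is an assumption based on what the proprietary driver typically prints.

```python
import re
from pathlib import Path

# File exposed by the NVIDIA kernel module on Linux while it is loaded.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Return the driver version as a tuple of ints, or None if not found.

    Assumes a line such as:
    NVRM version: NVIDIA UNIX x86_64 Kernel Module  260.24  <build date>
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_driver_version(path=NVIDIA_VERSION_FILE):
    """Read the version file; return None if the module is not loaded."""
    try:
        return parse_driver_version(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(installed_driver_version())
```

Returning a tuple of integers rather than a string means that versions such as 195.36.15 and 260.24 compare correctly with the ordinary comparison operators.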

Hangs with 256.53 are strange - I am using that driver on 64-bit Debian without any trouble (I had some instability, but it was most likely caused by the experimental GTK, as it stopped after a GTK update).

  1. If you want to use CUDA 3.2 you need to use 260.x - CUDA 3.2 does not work with drivers 256.53.

  2. If you do not need CUDA 3.2 (i.e. you do not have a Fermi card) you can use CUDA 3.1 with drivers 256.53 (currently in Debian experimental).

Unfortunately, there will be no 260.x drivers in Debian until the Squeeze release, as the Debian NVIDIA team is working on the 195.x drivers, which will ship in Squeeze.
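The pairings discussed in this thread can be written down as a small lookup. This is only a sketch: the table holds the toolkit/driver pairs quoted above, and treating each driver version as a minimum (so that a newer driver is assumed to run an older toolkit) is an assumption of the example, not something confirmed here. The names `MIN_DRIVER_FOR_CUDA` and `driver_supports` are hypothetical.

```python
# Toolkit version -> developer driver version quoted in this thread.
# Treating these as minimum versions is an assumption of this sketch.
MIN_DRIVER_FOR_CUDA = {
    "3.0": (195, 36, 15),
    "3.1": (256, 40),
    "3.2": (260, 24),
}


def parse_version(text):
    """Turn a dotted version string such as '256.53' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(cuda_version, driver_version):
    """Return True if the driver is at least the version paired with the toolkit."""
    required = MIN_DRIVER_FOR_CUDA[cuda_version]
    return parse_version(driver_version) >= required


if __name__ == "__main__":
    for cuda in sorted(MIN_DRIVER_FOR_CUDA):
        print(cuda, driver_supports(cuda, "256.53"))
```

With this table, driver 256.53 passes the check for CUDA 3.0 and 3.1 but not for 3.2, which matches the statement above that CUDA 3.2 does not work with 256.53.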

Thanks for the reply. Do you know what the difference is between the developer drivers and the others? I have noticed that installing the developer drivers significantly reduces the FPS reported by glxgears. I am aware that glxgears is not a good benchmark, but is there any real difference in performance? And, again, do I really need the developer drivers in order to run CUDA code?

Another question, a bit off-topic: when compiling the drivers, the installer complains about a gcc version mismatch (the kernel was compiled with 4.3, but the default gcc version in Squeeze is 4.4). I solved the problem by pointing /etc/alternatives/cc at gcc-4.3. Is this really important, or can one just ignore the warning?
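The mismatch the installer warns about can also be checked by hand. Below is a rough sketch, assuming /proc/version contains a "gcc version X.Y.Z" string (as kernels of this era do; newer kernels format that line differently) and that the compiler accepts the -dumpversion option, which gcc does. The helper names are hypothetical.

```python
import re
import subprocess


def kernel_gcc_version(proc_version_text):
    """Extract (major, minor) of the gcc that built the kernel, or None."""
    match = re.search(r"gcc version (\d+)\.(\d+)", proc_version_text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def compiler_version(command="cc"):
    """Ask the compiler for its version; return a tuple of ints or None."""
    try:
        result = subprocess.run(
            [command, "-dumpversion"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    parts = re.findall(r"\d+", result.stdout)
    return tuple(int(part) for part in parts[:2]) if parts else None


def versions_match(kernel_gcc, cc):
    """Compare major.minor only; unknown versions count as a mismatch."""
    if kernel_gcc is None or cc is None:
        return False
    return kernel_gcc[:2] == cc[:2]


if __name__ == "__main__":
    try:
        with open("/proc/version") as handle:
            kernel = kernel_gcc_version(handle.read())
    except OSError:
        kernel = None
    current = compiler_version("cc")
    print("kernel gcc:", kernel, "cc:", current,
          "match:", versions_match(kernel, current))
```

Only major.minor is compared here, on the assumption that a difference such as 4.3.4 versus 4.3.5 is not what the installer objects to.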

Just one more thing: the reason I believe the problem lies with the drivers and not with the OS is that I have tried different distros (Debian testing and Scientific Linux), and the hang happened in all cases and depended on the driver version. Quite likely, though, it is also specific to my particular hardware setup (FX4800 + 3x C1060).

I assume the situation with developer/final drivers is similar to the one with Debian stable/testing.

Final drivers are for the public - everyone should be able to use them without problems.

Developer drivers contain new code that may not have been fully tested. Developers outside of NVIDIA can download and use them. They get new features, and NVIDIA gets better testing (more unusual hardware configurations, etc.). At the same time, developers are better suited to dealing with strange errors and give better bug reports than ordinary users.

A performance difference between developer and final drivers would suggest that there is some debugging code - but one cannot be sure without disassembling (which might not be legal, depending on your country).

I do not have access to a multi-GPU machine, so I cannot help you there.

As for the compiler version - currently I am using dkms, so the NVIDIA modules are built automatically. I used to build the modules myself for 190.x and 195.x and my system worked without any problems. I do not remember having to play with compiler versions, so I assume that it should not matter.

Kind of the same problem for me.

I’m on a laptop (a rebranded CLEVO W860CU) with an i7 CPU and a GTS 360M video card. The OS is Ubuntu 10.04 (so I’m partially off-topic, sorry) with devdriver_3.2_linux_64_260.24 and all the recommended v3.2 stuff.

The symptoms seem to be the same, but they appear when I run my kernels (which ran smoothly in the previous environment: Ubuntu 9.10, devdriver_3.0_linux_64_195.36.15 and all the recommended 3.0 stuff).

With every attempt to re-run the same kernel the problem gets worse (less and less output, longer and longer waits). The screen is “almost” frozen: I can move the mouse pointer (but cannot interact with the window manager) for a while, then everything freezes completely for another 30 seconds or so, and so on.
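The run-to-run degradation described above can be put into numbers with a small timing harness. This is a sketch only: the program to run is taken from the command line, the function name `time_runs` is hypothetical, and the 120-second timeout is an arbitrary choice. It will not help if the whole machine hangs.

```python
import subprocess
import sys
import time


def time_runs(command, repeats=5, timeout=120):
    """Run command several times; return a list of (seconds, stdout_bytes).

    A run that exceeds the timeout is recorded with stdout_bytes == -1.
    """
    results = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            proc = subprocess.run(command, capture_output=True, timeout=timeout)
            output_size = len(proc.stdout)
        except subprocess.TimeoutExpired:
            output_size = -1  # the run had to be abandoned
        results.append((time.perf_counter() - start, output_size))
    return results


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python time_runs.py <program> [args...]")
    else:
        for index, (seconds, size) in enumerate(time_runs(sys.argv[1:]), 1):
            print(f"run {index}: {seconds:.2f} s, {size} bytes of output")
```

Recording the output size next to the elapsed time captures both symptoms mentioned: less output and longer waits on each successive run.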

Strangely, the performance level shown in the nvidia-settings window got stuck at level 1, even with PowerMizer set to “adaptive” mode.

While writing these lines I had another short freeze while running glxgears to unblock the performance level.
