Have you tried swapping slots for the 2 cards?
Does your ASUS motherboard have the latest bios? There have been 4 bios updates for that board in 2010, including one for system instability.
Ours are all production cards sourced through two different vendors. We’ve had failures from both.
Yes, that was one of the first attempts to fix the problems, but without success.
BTW, the motherboard (Gigabyte) BIOS is the latest one.
Dear all,
I am experiencing similar problems with my Tesla C2050.
My system configuration is:
Computer: Custom Assembled
Motherboard: ASUS P6T SE
Processor: Intel Pentium D (no overclocking)
Memory: 2 GB
Power supply: CoolerMaster GX 750 W
GPU: ATI FireGL V3100 (slot 1) + NVIDIA Tesla C2050 (slot 2)
OS: XP professional x64 SP2
Driver: NVIDIA 260.81 (Tesla version, SLI disabled)
CUDA Toolkit: 3.2
CUDA SDK: 3.2
When I use only the Tesla card as the main graphics card (having previously physically removed the ATI card from the case), I can sometimes get into Windows, but strange “stripes” appear from the first desktop screen onward. The number of stripes on the screen increases over time, and especially after launching certain programs (GPU-Z, MATLAB, the Control Panel…) the system halts completely. When I then boot the computer again, a blue screen appears with the following message:
HARDWARE MALFUNCTION
CALL YOUR HARDWARE VENDOR FOR SUPPORT
NMI: PARITY CHECK / MEMORY PARITY ERROR
THE SYSTEM HAS HALTED
When I reboot in safe mode and check Device Manager, it says that the Tesla card does not work because it has a problem. I then put the ATI card back into the system and boot again in safe mode. I disable the Tesla card and set the ATI card as my main VGA device. Then I can get into Windows without any problems, and when I re-enable the Tesla card it does not report any problem. But then, with this configuration running, GPU-Z crashes and can’t see the Tesla card. MATLAB crashes as well and can’t find any CUDA-capable GPU device (unknown error 10100). I’ve reinstalled all the drivers and tried both the CUDA Toolkit 3.2 and 3.1 x64, but nothing seems to fix this. It already seems strange to me that I can’t use the Tesla as my main VGA device successfully.
I don’t know whether my hardware configuration is inappropriate, the Tesla card is faulty, or there are driver conflicts. Do you have any ideas and/or suggestions?
Please help…it’s so frustrating having so much power, but not being able to use it :(
Best,
Sephi
Something doesn’t fit here:
The ASUS P6T SE is a Socket LGA 1366 board, while the Intel Pentium D is Socket 775, so one of those specs must be wrong. If the error is in the motherboard model, my guess is you have a board that only provides PCIe 1.1 slots (the FireGL V3100 is a PCIe 1.1 card). But I think the C2050 needs a PCIe 2.0 slot, so it would be no surprise if it doesn’t work.
Cheers
Ceearem
PCI-Express is supposed to be backward compatible, right? I know I used early PCI-Express 2.0 CUDA cards on a PCI-Express 1.x motherboard back when they first came out. The host and device negotiate the fastest speed that both support.
I’ll repeat the advice that always comes up with GPU hardware issues… Tim pointed it out already in this thread.
Check your PSU.
Rated wattage is not enough… you also need quality. You cannot skimp on the PSU. It’s OK to get cheap motherboards, CPUs, RAM, and hard drives, but the PSU is intimately connected to the GPU.
Using a $75 PSU like the CoolerMaster GX is just a bad idea in general.
What symptoms do you get with a low quality PSU? Hard to diagnose failures. Sometimes the GPU isn’t detected, sometimes it works fine but only for some benchmarks, sometimes it works fine for weeks and corrupts data only one time in a thousand.
These are all hard to diagnose, especially the 1 in 1000 failures.
Wattage rating is not enough to consider… you also really need to check quality. Price usually tells.
The Thermaltake Toughpower PSUs are my current pick, but PC Power and Cooling is also reliable.
I’ve also used Corsair and Antec in the past… those are also good brands but I STILL had a PSU instability with an Antec once.
With a $2000 C2050, please spend $250 on your PSU.
Wattage rating should always be generous for extra safety. You can measure your peak power draw at the wall with a Kill A Watt meter and multiply that by 1.5 to get a good baseline for what your PSU should supply.
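The sizing rule of thumb above (measured peak draw times a 1.5× safety margin) can be sketched as a quick calculation. This is just an illustration; the 450 W example figure is a made-up sample value, not a measurement from any system in this thread:

```python
# PSU sizing rule of thumb: take the peak draw measured at the wall
# (e.g. with a Kill A Watt meter) and multiply by a safety margin.

def recommended_psu_watts(measured_peak_watts, margin=1.5):
    """Return a suggested minimum PSU rating for a measured peak draw."""
    return measured_peak_watts * margin

# Hypothetical example: a box that peaks at 450 W under full load
# would call for a PSU rated around 675 W or more.
print(recommended_psu_watts(450))  # -> 675.0
```

The margin absorbs transient spikes and PSU aging, which is why the rated wattage alone is not enough.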
Changing the power supply generally isn’t an option with workstations from the Tier 1 vendors. They often use custom form factors and/or connections. Plus it defeats the purpose of buying from a Tier 1 because you void the warranty and have no service/support.
But that’s not an issue because of that support. If your hardware is failing, then that warranty and support will make it their problem, not yours.
Side comment: You’re right that Tier 1 vendor custom builds are limiting, though. I consulted for a company that had bought an IBM workstation with a Tesla. The workstation was even configurable for dual Teslas, but they had bought it with just one card. I wanted to add a second GPU, and lo and behold, IBM had used a custom PSU with exactly one 6-pin and one 8-pin power cable, so I couldn’t just add an extra. The custom PSU form factor prevented swapping it out, and IBM didn’t sell the “two GPU” PSU separately. I ended up using one of those 5.25" drive-bay auxiliary PSUs to power the second board, but only after digging up a PCI video card, since the motherboard had only two PCIe slots! It turned me off of such custom machines… you get what you pay for, but absolutely nothing past that.