cuSolverMG and CMP 170 HX

I’ve written some code that uses cuSolverMG to solve for eigenvalues and eigenvectors. I have a motley collection of cards: one 1080 Ti, one 3080 Ti, and two CMP 170 HX. cuSolverMG works fine on a single 170 HX, or with one 170 HX combined with one or more of the RTX cards. But with two 170 HX together, it fails to diagonalize the matrices.

nvidia-smi shows the cards working normally until the point where the matrix should have been solved. The program then appears to hang for a few minutes, after which it exits and the eigenvalues and eigenvectors are all NaN.

I know the 170 HX cards have throttled FP, but I thought they should otherwise work normally. Any help is appreciated.

If these cards are graphics-capable (I don’t know, I’ve never used them), slow-running kernels due to throttled FP could exceed the time limit on compute kernels imposed by the operating system’s GUI watchdog timer. A typical value for this limit is two seconds.

When the watchdog timer expires, the operating system resets the GUI subsystem, causing the CUDA context to be destroyed, which in turn causes the CUDA-accelerated app to fail. If the application does not properly catch this condition, all kinds of invalid data can be generated.

On Windows I have observed random instances of noticeable delay until such an abnormally terminated application returns control to the command line.

These cards aren’t graphics-capable. Also, I’ve verified that when one 170 HX is used alone or one is combined with the RTX cards, the eigenvalues and eigenvectors generated are correct.

The 170 HX was made for mining cryptocurrency. Lately they have come way down in price. I got a few and was checking if they could be repurposed. For eigensolving, they are equivalent in speed to the 1080 Ti. But they use less than half the wattage of the 1080 Ti or the 3080 Ti.

I have no idea what the issue is. A few generic suggestions:

  • Use rigorous error checking throughout: test every CUDA or CUDA library API call that returns an error code.
  • Run the code under compute-sanitizer, in both passing and failing test cases.
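To make the first suggestion concrete, here is a minimal error-checking sketch. The macro names and the trivial main() are my own; the idea is simply that every runtime and cuSolverMG call gets wrapped, so a failure is reported at the exact call site instead of surfacing later as NaN output.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cusolverMg.h>

// Hypothetical helper macros: wrap every CUDA runtime call...
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// ...and every cuSolverMG call.
#define CUSOLVER_CHECK(call)                                          \
    do {                                                              \
        cusolverStatus_t st_ = (call);                                \
        if (st_ != CUSOLVER_STATUS_SUCCESS) {                         \
            fprintf(stderr, "cuSOLVER error %d at %s:%d\n",           \
                    (int)st_, __FILE__, __LINE__);                    \
        exit(EXIT_FAILURE);                                           \
        }                                                             \
    } while (0)

int main() {
    int ndev = 0;
    CUDA_CHECK(cudaGetDeviceCount(&ndev));
    printf("visible devices: %d\n", ndev);

    cusolverMgHandle_t handle;
    CUSOLVER_CHECK(cusolverMgCreate(&handle));
    // ... cusolverMgDeviceSelect, grid/descriptor setup, syevd, etc. ...
    CUSOLVER_CHECK(cusolverMgDestroy(&handle));
    return 0;
}
```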

May I ask whether you are running Linux or Windows? I am wondering whether the second CMP 170 HX bottlenecks the PCIe lane allocation.

Also, isn’t the 170 HX powered from the rear rather than through the PCIe slot? Do you have stable, well-grounded power to that board relative to the chassis/motherboard (no ground loop)?

Also, can anyone confirm that they have successfully used the 170 HX with CUDA? I seem to recall reading on some spec sheet that it is not supported.

Just sharing some of my experiences over the years troubleshooting multiple concurrent bus based designs.

Hope helpful, look forward to hearing back.

Also, I do see the price has really dropped from the initial ~$5K release price a few years ago. My only concern is picking up cards that have been through the trenches as cryptocurrency miners; they would surely have been stressed, so I wonder about longevity once they’re turned out to pasture on the used market. Anyone have experience?

Thank you.

I’m running Linux. I’ve successfully run CUDA code on each 170 HX individually. Both (170 HX A + 3080 Ti) and (170 HX B + 1080 Ti + 3080 Ti) also run CUDA code successfully. But (170 HX A + 170 HX B) fails.

While debugging, I found a problem with my motherboard. The machine does not boot when two 1080 Tis and one 3080 Ti are plugged in; it fails with those three cards even if no 170 HX is present. There’s also an issue with cusolverMgDeviceSelect() on this machine: the devices have different IDs in cusolverMgDeviceSelect() than in nvidia-smi.
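One possible (and benign) explanation for the ID mismatch: by default the CUDA runtime enumerates devices fastest-first, while nvidia-smi lists them in PCI bus order; exporting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the runtime match nvidia-smi. The ordinals passed to cusolverMgDeviceSelect() are runtime ordinals, so a quick way to line the two views up is to print each runtime ordinal next to its PCI bus ID (a sketch; error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print each runtime device ordinal with its name and PCI bus ID,
// so it can be matched line-by-line against `nvidia-smi`.
int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int d = 0; d < ndev; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        char pci[32];
        cudaDeviceGetPCIBusId(pci, sizeof(pci), d);
        printf("runtime id %d: %s (%s)\n", d, prop.name, pci);
    }
    return 0;
}
```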

The motherboard is an MSI TRX40 Aorus Master. It had a firmware issue with video cards that was fixed in an earlier BIOS update, so maybe there was a regression in the latest BIOS. The latest BIOS fixes a remote exploit, so I don’t want to revert it.

So there’s a chance this is a problem with my motherboard, and I need to rule that out. I’m going to set up another machine soonish and test on it. I’ll update this thread with whatever I discover.

I forgot to reply to your last paragraph. The cards I have were previously used for mining. They don’t have any fans of their own, so the moving part that typically breaks isn’t there; I haven’t had any video cards die for other reasons in the past (knocks on wood). We attached a simple blower to the back to push air through. The ‘low’ speed is enough to keep the card cooler than the RTX cards while remaining reasonably quiet.

I need to find the link, but I read that FMA-heavy code runs very slowly on these cards. I guess diagonalization doesn’t use it much, because these cards handle it well. But YMMV depending on the workload.

You definitely want to keep the BIOS upgrade to close the exploit. I seem to have read that the vBIOS for the CMP 170 HX was perhaps card-specific and required running a script to register the card; I’m not sure whether that applies on Linux. Check which vBIOS revision is on the CMPs.

I came upon this while researching whether the 170 HX makes any sense for anything but crypto mining in multi-170 HX configurations, and offer it in case you have not seen it:

Link: All GB/s without FLOPS - Nvidia CMP 170HX Review, Performance Lockdown Workaround, Teardown, Watercooling, and Repair

These comments at the link I provided seem to suggest limits on the full CUDA implementation for the 170 HX. Have you found a workaround?

…" CUDA

To disable FP contraction, pass the option -fmad=false to nvcc .

Again, like OpenCL, it’s also possible to use FMA and MAD via multiple built-in functions. Since nvcc is proprietary, again, there’s no transparent way to disable FMA and MAD globally at the compiler level - apart from reverse engineering and modifying the Nvidia binaries, or implementing a preprocessor or debugger-like interceptor to modify the source on-the-fly.

On the other hand, just like OpenCL, it’s possible to compile CUDA via LLVM/clang and target Nvidia PTX. Thus, in principle, it should be possible to create a patched LLVM/clang for this purpose in analogous to PoCL. "…

Seems like a lot of workarounds, and potentially unstable.


To use CUDA, were a lot of workarounds required, or was it fairly straightforward on the 170 HX boards compared with other GPUs?

I didn’t make any workarounds or changes to use CUDA on the 170 HX. It’s worked as-is for me so far. But the only thing I’ve used it for so far is diagonalization.

I figured out the problem: I needed to update the driver. The driver I was using was released before the 170 HX even existed. In hindsight, it’s surprising that a single card got the correct answer at all.
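For anyone who hits similar symptoms, a cheap sanity check is to compare the CUDA version the installed driver supports against the runtime version the binary was built with; a driver that predates the GPU or the toolkit is a red flag. A minimal sketch using the runtime API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the CUDA version supported by the installed driver and the
// runtime (toolkit) version the program was compiled against.
// Both are encoded as 1000*major + 10*minor, e.g. 12040 -> 12.4.
int main() {
    int driverVer = 0, runtimeVer = 0;
    cudaDriverGetVersion(&driverVer);
    cudaRuntimeGetVersion(&runtimeVer);
    printf("driver CUDA version:  %d.%d\n",
           driverVer / 1000, (driverVer % 100) / 10);
    printf("runtime CUDA version: %d.%d\n",
           runtimeVer / 1000, (runtimeVer % 100) / 10);
    return 0;
}
```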

Thanks everyone for your help ^_^.

Was the update on NVIDIA’s website, or somewhere else? What was the revision date?

It was the latest driver on NVIDIA’s website. Version 550.100.