cuSolverMG and CMP 170 HX

I’ve written some code that uses cuSolverMG to solve for eigenvalues and eigenvectors. I have a motley collection of cards: one 1080 Ti, one 3080 Ti, and two CMP 170 HX. cuSolverMG works fine on a single 170 HX, or with one 170 HX combined with one or more of the RTX cards. But with two 170 HX together, it fails to diagonalize the matrices.

nvidia-smi shows the cards working normally until the point where the matrix should have been solved. The program then appears to hang for a few minutes, after which it exits and the eigenvalues and eigenvectors are all NaN.

I know the 170 HX cards have throttled FP, but I thought they should otherwise work normally. Any help is appreciated.

If these cards are graphics-capable (I don’t know, I’ve never used them), slow-running kernels due to throttled FP could exceed the time limit on compute kernels imposed by the operating system’s GUI watchdog timer. A typical value for this limit is two seconds.

When the watchdog timer expires, the operating system resets the GUI subsystem, causing the CUDA context to be destroyed, which in turn causes the CUDA-accelerated app to fail. If the application does not properly catch this condition, all kinds of invalid data can be generated.

On Windows I have observed random instances of noticeable delay until such an abnormally terminated application returns control to the command line.

These cards aren’t graphics-capable. Also, I’ve verified that when one 170 HX is used alone or one is combined with the RTX cards, the eigenvalues and eigenvectors generated are correct.

The 170 HX was made for mining cryptocurrency. Lately they have come way down in price. I got a few and was checking if they could be repurposed. For eigensolving, they are equivalent in speed to the 1080 Ti. But they use less than half the wattage of the 1080 Ti or the 3080 Ti.

I have no idea what the issue is. A few generic suggestions:

  • Use rigorous error checking throughout: test every CUDA or CUDA library API call that returns an error code.
  • Run the code under compute-sanitizer, in both passing and failing test cases.
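To make the first suggestion concrete, here is a minimal error-checking sketch. The macro names and the trivial main() are my own; the idea is simply that every runtime and cuSolverMG call gets wrapped, so a failure is reported at the exact call site instead of surfacing later as NaN output.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cusolverMg.h>

// Hypothetical helper macros: wrap every CUDA runtime call...
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// ...and every cuSolverMG call.
#define CUSOLVER_CHECK(call)                                          \
    do {                                                              \
        cusolverStatus_t st_ = (call);                                \
        if (st_ != CUSOLVER_STATUS_SUCCESS) {                         \
            fprintf(stderr, "cuSOLVER error %d at %s:%d\n",           \
                    (int)st_, __FILE__, __LINE__);                    \
        exit(EXIT_FAILURE);                                           \
        }                                                             \
    } while (0)

int main() {
    int ndev = 0;
    CUDA_CHECK(cudaGetDeviceCount(&ndev));
    printf("visible devices: %d\n", ndev);

    cusolverMgHandle_t handle;
    CUSOLVER_CHECK(cusolverMgCreate(&handle));
    // ... cusolverMgDeviceSelect, grid/descriptor setup, syevd, etc. ...
    CUSOLVER_CHECK(cusolverMgDestroy(&handle));
    return 0;
}
```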

May I ask whether you are running Linux or Windows? I am wondering whether the second CMP 170 HX bottlenecks the PCIe lane allocation.

Also, isn’t the 170 HX powered from the rear rather than through the PCIe slot? Do you have stable, well-grounded power to that board relative to the chassis/motherboard (no ground loop)?

Also, can anyone confirm that they have successfully used the 170 HX with CUDA? I seem to recall reading on some spec sheet that it is not supported.

Just sharing some of my experiences over the years troubleshooting multiple concurrent bus based designs.

Hope helpful, look forward to hearing back.

Also, I do see the price has really dropped from the initial ~$5K release price a few years ago. My only concern is picking up cards that have been through the trenches as cryptocurrency miners; they would surely have been stressed, so I wonder about longevity once they’re turned out to pasture on the used market. Anyone have experience?

Thank you.

I’m running Linux. I’ve successfully run CUDA code on each 170 HX individually. Both (170 HX A + 3080 Ti) and (170 HX B + 1080 Ti + 3080 Ti) also run CUDA code successfully. But (170 HX A + 170 HX B) fails.

While debugging, I found a problem with my motherboard. The machine does not boot when two 1080 Tis and one 3080 Ti are plugged in; it fails with those three cards even if no 170 HX is present. There’s also an issue with cusolverMgDeviceSelect() on this machine: the devices have different IDs in cusolverMgDeviceSelect() than in nvidia-smi.
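One possible (and benign) explanation for the ID mismatch: by default the CUDA runtime enumerates devices fastest-first, while nvidia-smi lists them in PCI bus order; exporting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the runtime match nvidia-smi. The ordinals passed to cusolverMgDeviceSelect() are runtime ordinals, so a quick way to line the two views up is to print each runtime ordinal next to its PCI bus ID (a sketch; error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print each runtime device ordinal with its name and PCI bus ID,
// so it can be matched line-by-line against `nvidia-smi`.
int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int d = 0; d < ndev; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        char pci[32];
        cudaDeviceGetPCIBusId(pci, sizeof(pci), d);
        printf("runtime id %d: %s (%s)\n", d, prop.name, pci);
    }
    return 0;
}
```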

The motherboard is an MSI TRX40 Aorus Master. It had a firmware issue with video cards that was fixed in an earlier BIOS update, so maybe there was a regression in the latest BIOS. The latest BIOS fixes a remote exploit, so I don’t want to revert it.

So there’s a chance this is a problem with my motherboard, and I need to rule that out. I’m going to set up another machine soonish and test on it. I’ll update this thread with whatever I discover.

I forgot to reply to your last paragraph. The cards I have were previously used for mining. They don’t have any fans of their own, so the moving part that typically breaks isn’t there; I haven’t had any video cards die for other reasons in the past (knocks on wood). We attached a simple blower to the back to push air through. The ‘low’ speed is enough to keep the card cooler than the RTX cards while remaining reasonably quiet.

I need to find the link, but I read that FMA-heavy code runs very slowly on these cards. I guess diagonalization doesn’t use it much, because these cards handle it well. But YMMV depending on the workload.

You definitely want to keep the BIOS upgrade to close the exploit. I seem to have read that the vBIOS for the CMP 170 HX was perhaps card-specific and required running a script to register the card; I’m not sure whether that applies on Linux. Check which vBIOS revision is on the CMPs.

I came upon this while researching whether the 170 HX makes any sense for anything but crypto mining in multi-170 HX configurations, and offer it in case you have not seen it:

Link: All GB/s without FLOPS - Nvidia CMP 170HX Review, Performance Lockdown Workaround, Teardown, Watercooling, and Repair

These comments at the link I provided seem to suggest limits on the full CUDA implementation for the 170 HX. Have you found a workaround?

…" CUDA

To disable FP contraction, pass the option -fmad=false to nvcc .

Again, like OpenCL, it’s also possible to use FMA and MAD via multiple built-in functions. Since nvcc is proprietary, again, there’s no transparent way to disable FMA and MAD globally at the compiler level - apart from reverse engineering and modifying the Nvidia binaries, or implementing a preprocessor or debugger-like interceptor to modify the source on-the-fly.

On the other hand, just like OpenCL, it’s possible to compile CUDA via LLVM/clang and target Nvidia PTX. Thus, in principle, it should be possible to create a patched LLVM/clang for this purpose in analogous to PoCL. "…

Seems like a lot of workarounds, and potentially unstable.


To use CUDA, were a lot of workarounds required, or was it fairly straightforward on the 170 HX boards compared with other GPUs?

I didn’t make any workarounds or changes to use CUDA on the 170 HX. It’s worked as-is for me so far. But the only thing I’ve used it for so far is diagonalization.

I figured out the problem: I needed to update the driver. The driver I was using was released before the 170 HX even existed. In hindsight, it’s surprising that a single card got the correct answer at all.
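For anyone who hits similar symptoms, a cheap sanity check is to compare the CUDA version the installed driver supports against the runtime version the binary was built with; a driver that predates the GPU or the toolkit is a red flag. A minimal sketch using the runtime API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the CUDA version supported by the installed driver and the
// runtime (toolkit) version the program was compiled against.
// Both are encoded as 1000*major + 10*minor, e.g. 12040 -> 12.4.
int main() {
    int driverVer = 0, runtimeVer = 0;
    cudaDriverGetVersion(&driverVer);
    cudaRuntimeGetVersion(&runtimeVer);
    printf("driver CUDA version:  %d.%d\n",
           driverVer / 1000, (driverVer % 100) / 10);
    printf("runtime CUDA version: %d.%d\n",
           runtimeVer / 1000, (runtimeVer % 100) / 10);
    return 0;
}
```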

Thanks everyone for your help ^_^.

Was the update on NVIDIA’s website, or somewhere else? What was the revision date?

It was the latest driver on NVIDIA’s website. Version 550.100.