CMP products are also lacking mem_temp display. NVIDIA-Microsoft relations are a more realistic reason IMHO…
Interesting. Anyway for the benefit of @nadeemm since there seems to be some confusion about what is being requested:
This is from Windows. We’d like to be able to read the Memory Junction temperature of GDDR6X in linux as we can in Windows. Not memory case temp.
We are all waiting for it, but they don’t provide the firmware update tool for Linux either.
It seems they just don’t care about Linux users.
Well, actually they do.
nvflash. For some reason it’s not distributed by Nvidia themselves (neither is the Windows version), but it is an Nvidia product. It’s hosted by TechPowerUp here. It’s in the AUR even.
NVIDIA Firmware Update Utility (Version 5.660.0)
Copyright (C) 1993-2020, NVIDIA Corporation. All rights reserved.

-- Primary Commands --
Update VBIOS firmware:             nvflash [options] <filename>
Save VBIOS firmware to file:       nvflash [options] --save <filename>
Display firmware bytes:            nvflash [options] --display [bytes]
Change the start address:          nvflash [options] --offset [start]
Display firmware bytes in ASCII:   nvflash [options] --string
Check for supported EEPROM:        nvflash [options] --check
Display VBIOS version:             nvflash [options] --version [<filename>]
List adapters:                     nvflash [options] --list
Compare adapter firmware:          nvflash [options] --compare <filename>
Verify adapter firmware:           nvflash [options] --verify <filename>
Verify adapter IFR firmware:       nvflash [options] --verify --ifronly <filename>
Display GPU ECID/PDI:              nvflash [options] --ecid
Display License information:       nvflash [options] --licinfo <filename>
Generate a License Request File:   nvflash [options] --licreq <filename>,<reqType>
Provide a HULK license file:       nvflash [options] --license <filename>
List out all the PCI devices:      nvflash [options] --lspci
Access PCI Configure register:     nvflash [options] --setpci
Display tool building information: nvflash [options] --buildinfo
Display GMAC MCU version:          nvflash [options] --querygmac
Update GMAC MCU firmware:          nvflash [options] --proggmac <filename>.rom
Save GMAC MCU firmware to file:    nvflash [options] --savegmac <filename>.rom
List GMAC MCUs:                    nvflash [options] --listgmac
Write protect EEPROM:              nvflash [options] --protecton
Remove write protect:              nvflash [options] --protectoff

Press 'Enter' to continue, or 'Q' to quit.
Now, I came to this thread because I’m also furious about the lack of memory temperature readings, especially since I have a 3090 and that’s a serious issue on those cards. But they do actually have a native Linux firmware flashing utility, and it’s exactly what I used to flash the Resizable BAR VBIOS update from EVGA onto my 3090. The link you posted was for a ReBAR-specific firmware updating tool (no idea why they needed that), but the regular firmware flashing tool (which works for updating to Resizable BAR support) has a native Linux version.
I got my 3090 @ Micro Center on launch day so obviously it didn’t have ReBAR support, and so this tool was how I got it:
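For anyone wanting to do the same on Linux, the basic workflow with the TechPowerUp build looks roughly like this. The commands are taken from the help text quoted above; the filenames are placeholders, and everything needs root and a real card (typically with the nvidia kernel driver unloaded), so treat this as a sketch, not a recipe:

```shell
# Back up the current VBIOS before touching anything.
sudo ./nvflash --save backup_vbios.rom

# Confirm which adapter is present and what VBIOS it is running.
sudo ./nvflash --list
sudo ./nvflash --version

# Check the version of the new image file before flashing it.
sudo ./nvflash --version rebar_update.rom

# Flash the new image (per the help text, updating is just
# "nvflash [options] <filename>").
sudo ./nvflash rebar_update.rom

# Verify the flash afterwards.
sudo ./nvflash --verify rebar_update.rom
```

Keep the backup somewhere safe; it’s the only way back if the new image misbehaves.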
As far as the original topic goes, I know there are sensors for the memory thermals, so there’s literally no way it’s not possible to expose that through the driver to the user.
Honestly, NVAPI not existing for Linux is already ridiculously stupid (especially since NVAPI stuff like DLSS actually works in Wine/Proton). But what’s more stupid is that (almost) all of Nvidia’s hardware monitoring and control is tied to libxnvctrl and the NV-CONTROL X extension, which means nvidia-settings and anything else that uses NV-CONTROL (like GWE and any community fan control or overclocking utility) doesn’t work on Wayland. This is outrageous, and they need to move to a sysfs-based approach like AMD’s, which has nothing to do with which display server you’re running (or whether you’re running one at all).
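For contrast, here is a minimal sketch of what the sysfs/hwmon model already gives you on drivers that use it (amdgpu exposes edge/junction/mem temperature channels this way): any process can read temperatures with no X server or Wayland compositor involved. This assumes only the standard Linux hwmon layout (tempN_input in millidegrees Celsius, optional tempN_label).

```shell
#!/bin/sh
# Print every temperature channel under a hwmon root, one per line:
# "<driver name> <channel label> <temp> C".
hwmon_temps() {
    root="${1:-/sys/class/hwmon}"
    for dev in "$root"/hwmon*; do
        [ -d "$dev" ] || continue
        name=$(cat "$dev/name" 2>/dev/null)
        for t in "$dev"/temp*_input; do
            [ -f "$t" ] || continue
            label_file="${t%_input}_label"
            # Fall back to the file name when the driver provides no label.
            if [ -f "$label_file" ]; then
                label=$(cat "$label_file")
            else
                label=$(basename "$t")
            fi
            milli=$(cat "$t")   # value is in millidegrees Celsius
            printf '%s %s %d.%03d C\n' "$name" "$label" \
                $((milli / 1000)) $((milli % 1000))
        done
    done
}

hwmon_temps   # e.g. on an amdgpu system one line might read "amdgpu junction 74.000 C"
```

Nothing in that script cares whether you run Xorg, Wayland, or a headless box; that’s the point.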
Although I think most of the main points have been covered, I would like to add in my voice to request this to become a reality. I would really like to be able to have memory temperature information on Linux. It’s been almost a year since this (relatively) simple feature has been requested, and in my opinion this would create a huge QoL improvement for many users. Please add it in as soon as possible.
This thread is now officially comical. First, @wpierce, you say a developer is on it and it’s a priority; then 4 or 5 months later nadeemm comes in and says it won’t be provided, either mistakenly or deliberately conflating Tcase with Tjunction, and then spamming the thread with “please open another thread” and “what’s your use case” requests. To what end??? Disperse the concentration of ire, maybe?? omg so much laughing-too-hard my kidney hurts.
I’ve found my alternative so I shouldn’t really care much, but I will give you some benefit of the doubt and expend the effort here to let you know this bullshit costs you DOLLARS.
We run a decent (midsized) mining operation and have a fair number of 10xx and 20xx cards. We snapped up some 3060 and 3070 cards shortly after launch (march/april of '20?) and they ran fine. We waited on add’l power to the facility so we did not aggressively pursue new cards however we did clear out some old 10xx cards and I did gain a few 3080’s in the meantime (mix of FE, Strix, Aorus and FTW3 cards - about 15 in total). I noticed quickly the variability in clocking mems and the Internet was kind enough to provide answers - mem temp throttles. We op almost exclusively on HiveOS but ofc keep some Windows around, including a test bench or two that I also use for baselining and determining OCs on new model cards.
CONCLUSIVE - the laggards were hitting mem temps in the 110 to 121 °C range. I also registered an increase in amp draw with my clamp meter on the wall cord (not sure how/if that relates, but it definitely means I need more headroom in my power supplies for these guzzlers).
Bottom line was 6 of 15 3080s couldn’t achieve expected hashrates. Repadded myself. 2 of 6 improved but still got uncomfortably close to 110 °C mem temps in Windows on all but unacceptably conservative clocks (two others had to be pulled lower than stock clocks and power limited hard, incl one 3080 FE, just to stop them from crashing). Repadded AGAIN using some unobtanium padding, then some ultra-fine-future-nanoparticle-promise-you-the-world stuff with an equally ultra price tag. Finally got some acceptable (sort of) results, but not impressive. Not impressive because why do I gotta be my own home-grown mechanic on like $15k of new gear?? And who tf is gonna do this x however many we buy when we get our elec service upgraded?? Yeah, I’m looking forward to individually benching 100% of new GPUs and then cracking open 30-40% of them before they even start working towards ROI. Because I can’t even reliably field-monitor them. Drink that green Kool-Aid, Oh Yeah!!
So, to make a longer story somewhat shorter: when our add’l 600A service came online, we stocked up on some Team Red. Nicer distribution (our usual connect only did Green, so we made a new connect. I guess because they are the “underdog” here they try harder. Like Avis in the 90’s). And ofc a LOT fewer headaches with the hardware.
40x 6800 XT, 18x 6900 XT and a smattering of 6600s. This is the first round. Around $68k iirc. That’s $68k of NOT nVidia cards, and $68k closer AMD is to eating your lunch.
I’m now looking forward to getting my hands on some of the new Intel Arc cards. I hear they should be very power conscious and very competitive on alt algos, and we won’t have to deal with any of the LHR bullshit on ETHASH either.
I’m hearing from my client that the next round (Feb-ish) should be about another $60k spend and they expect to have about $100k to spend by June/July if The Winter arrives.
Was a learning curve getting OCs right on the Red 6xxx series, but it’s done. I’ll happily buy truckloads of more Red if you don’t:
A) get your QC in line and fix the g-damned thermal padding issues, and
B) AT LEAST get some Tj reporting into the Linux kernel/driver/API/whatever the problem is.
Yes, your cards do better on the alt coins than the Reds, which is WHY we buy green in the first place.
But if your QC is asleep at the wheel and your driver development is lying drunk on the floor, then fk it. I’d rather have stable, earning-24x7x365 Red cards than twitchy, finicky Green momma’s boys that need to be coddled all the time or just plain underperform. I’m not allowing my clients to pay a premium for crap that performs only as well as the Red equivalent because it has to be bottle fed AND can’t be monitored to know when the next tantrum is coming. I can pay less and be treated better for the same output.
And I’m not buying 3070’s either, because how tf do I know if/when THEY start misbehaving? Oopsie. Guess you should have thought about how this level of laziness (dgaf??) actually costs real sales.
So? How’s that, @nadeemm? Is close to $200k in sales enough “ammo”, or should I open another thread? 🙄
Honestly, you all should be ashamed to share this drivel in public and even more ashamed to attach your real names to it. But it IS some of the funniest shit I’ve read on the internet in a while… 🤣🤣🤣🤣🤣😂😂🙃🙃🙃🙃🤣🤣🤣🤣🤣🤣🤣🤣
And if you DO get your collective heads out of your asses and fix it, I’ll consider recommending Green again. I consult for and manage multiple farms. I am not tied to one client and I am not married to any “team”. I go where the hearth is warm, the wine flows and the steak is cooked pink.
Although maybe Intel will come eat everybody’s lunch. Now THAT would be fun to watch, lolol.
Good luck with this.
You’re not the target audience for these cards, and you’re making the planet a worse place for billions of other people.
What you are doing is immoral from every standpoint.
If nVidia or their distributors were being honest with you, they would tell you a very different story.
At the scale of this farm (mid-smallish), CMP cards are more difficult to attain than 3090’s to the average gamer or DL/ML researcher. You think we’re running around emptying Micro Center shelves or flogging Best Buy on launch day? Who do you think is providing these cards? Do you think nVidia has no power to stop it if they wanted to?
You’ve been played my friend but not by me or the mining industry.
Consider: if you are the squirrel in this world, then we are just about the average Koala, or maybe a Dingo.
Now ask who are the real 800lb gorillas fisting the money here?
Quite judgmental for someone so massively ill informed.
Both about myself (whom you know nothing about) and the quite nascent mining industry (about which you seem to know just as little).
It’s an overplayed narrative so you could be forgiven but for your lack of an open mind and rush to judge instead of dialogue.
Within the space there is tremendous reliance and also investment in renewables.
China “chasing away” its mining farms did the planet a relatively huge favor, as they were most of the dirtiest parts of the mining industry (very reliant on coal-fired power; substantially less so in the US, where a large section of those ops ended up, most of them seeking cheap renewable energy, for example in TX). Miners tend to invest in cheap energy (either by building, or by investing in local utility/state solar farms and similar).
When we build, we design towards largely overprovisioning solar capacity (panels are cheap relative to battery capacity) and ensure we run purely on solar during the day, overprovisioned so that we can also feed the grid (it helps them with daytime surge usage, when utils need help the most). Good designs imho generally provide more to the grid during the day than will be drawn from the grid overnight. Essentially carbon neutral at worst, and carbon negative most of the time.
So how’s YOUR PERSONAL carbon footprint compare Mr. Concerned? Is YOUR life carbon negative?
Something about stones and glass houses…
Again, quite judgmental. Also improperly placed anger mostly from being ill informed.
I hope that self-righteous arrogance (and ignorance) of yours doesn’t work its way into any of the models you are training (though how could it not), and hopefully those models won’t affect anyone’s personal well-being. Certainly enough has been said about the prejudices present in ML and AI algorithms negatively affecting some people’s lives.
Perhaps I should judge YOU immoral for your involvement in the industry without knowing you any more than you know me. Turnabout after all IS fair play…
And I’ll also ignore the foolishness of you not realizing we are seeking the same outcome here, so technically I’m your ally in this. I don’t spend enough to have nVidia deliver containers to me off the ship, but I do spend more than you. Which might be enough to get the nvid devs in this thread taken seriously by their higher-ups and get us both the feature we want.
Or you could just maintain your holier-than-thou attitude and keep pounding sand till you turn blue in the face. Honey or vinegar. Choose wisely.
Whataboutism at its finest, and a false equivalency.
Oh - I must have missed that the long-game goal of cryptocurrency mining was sustainable investment in renewable energy sources and a reduction of global industrial environmental impact, rather than benefiting early adopters by legitimising a novel investment vehicle. You learn something new every day.
I came here, to a SOFTWARE DEVELOPERS FORUM, to add a voice for a feature I require. You decided to try to make some political statement out of it by attacking me directly. Good luck with your life. Not that you deserve any.
“Never wrestle with pigs. You both get dirty and the pig likes it.”
I agree, there would be fewer issues if “technical” people considered the ethical and societal impacts of their actions and work rather than maintaining false or fleeting comfort in their “ignorant bliss”.
Just popping in to say: as somebody who managed to purchase a single 2070S card a couple of years ago when GPU prices were “normal” (at least almost) for a while, mostly to run games and other 3D graphics apps, but also to mine and sometimes run other CUDA stuff on the side, I’d appreciate it a lot if the Linux driver package some day achieved complete feature parity with the Windows equivalent (sans Direct3D obviously, and GFEx, which I never installed even on Windows).
It’s seriously annoying that it’s not even possible to properly undervolt the GPU on Linux; it sucks tens of watts more power while mining Ethereum than on Windows for no good reason. I’ve actually resorted to just running Windows (despite hating what it has become) on that machine for now, ONLY because of this. It’s a real shame: games and the apps I use generally work great on Linux these days (and the CUDA stuff often targets Linux/Unix ONLY), well enough for me anyway.
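Until real voltage control shows up on Linux, the closest workaround I know of is capping power and locking clocks through nvidia-smi. These flags exist in current drivers; the numbers below are made-up examples you’d tune for your own card, so this is a sketch rather than a tuned profile:

```shell
# Enable persistence mode so the settings survive between CUDA jobs.
sudo nvidia-smi -pm 1

# Cap the board power limit in watts (example value).
sudo nvidia-smi -pl 180

# Lock the graphics clock to a fixed range in MHz (example values).
# Pinning the clock low forces a lower point on the voltage/frequency
# curve, which recovers some of the efficiency undervolting would give.
sudo nvidia-smi -lgc 1200,1200

# To undo the clock lock later:
sudo nvidia-smi -rgc
```

It’s not true undervolting (you never get to pick the voltage directly), but in my experience it narrows the Windows-vs-Linux power gap noticeably.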
Personally I couldn’t care less that the driver and libraries are closed source as long as generally Everything Just Works (in many cases better than on AMD, too ;)), at least as long as they don’t become actively user-hostile software like Windows is these days. Even on my non-Nvidia, iGPU-only laptop I’m running Xorg instead of Wayland ATM as I’m typing this (xfce/xfwm4 doesn’t support Wayland anyway). But seriously, please don’t skimp on features like these, even though I realize they may not get used on professional CAD workstations or in datacenters with lots of cards running GPU compute stuff, which are probably the scenarios where most of the Unix driver use happens, rather than hobby desktop computing/gaming and very small scale mining.
Would hate to be forced to go look at AMD or Intel for my next card in a few years.
I have an ASUS OEM 3090. When the GPU temperature goes beyond 60 °C, the hash rate drops from 117 down to as low as 100 MH/s. Is this due to the memory junction temperature increasing? Is thermal throttling happening at that point? Or is the hash rate dropping as the temperature rises a sign that the GPU is keeping itself on the safe side? I’m new to mining, please advise.
On Linux, I’m not able to see the memory junction temp. Nvidia, save my GPU in the Linux environment. Provide a fix so I can see my memory junction temp.
There are a lot of Linux users and we need this tool; we do not know why NVIDIA doesn’t have these tools for us…
This pisses me off so much. Nvidia’s lack of support, when it 100% seems possible too.
Apparently nvtool is already capable of reading mem temps for certain professional-level cards. HiveOS added support a few months back:
- Added display of memory temperature for Nvidia GPUs equipped with HBM/HBM2 memory, e.g. A100, CMP 170HX, etc.
- nvtool to v1.57 (added memory temperature reporting using the option --memtemp for GPUs with HBM/HBM2 memory; added the option --throttle to show the throttle reason, which is also reported by the nvidia-info tool, so you can see all the info using it)
Please Nvidia, add display temperature of memory for RTX 3000 series in linux.
We don’t want them burnt.
If it wasn’t for deep learning I wouldn’t have bought nVidia. nVidia are evil, as I’ve started to realise within just 3 days of buying a 3090. The RAM is melting.
So it’s already 3+ months we are waiting for NVidia reply!
Do they even care?
Really need to know the mem temps while running linux. Doesn’t seem like that big of a request… Might have to think about switching to AMD
Almost a full year since the issue was raised, and so far no visible action taken to address this fault… kind of a shame