This is turning out to be very useful to me … In case it’s useful to you:
73k binary, uses /proc etc. for lightweight integration.
Paul
Thanks for the link, I will move this to GB10 projects
This little utility is turning out to be gold dust for me. I’ve made a few tweaks: it can log to CSV, and it follows NVIDIA’s calculations so the memory figures are accurate. It shows cached vs. actual memory usage, and it shows a swap line if swap is enabled. Still around 74k-ish … oh, and it’s text, so you can paste it to your AI assistant!
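For anyone curious how a cached vs. actual split can be derived from /proc, here is a minimal Python sketch. It uses the common Linux convention (used = MemTotal - MemAvailable, cache = Buffers + Cached + SReclaimable); that convention is an assumption on my part and is not necessarily the exact calculation nv-monitor or NVIDIA uses. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kiB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_breakdown(info):
    """Split memory into used and cache, plus swap if swap is enabled.

    Assumed convention (not confirmed to be nv-monitor's formula):
      used  = MemTotal - MemAvailable
      cache = Buffers + Cached + SReclaimable
    """
    total = info["MemTotal"]
    result = {
        "total_kib": total,
        "used_kib": total - info["MemAvailable"],
        "cache_kib": info.get("Buffers", 0)
        + info.get("Cached", 0)
        + info.get("SReclaimable", 0),
    }
    swap_total = info.get("SwapTotal", 0)
    if swap_total > 0:  # only report swap when it is enabled
        result["swap_used_kib"] = swap_total - info.get("SwapFree", 0)
    return result


def read_breakdown(path="/proc/meminfo"):
    """Read the live values on a Linux host."""
    with open(path) as f:
        return memory_breakdown(parse_meminfo(f.read()))
```

On a Linux box, `print(read_breakdown())` prints the three or four figures in kiB.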
This is a very cool little program, considering the packaged top in Ubuntu 24.04 gets the memory stats wrong.
Something that I think could improve this would be labeling the cores to indicate which are X925 and which are A725.
ID Frequency (kHz)
0 2808000
1 2808000
2 2808000
3 2808000
4 2808000
5 3900000
6 3900000
7 3900000
8 3900000
9 3900000
10 2808000
11 2808000
12 2808000
13 2808000
14 2808000
15 3900000
16 3900000
17 3900000
18 3900000
19 3900000
It looks like 5-9 and 15-19 are the larger X925 cores at 3.9 GHz, and 0-4 and 10-14 are the smaller A725 cores at 2.8 GHz.
Edit: additionally, cores 0-9 share caches (cluster 1) and cores 10-19 share caches (cluster 2), which is nice info to have if pinning processes.
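The grouping above can be reproduced with a short Python sketch that buckets cores by their maximum frequency. It assumes the kernel exposes `cpuinfo_max_freq` under `/sys/devices/system/cpu/cpuN/cpufreq/` (values in kHz); I have not verified that path on every Spark kernel, and the helper names are made up for illustration.

```python
import glob
import re
from collections import defaultdict


def read_max_freqs(pattern="/sys/devices/system/cpu/cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
    """Return {cpu_id: max_freq_khz}; the sysfs path is an assumption."""
    freqs = {}
    for path in glob.glob(pattern):
        match = re.search(r"/cpu(\d+)/", path)
        if match is None:
            continue
        with open(path) as f:
            freqs[int(match.group(1))] = int(f.read().strip())
    return freqs


def compress_ranges(ids):
    """Turn [0, 1, 2, 5, 6] into the string '0-2,5-6'."""
    ids = sorted(ids)
    parts = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # Extend the run while the next id is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def group_by_freq(freqs):
    """Map each distinct max frequency (highest first) to its core ranges."""
    groups = defaultdict(list)
    for cpu, khz in freqs.items():
        groups[khz].append(cpu)
    return {khz: compress_ranges(cpus) for khz, cpus in sorted(groups.items(), reverse=True)}
```

Calling `group_by_freq(read_max_freqs())` on the machine gives one entry per core type. For pinning, the ranges can then be fed to something like `os.sched_setaffinity(0, {5, 6, 7, 8, 9})` or `taskset`.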
Nice idea. /proc/cpuinfo has it, along with a ton of others.
v1.2.1 is on GitHub
Note: the governor on this Spark-specific kernel permanently locks all cores at max frequency, so I didn’t bother putting it on the screen (maybe when we get the workstation).
It’s a pretty cool project and something I was looking for. I allowed myself to submit a PR that adds a /metrics endpoint for Prometheus – that way you can monitor your systems remotely. The system overhead is minuscule: Add optional Prometheus metrics exporter by SeraphimSerapis · Pull Request #1 · wentbackward/nv-monitor · GitHub
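For remote checks without a full Prometheus setup, a /metrics endpoint can also be read directly. The Python sketch below fetches the Prometheus text exposition format and parses it into a dict. The URL you pass in (host and port) and the metric names in the sample are hypothetical; I am not quoting the names the exporter actually emits.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: float}.

    The series key is the metric name plus its label block, if any.
    HELP/TYPE comments and blank lines are skipped; timestamps are ignored.
    """
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            idx = line.rindex("}")
            key, rest = line[: idx + 1], line[idx + 1 :].split()
        else:
            fields = line.split()
            key, rest = fields[0], fields[1:]
        if rest:
            metrics[key] = float(rest[0])
    return metrics


def scrape(url, timeout=5):
    """Fetch and parse one scrape; the caller supplies the (hypothetical) URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Usage would look like `scrape("http://spark-node:PORT/metrics")`, with the host and port replaced by whatever the exporter is configured to listen on.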
Hah! I need this too and didn’t even think about using nv-monitor to achieve it - we are better together 😊. Nice PR, merged! It nicely solves the problem of monitoring a Spark cluster too! I’m adding it to my Grafana dashboard right now! Thank you @serapis for this excellent addition!
Something that was driving me crazy was having to install benchmarking tools and all their dependencies just to check the plumbing. demo-load will run sinusoidal loads across all cores and the GPU. Again, it’s a small single binary with zero dependencies, so it’s quick and easy to see data from all nodes.
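To illustrate what a sinusoidal load means in practice, here is a small Python sketch of the idea for a single CPU core: the target duty cycle follows a sine wave, and each short time slice is split into busy-spinning and sleeping accordingly. This is only my illustration of the technique, not how demo-load is implemented, and all names and defaults are made up.

```python
import math
import time


def target_duty(t, period_s, low=0.0, high=1.0):
    """Duty cycle at time t, swinging sinusoidally between low and high."""
    phase = 0.5 + 0.5 * math.sin(2.0 * math.pi * t / period_s)
    return low + (high - low) * phase


def run_slice(duty, slice_s):
    """Busy-spin for duty * slice_s seconds, then sleep for the remainder."""
    busy_until = time.perf_counter() + duty * slice_s
    while time.perf_counter() < busy_until:
        pass  # burn CPU
    time.sleep(max(0.0, (1.0 - duty) * slice_s))


def run_load(duration_s, period_s=60.0, slice_s=0.1):
    """Drive one core with a sinusoidal load for duration_s seconds."""
    start = time.perf_counter()
    while True:
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            break
        run_slice(target_duty(elapsed, period_s), slice_s)
```

Running `run_load(120)` in one process per core (for example with `multiprocessing`) should produce a wave that is easy to spot in the per-core bars.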
Some cosmetic updates for multiple architectures (Arm, x86/64, VM).
Another small thing considering you brought up aesthetics.
The memory/buffer text matches the grid/bar, with the green mapping to active memory and the blue mapping to cache, although the CPU/GPU history text doesn’t always match because it scales blue→yellow→red with load.
I think the scaling colors make sense per core, since it can get visually confusing with 20 cores; there it works just fine.
For the single cpu/gpu history at the bottom, to me it would make sense for the color to be static matching the text and for a time scale to be shown.
Just nitpicks.
Yeah, the histogram was a bit of a quick hack. I’ll definitely be taking a look at that in the next day or two - one thing that annoys me is when all the columns blend together (when everything’s red, running at max). So yes, some kind of scale (probably t minus x, as it depends on the refresh speed).
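The "t minus x" idea can be sketched in a few lines of Python: each history column is one refresh interval older than the one to its right, so the label for a column is just its distance from the newest column times the refresh interval. The function name and the `every` spacing parameter are made up for illustration.

```python
def time_axis_labels(n_columns, refresh_s, every=10):
    """Return {column_index: label} for a history chart.

    Columns run left to right, with the newest sample in the rightmost
    column. A label is placed every `every` columns, counted from the right.
    """
    labels = {}
    for col in range(n_columns):
        steps_back = n_columns - 1 - col
        if steps_back % every == 0:
            age_s = steps_back * refresh_s
            labels[col] = "now" if steps_back == 0 else f"t-{age_s:g}s"
    return labels
```

With 21 columns and a 2-second refresh, the labels land on columns 0, 10 and 20 as `t-40s`, `t-20s` and `now`; changing the refresh speed rescales the labels without touching the layout.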
The new histogram is live. Thanks for the suggestion @trystan1 !!
I also added spacing between the bars and some numbers on the x axis to anchor the visuals
Paul
I’m a bit annoyed that GitHub doesn’t show you as a contributor @serapis - I’ll write something in the README as I want to thank you again for this addition.
Oh that’s awfully nice of you but please don’t worry. I’m just happy to see this tool out there.
Yeah it’s quite nice. Good work @wentbackward. I am considering either using it as a monitoring backend with sparkrun or perhaps we package it as a service to run on sparks as a lightweight daemon (it already does that w/ prometheus metrics… thanks @serapis).
sparkrun has some rudimentary monitoring capabilities and I was considering making something light and small to gather metrics without wasting many resources… and lo and behold, someone did it!
Still sorta debating what makes sense, but I did start to play with it a bit. If it were to become a core component in sparkrun, then I guess maybe make it part of the setup wizard for sparkrun to (optionally) install a systemd service for nv-monitor?
I don’t know – basically thinking aloud, but I like the idea of a daemon that doesn’t use a lot of resources and actually gives us what we need!
I hadn’t seen Sparkrun before. I’ll have to have a play - definitely the same ethos, cut all the baggage and get the job done!!
I’d be happy to contribute to help improve the user experience for everyone. I love the idea of making this a well-rounded package.
Like the one time I don’t include sparkrun as a link. I feel like I spam it too much and was trying to cut down!
Very good and very useful tool, currently using it with Prometheus. Thanks to everyone.