Small monitor program for DGX Spark

wentbackward · March 19, 2026, 9:58pm

This is turning out to be very useful to me … In case it’s useful to you:

GitHub - wentbackward/nv-monitor: A lightweight terminal system monitor built for the NVIDIA DGX Spark (Grace CPU + GB10 GPU).

It’s a 73 KB binary that reads /proc etc. for lightweight integration.
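For anyone curious how a /proc-based monitor can get per-core load with no extra libraries, here’s a minimal Python sketch. It is not nv-monitor’s code (that’s a compiled binary); it just shows the usual technique of sampling /proc/stat twice and comparing busy vs. total jiffies. Counting iowait as idle is an assumption, and the function names are hypothetical.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy, total)} jiffy counters from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # CPU lines look like "cpu0 user nice system idle iowait irq softirq steal guest guest_nice"
        if not fields or not fields[0].startswith("cpu"):
            continue
        # user..steal; guest time is already included in user/nice, so it is skipped
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait (assumption: iowait counts as idle)
        total = sum(values)
        counters[fields[0]] = (total - idle, total)
    return counters


def utilization(before, after):
    """Percent busy per CPU between two parse_proc_stat() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        delta_total = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / delta_total if delta_total else 0.0
    return result


def sample(interval=1.0):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    return utilization(first, second)
```

On a Spark, sample() should return one entry for the aggregate "cpu" line plus cpu0 through cpu19.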

Paul

aniculescu · March 23, 2026, 1:36pm

Thanks for the link, I will move this to GB10 projects

wentbackward · March 28, 2026, 7:53pm

This little utility is turning out to be gold dust for me. I’ve made a few tweaks: it can log to CSV, and it follows NVIDIA’s calculations so the memory figures are accurate. It shows cached vs. actual memory usage, and it shows a swap line if swap is enabled. Still around 74k-ish … oh, and it’s text, so you can paste it to your AI assistant!
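A minimal sketch of the cached-vs-actual split and the conditional swap line, reading /proc/meminfo. This is not NVIDIA’s or nv-monitor’s formula (neither is given in this thread); it assumes the common convention of used = MemTotal - MemAvailable and cache = Buffers + Cached + SReclaimable. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        value = rest.split()
        if value:
            info[key.strip()] = int(value[0])
    return info


def memory_summary(info):
    """Split memory into 'actual' use, cache, and (if enabled) swap, all in kB.

    Assumed convention: used = MemTotal - MemAvailable,
    cache = Buffers + Cached + SReclaimable. Not necessarily nv-monitor's formula.
    """
    total = info["MemTotal"]
    used = total - info["MemAvailable"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0) + info.get("SReclaimable", 0)
    summary = {"total": total, "used": used, "cache": cache}
    swap_total = info.get("SwapTotal", 0)
    if swap_total > 0:  # only report a swap line when swap is enabled
        summary["swap_used"] = swap_total - info.get("SwapFree", 0)
    return summary
```

Feeding it open("/proc/meminfo").read() on a Spark gives a dict you could print or append to a CSV row.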

trystan1 · March 30, 2026, 2:30am

This is a very cool little program, considering the packaged top in Ubuntu 24.04 gets the memory stats wrong.

Something that I think could improve this would be identifying the cores, to indicate which are Cortex-X925 and which are Cortex-A725.

CPU IDs    Frequency (kHz)
0-4        2808000
5-9        3900000
10-14      2808000
15-19      3900000

It looks like cores 5-9 and 15-19 are the larger Cortex-X925 cores at 3.9 GHz, and cores 0-4 and 10-14 are the smaller Cortex-A725 cores at 2.8 GHz.

Edit: additionally, cores 0-9 share caches (cluster 1) and cores 10-19 share caches (cluster 2), which is nice to know if you are pinning processes.
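To act on that cluster layout, here’s a minimal Python sketch that expands a Linux CPU-list string such as "0-4,10-14" and pins the current process with os.sched_setaffinity. The sysfs path for shared-cache CPUs (cache/index3/shared_cpu_list) is an assumption: which index is the shared cache, and whether it is exposed at all, depends on the kernel and firmware. The helper names are hypothetical.

```python
import os


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-4,10-14' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def shared_cache_cpus(cpu=0, index=3):
    """CPUs sharing cache `index` with `cpu`, read from sysfs (path/index are assumptions)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cache/index{index}/shared_cpu_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def pin_current_process(cpus):
    """Restrict the calling process (pid 0 = self) to the given CPUs. Linux only."""
    os.sched_setaffinity(0, cpus)
```

For example, pin_current_process(parse_cpu_list("5-9")) keeps the process on the X925 cores of cluster 1. From the shell, taskset -c 5-9 <command> does the same kind of pinning.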

wentbackward · March 30, 2026, 5:13am

Nice idea. /proc/cpuinfo has it (the CPU part field), along with a ton of other details.

0xd85 = Cortex-X925 (cores 5-9, 15-19): 3900 MHz max, the performance cores
0xd87 = Cortex-A725 (cores 0-4, 10-14): 2808 MHz max, the efficiency cores
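The same lookup can be done from Python by reading the CPU part field in /proc/cpuinfo. A minimal sketch, assuming the arm64 cpuinfo layout (a "processor" line followed later by a "CPU part" line for each core). The table only knows the two GB10 part numbers from this post, labels 0xd87 with Arm’s name for that part (Cortex-A725), and lets anything else fall through as the raw hex value. Function names are hypothetical.

```python
# Part numbers taken from the post; unknown parts fall back to the raw hex string.
CORE_TYPES = {"0xd85": "Cortex-X925", "0xd87": "Cortex-A725"}


def core_types(cpuinfo_text):
    """Map processor number -> core name using the 'CPU part' field of arm64 /proc/cpuinfo."""
    cores = {}
    processor = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "processor":
            processor = int(value)
        elif key == "CPU part" and processor is not None:
            cores[processor] = CORE_TYPES.get(value.lower(), value)
    return cores


def group_by_type(cores):
    """Invert core_types() output into {core name: [processor numbers]}."""
    groups = {}
    for cpu, name in sorted(cores.items()):
        groups.setdefault(name, []).append(cpu)
    return groups
```

On x86 the cpuinfo format has no CPU part line, so core_types() simply returns an empty dict there.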

v1.2.1 is on GitHub.

Note: the governor on this Spark-specific kernel permanently locks all cores at max frequency, so I didn’t bother putting it on the screen (maybe when we get the workstation).

wentbackward · March 30, 2026, 6:14am

File size is now 74 KB. Lots of new features; full docs are on GitHub.

serapis · March 30, 2026, 8:07am

It’s a pretty cool project and something I was looking for. I took the liberty of submitting a PR that adds a /metrics endpoint for Prometheus, so you can monitor your systems remotely. The system overhead is minuscule: “Add optional Prometheus metrics exporter” by SeraphimSerapis, Pull Request #1 on wentbackward/nv-monitor (GitHub).
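For readers wondering what a /metrics endpoint involves, here’s a minimal Python sketch of the Prometheus text exposition format served by the standard-library HTTP server. It is not the code from the PR; the metric name, the port, and the collect() placeholder are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def _escape(value):
    """Escape a label value as the Prometheus text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(metrics):
    """Render [(name, help_text, [(labels_dict, value), ...]), ...] as gauges."""
    lines = []
    for name, help_text, samples in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        for labels, value in samples:
            label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Hypothetical collector; a real exporter would read /proc, the GPU, etc."""
    return [("demo_cpu_utilization_percent", "Per-core CPU utilization.",
             [({"cpu": "0"}, 12.5), ({"cpu": "1"}, 3.0)])]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):  # arbitrary port for the sketch
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Prometheus then scrapes http://<node>:<port>/metrics on each Spark, and Grafana reads from Prometheus, which is what makes the cluster view cheap.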

wentbackward · March 30, 2026, 12:31pm

Hah! I need this too and didn’t even think about using nv-monitor to achieve it. We are better together 😊. Nice PR, merged! It nicely solves the problem of monitoring a Spark cluster too! I’m adding it to my Grafana dashboard right now! Thank you @serapis for this excellent addition!

wentbackward · March 30, 2026, 2:29pm

Demo Load

Something that was driving me crazy was having to install benchmarking tools and all their dependencies just to check the plumbing. demo-load runs sinusoidal loads across all cores and the GPU; again, it’s a small single binary with zero dependencies, so it’s quick and easy to see data from all nodes.
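A rough sketch of the CPU half of a sinusoidal load, for anyone who wants to see the idea: each worker busy-spins for a fraction of every short time slice, with the fraction following a sine wave. This is not demo-load’s implementation, it doesn’t touch the GPU, and the period, slice length, and per-core phase offset are arbitrary choices.

```python
import math
import multiprocessing as mp
import time


def duty_cycle(t, period=30.0, phase=0.0):
    """Target load in [0, 1] at time t: a sine wave with the given period (seconds)."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * t / period + phase))


def sine_worker(duration, period, phase, slice_s=0.1):
    """Busy-spin for duty * slice_s, then sleep for the rest of each slice."""
    start = time.monotonic()
    while (now := time.monotonic()) - start < duration:
        busy_until = now + duty_cycle(now - start, period, phase) * slice_s
        while time.monotonic() < busy_until:
            pass  # burn CPU
        remaining = slice_s - (time.monotonic() - now)
        if remaining > 0:
            time.sleep(remaining)


def run_cpu_load(duration=60.0, period=30.0):
    """Start one worker per core, each phase-shifted so the load ripples across cores."""
    n = mp.cpu_count()
    procs = [mp.Process(target=sine_worker, args=(duration, period, 2 * math.pi * i / n))
             for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Call run_cpu_load() from under an if __name__ == "__main__": guard, since multiprocessing needs that on platforms that spawn workers. Staggering the phase makes neighbouring cores peak at different times, which is easy to spot on a per-core display; setting every phase to 0 gives an all-cores-in-step wave instead.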

Some cosmetic updates for multiple architectures (Arm, x86-64, VM).

Screenshot: ARM / Spark

Screenshot: x86 (tested on my laptop with WSL2)

trystan1 · March 30, 2026, 2:44pm

Another small thing considering you brought up aesthetics.

The memory/buffer text matches the grid/bar, with green mapping to active memory and blue mapping to cache, but the CPU/GPU history colors don’t always match their text because they scale blue→yellow→red with load.

I think scaling colors make sense per core, since 20 cores can get visually confusing; there it works just fine.

For the single CPU/GPU history at the bottom, it would make more sense to me for the color to be static, matching the text, and for a time scale to be shown.
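A sketch of a blue→yellow→red load ramp like the one described for the history colors, done as linear interpolation between three anchor colors. The anchor RGB values and the 24-bit ANSI escape are assumptions (nv-monitor may well use terminal color pairs instead), and the function names are hypothetical. A static history color, as suggested here, would just skip load_color() and pass a fixed tuple.

```python
def load_color(load):
    """Map a load fraction in [0, 1] to (r, g, b) on a blue -> yellow -> red ramp."""
    load = min(max(load, 0.0), 1.0)  # clamp out-of-range readings
    blue, yellow, red = (0, 0, 255), (255, 255, 0), (255, 0, 0)
    if load <= 0.5:
        lo, hi, t = blue, yellow, load / 0.5
    else:
        lo, hi, t = yellow, red, (load - 0.5) / 0.5
    return tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))


def ansi_fg(rgb, text):
    """Wrap text in a 24-bit ANSI foreground color escape (truecolor terminals only)."""
    r, g, b = rgb
    return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"
```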

Just nitpicks.

wentbackward · March 30, 2026, 7:22pm

Yeah, the histogram was a bit of a quick hack. I’ll definitely take a look at that in the next day or two. One thing that annoys me is when all the columns blend together (when everything’s red, running at max). So yes, some kind of scale (probably t minus x, as it depends on the refresh speed).
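The "t minus x" idea can be sketched like this: with the newest sample in the rightmost column, column i is (columns - 1 - i) refresh intervals old, so the tick labels have to be recomputed whenever the refresh interval changes. A minimal Python sketch; the tick spacing, label style, and function names are hypothetical.

```python
def time_axis(columns, interval_s, every=10):
    """Return [(column_index, label), ...] ticks; the newest sample is the rightmost column."""
    ticks = []
    for i in range(columns - 1, -1, -every):
        age = (columns - 1 - i) * interval_s  # seconds before "now"
        ticks.append((i, f"-{age:g}s" if age else "0s"))
    return list(reversed(ticks))


def render_axis(columns, interval_s, every=10):
    """Render the ticks as one text line aligned under the histogram columns."""
    line = [" "] * columns
    for i, label in time_axis(columns, interval_s, every):
        start = max(0, min(i, columns - len(label)))  # keep labels inside the line
        line[start:start + len(label)] = label
    return "".join(line)
```

With a 1 s refresh, render_axis(31, 1.0) puts -30s under the oldest column and 0s under the newest; at a 0.5 s refresh the same columns read -15s to 0s.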

wentbackward · April 1, 2026, 1:34am

The new histogram is live. Thanks for the suggestion @trystan1 !!

I also added spacing between the bars and some numbers on the x-axis to anchor the visuals.

Paul

wentbackward · April 1, 2026, 1:36am

I’m a bit annoyed that GitHub doesn’t show you as a contributor @serapis - I’ll write something in the README as I want to thank you again for this addition.

serapis · April 1, 2026, 3:08am

Oh that’s awfully nice of you but please don’t worry. I’m just happy to see this tool out there.

dbsci · April 1, 2026, 5:35am

Yeah, it’s quite nice. Good work @wentbackward. I am considering either using it as a monitoring backend with sparkrun or perhaps packaging it as a service to run on Sparks as a lightweight daemon (it already does that with Prometheus metrics… thanks @serapis).

sparkrun has some rudimentary monitoring capabilities and I was considering making something light and small to gather metrics without wasting many resources… and lo and behold, someone did it!

Still sort of debating what makes sense, but I did start to play with it a bit. If it were to become a core component in sparkrun, then maybe the sparkrun setup wizard could (optionally) install a systemd service for nv-monitor?

I don’t know – basically thinking aloud, but I like the idea of a daemon that doesn’t use a lot of resources and actually gives us what we need!

wentbackward · April 1, 2026, 5:41am

I hadn’t seen Sparkrun before. I’ll have to have a play - definitely the same ethos, cut all the baggage and get the job done!!

serapis · April 1, 2026, 5:42am

I’d be happy to contribute to help improve the user experience for everyone. I love the idea of making this a well-rounded package.

dbsci · April 1, 2026, 5:44am

Like the one time I don’t include sparkrun as a link. I feel like I spam it too much and was trying to cut down!

vedcsolution · April 1, 2026, 6:14am

Very good and very useful tool, currently using it with Prometheus. Thanks to everyone.
