A TOP monitor program specific for the DGX SPARK

Hey all, I have put together a TOP-like program to monitor your disk and network IO performance, as well as your CPU and GPU load and temperature, in real time, in a nice TUI that works over SSH.

Check it out at GitHub - GigCoder-ai/dgxtop: An Enhanced TOP program to monitor your Nvidia DGX SPARK's Hardware

This was written in Python, with GLM-4.5 Air 235b running in a Ray Cluster across 2 sparks.

Please provide feedback, and let me know what you think!

Max

Wow! This is super nice!

Can it display the status of two Sparks in one top instance? That’s probably a hard edge case, since it has to communicate via Ethernet…

Nice work @maxvamp !

May I suggest getting the information from procfs and sysfs instead of relying on other apps, or using the nvidia-nvml-dev Python wheel? See Package Index

Take the GPU stats, for example: with the default update interval, nvidia-smi is run every second:

    cmd = [
        "nvidia-smi",
        f"--query-gpu={self.QUERY_FIELDS}",
        "--format=csv,noheader,nounits",
    ]

Details at dgxtop/dgxtop/gpu_monitor.py at b443dc63d2beeda075ef8b47325673665d06958c · GigCoder-ai/dgxtop · GitHub
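
To illustrate the procfs/sysfs suggestion, here is a minimal sketch of reading CPU load from /proc/stat and a temperature from sysfs without spawning any external process. It is not taken from dgxtop; the function names are hypothetical, and the thermal zone path is an assumption that may not match the sensor layout on a Spark.

```python
import time
from pathlib import Path


def parse_cpu_times(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest and guest_nice are already counted in user and nice, so skip them
            total = sum(values[:8])
            return idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percent(prev, cur):
    """Busy percentage between two (idle, total) samples."""
    idle_delta = cur[0] - prev[0]
    total_delta = cur[1] - prev[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def read_temp_celsius(zone="/sys/class/thermal/thermal_zone0/temp"):
    """Read a thermal zone in degrees Celsius; the default path is an assumption."""
    try:
        # sysfs reports millidegrees Celsius
        return int(Path(zone).read_text().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def sample(interval=1.0):
    """Take two /proc/stat samples `interval` seconds apart (Linux only)."""
    prev = parse_cpu_times(Path("/proc/stat").read_text())
    time.sleep(interval)
    cur = parse_cpu_times(Path("/proc/stat").read_text())
    return cpu_percent(prev, cur), read_temp_celsius()
```

Calling sample() once per refresh returns a (cpu_percent, temperature) pair. Each refresh costs a couple of small file reads rather than a fork and exec, which is the main point of the suggestion.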

It would be cool if it showed processes and their GPU usage, like nvidia-smi does.

(edit: I did a fork and quick implementation of this idea GitHub - sonusflow/dgxtop: An Enhanced TOP program to monitor your Nvidia DGX SPARK's Hardware )
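
For readers wondering how a per-process view might be fed, here is a rough sketch that parses the output of nvidia-smi's --query-compute-apps option. It is not the fork's implementation, which may work differently; the function names are hypothetical. The query reports per-process GPU memory, not per-process utilization, and the memory field may come back as a non-numeric value such as [N/A] on some systems, so the parser tolerates that.

```python
import csv
import subprocess

# nvidia-smi query for running compute processes (CSV, no header, no units)
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn 'pid, name, used_memory' CSV lines into a list of dicts."""
    rows = []
    for rec in csv.reader(csv_text.splitlines(), skipinitialspace=True):
        if len(rec) != 3:
            continue  # skip blank or malformed lines
        pid, name, mem = (field.strip() for field in rec)
        try:
            pid = int(pid)
        except ValueError:
            continue
        try:
            mem_mib = int(mem)
        except ValueError:
            mem_mib = None  # e.g. "[N/A]" when the driver does not report it
        rows.append({"pid": pid, "name": name, "used_memory_mib": mem_mib})
    return rows


def query_compute_apps():
    """Run nvidia-smi once and return the parsed process list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Keeping the parser separate from the subprocess call makes it easy to test against captured output, and to swap the data source for NVML later without touching the display code.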