Contribution: sprofile tool for CPU, RAM and GPU reporting of Slurm jobs

I’d like to share a tool that reports the resources a Slurm job actually consumed.

For CPU and RAM it reads cgroup accounting data.
For GPU utilization, it collects accounting data from the driver via the NVML library.
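
As a rough illustration of what reading cgroup accounting data can look like, here is a minimal Python sketch. It assumes cgroup v2, where `cpu.stat` contains a `usage_usec` line and `memory.peak` holds the peak memory in bytes (available on reasonably recent kernels). The function names and the idea of passing in the job's cgroup directory are illustrative assumptions, not sprofile's actual code.

```python
from pathlib import Path


def parse_cpu_usage_seconds(cpu_stat_text: str) -> float:
    """Return total CPU time in seconds from the text of a cgroup v2 cpu.stat file."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value) / 1_000_000
    raise ValueError("usage_usec not found in cpu.stat")


def average_cpu_load(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Average number of busy cores: CPU time divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def read_job_cgroup(cgroup_dir: str, elapsed_seconds: float) -> dict:
    """Read CPU load and peak RAM for one cgroup directory.

    cgroup_dir is a hypothetical parameter: where Slurm places a job's
    cgroup depends on the cluster configuration.
    """
    base = Path(cgroup_dir)
    cpu_seconds = parse_cpu_usage_seconds((base / "cpu.stat").read_text())
    peak_bytes = int((base / "memory.peak").read_text())
    return {
        "cpu_load": average_cpu_load(cpu_seconds, elapsed_seconds),
        "ram_peak_bytes": peak_bytes,
    }
```

For example, 22.5 s of CPU time over a 25 s run corresponds to an average load of 0.9 cores.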

The output looks like this:

-- sprofile report (node27) --
  Time:       0:00:25  /  1:00:00
  CPU load:       0.9  /   2.0
  RAM peak:        3G  /    8G
  GPU load:       0.9  /   1.0
  GPU peak mem:    3G  /   32G
  GPU energy:     0.0kWh
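
Assuming each line of the report reads "used / reserved", the fraction of each reservation that was used can be worked out as in the short Python sketch below. The `utilization` helper and the dictionary keys are hypothetical and not part of sprofile; the numbers are copied from the sample report.

```python
def utilization(used: float, reserved: float) -> float:
    """Fraction of a reservation that was actually used."""
    if reserved <= 0:
        raise ValueError("reserved must be positive")
    return used / reserved


# (used, reserved) pairs taken from the sample report above.
sample = {
    "time_s": (25, 3600),
    "cpu_cores": (0.9, 2.0),
    "ram_gb": (3, 8),
    "gpu": (0.9, 1.0),
    "gpu_mem_gb": (3, 32),
}

for name, (used, reserved) in sample.items():
    print(f"{name:12s} {utilization(used, reserved):6.1%}")
```

In this sample the job used under half of its two reserved cores and 3G of its 8G of RAM, so a smaller CPU and RAM reservation would likely have been enough.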

It is akin to DCGM, but better suited to collecting data at job-level granularity. Furthermore, a user can install it without administrator rights, which is more convenient.
I hope it can help cluster users adjust their resource reservations better.