Release Notes for Nvidia Bright Cluster Manager 9.2-8

== General ==

  • Integration with Run:ai

=Fixed Issues=

  • mlnx-ofed57: An issue where the mst service does not start when using mlnx-ofed57 due to the mst sysvinit service file being packaged as a systemd unit file

== CMDaemon ==
=New Features=

  • Add a cm-package-release-info tool which can determine the Bright 9.X-Y version of installed packages
  • Introduce a new CPUUsage metric for compute and head nodes that reports CPU usage as a percentage
  • Exclude the ram and loop devices under /sys/block/ from sampling by the SysBlockStat monitoring data producer
  • Allow a special OID for PDU load to be specified via the revision property
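
As an illustration of the SysBlockStat change, the following is a minimal Python sketch of excluding ram and loop devices from a /sys/block/ listing. It is not the CMDaemon implementation; the function name sampled_block_devices and the assumption that these devices are named ram<N> and loop<N> are illustrative only.

```python
import os
import re

# Assumption: ram and loop devices are named ram<N> and loop<N>,
# as on typical Linux systems. This is not taken from CMDaemon.
_EXCLUDED = re.compile(r"^(ram|loop)\d+$")


def sampled_block_devices(names):
    """Return the device names that are not ram or loop devices, in input order."""
    return [name for name in names if not _EXCLUDED.match(name)]


if __name__ == "__main__":
    sys_block = "/sys/block"
    # Only list the directory when it exists (i.e. on Linux).
    if os.path.isdir(sys_block):
        print(sampled_block_devices(sorted(os.listdir(sys_block))))
```

Run on a node, this prints the block devices that remain after the filter, which can help when checking which devices are expected to show up in the monitoring data.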

=Fixed Issues=

  • Disable the cgroup job metrics collection for users (user IDs) that cannot be found on the nodes
  • An issue where the monitoringdrop command may drop the data only for the head node
  • An issue with creating the LSF configuration when a node is converted from a compute to a submit-only host
  • An issue where cloud nodes that have never been booted may have the status “unknown error”
  • Automatically start slurmdbd when Slurm configuration is frozen in cmd.conf
  • An issue where the lite daemon may not reconnect when the websocket is closed cleanly on the server side

== Head Node Installer ==
=Fixed Issues=

  • An issue with head node installations with Lmod where the DefaultModules.lua module file is not created by default, resulting in messages about an empty LMOD_SYSTEM_DEFAULT_MODULES environment variable

== Machine Learning ==
=New Features=

  • Introduced ML package cm-onnx-pytorch-*-cuda11.7-*
  • Introduced ML package cm-gpytorch-*-cuda11.7-*
  • Introduced ML package cm-fastai2-*-cuda11.7-*
  • Introduced ML package cm-pytorch-extra-*-cuda11.7-*
  • Introduced ML package cm-xgboost-*-cuda11.7-*
  • Introduced ML package cm-pytorch-cuda11.7
  • Introduced ML package cm-cub-cuda11.7
  • Introduced ML package cm-tensorflow2-*-cuda11.7-*
  • Introduced ML package cm-opencv4-*-cuda11.7-*
  • Introduced ML package cm-ml-pythondeps-*-cuda11.7-*
  • Updated cm-gcc9-* packages to v9.5.0
  • Updated cm-pytorch-* packages to v1.13.0
  • Updated cm-tensorflow2-* packages to v2.10.0
  • Updated cm-gpytorch-* packages to v1.9.0
  • Updated cm-fastai2-* packages to v2.7.0
  • Updated cm-xgboost-* packages to v1.6.2
  • Deprecated ML packages for CUDA 11.2 and introduced new variants for CUDA 11.7
  • Deprecated cm-openmpi4-cuda11.2-ofed47-gcc9 and cm-openmpi4-cuda11.2-ofed51-gcc9 packages
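
The wildcard package names above can be read as shell-style glob patterns. As a rough sketch, the Python snippet below uses such a pattern to pick the CUDA 11.7 variants out of a list of package names. The sample package names (including the py39 and gcc9 parts) are hypothetical and only illustrate the shape of the pattern; they are not a list of packages shipped in this release.

```python
from fnmatch import fnmatchcase

# Glob pattern modelled on the "cm-<name>-*-cuda11.7-*" entries in these notes.
CUDA117_PATTERN = "cm-*-cuda11.7-*"


def cuda117_variants(package_names):
    """Return the names that match the CUDA 11.7 variant pattern, in input order."""
    return [name for name in package_names if fnmatchcase(name, CUDA117_PATTERN)]


if __name__ == "__main__":
    # Hypothetical package names, for illustration only.
    installed = [
        "cm-tensorflow2-py39-cuda11.7-gcc9",
        "cm-tensorflow2-py39-cuda11.2-gcc9",
        "cm-opencv4-py39-cuda11.7-gcc9",
        "cm-gcc9",
    ]
    for name in cuda117_variants(installed):
        print(name)
```

A list built this way can help identify which installed packages still belong to the deprecated CUDA 11.2 variants by comparing it with the full package list.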

== cm-wlm-setup ==
=New Features=

  • Allow the “master” keyword to be used in the cm-wlm-setup configuration as a placeholder for the actual head node host name
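
To illustrate the idea of a placeholder keyword, here is a minimal Python sketch that replaces "master" entries in a list of host names with a given head node host name. It does not reflect the actual cm-wlm-setup configuration format or how cm-wlm-setup resolves the keyword internally; the function resolve_master, the list-of-hosts layout, and the host names are assumptions made for the example.

```python
PLACEHOLDER = "master"


def resolve_master(hosts, head_node_hostname):
    """Return a copy of hosts with each "master" entry replaced by the head node host name."""
    return [head_node_hostname if host == PLACEHOLDER else host for host in hosts]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(resolve_master(["master", "node001"], "head01"))
```

The benefit of such a placeholder is that the same configuration can be reused on clusters whose head nodes have different host names.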