Release Notes for Nvidia Bright Cluster Manager 9.2-10

== General ==
=New Features=

  • Added CUDA toolkit packages for ARM64

=Improvements=

  • Updated python2-pyasn1 to 0.4.2-3.2.1
  • Updated cm-nvhpc to 23.1

=Fixed Issues=

  • Decreased the size of the rescue environment ramdisk, to allow some Dell hardware to boot in UEFI mode (e.g., Dell PowerEdge R650)
  • An issue where cm-chroot-sw-img is unable to execute a shell in the software image when the user's $SHELL environment variable points to a shell that is not present in the software image
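
The $SHELL condition described above can be checked before entering an image. The following is a minimal Python sketch, not the logic used by cm-chroot-sw-img itself: the helper names shell_in_image and pick_shell are hypothetical, and the image directory is assumed to be reachable as an ordinary directory on the head node.

```python
import os


def shell_in_image(image_root, shell_path):
    """Return True if shell_path is an executable file inside image_root.

    Hypothetical helper for illustration; not part of cm-chroot-sw-img.
    Absolute symlinks inside the image are resolved against the host
    filesystem here, so the result is only an approximation for them.
    """
    # Map the absolute shell path (e.g. /usr/bin/zsh) into the image tree.
    candidate = os.path.join(image_root, shell_path.lstrip("/"))
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)


def pick_shell(image_root, fallback="/bin/sh"):
    """Return $SHELL if the image provides it, otherwise the fallback.

    The fallback itself is not verified against the image.
    """
    shell = os.environ.get("SHELL", fallback)
    return shell if shell_in_image(image_root, shell) else fallback
```

For example, pick_shell("/cm/images/default-image") returns the caller's $SHELL only when that path exists in the image; the image path shown is an assumption and depends on the cluster.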

== CMDaemon ==
=Improvements=

  • On the Ubuntu base distribution, inet=manual is now added to the configuration files for network interfaces without an IP, so that they can be brought up by the operating system
  • Allow the option to specify inet=manual for network interfaces on the Ubuntu base distribution (set revision inet=manual)
  • Process the custom /cm/conf/{category,node} files also after an imageupdate of a running node so that the customizations are preserved

=Fixed Issues=

  • A CMDaemon crash on the Rocky 9.1 / RHEL 9.1 base distributions when a regular user logs in to the user portal or Bright View
  • An issue where overriding a kernel module parameter at the node level does not take effect and the kernel module parameters are instead inherited from the software image
  • An issue where automatic FSExports are not added for nodes in a type3 network setup
  • An issue with monitoring data plots consisting of consolidated and raw data sources
  • Delay the start of the slurmd service until after the MIG configuration is updated when the compute node is booted
  • An issue where the Etcd health check script may fail for nodes with tagged VLAN interfaces
  • An issue where MIG operations may not be able to complete, which can prevent future MIG operations
  • An issue where MIG apply may time out when CMDaemon is starting because the timeout is too short
  • An issue with the automatic switch of the monitoring node when the passive head node goes down for a prolonged period of time
  • A CMDaemon crash in the sysinfo implementation that can occur in some cases when the CMDaemon service is stopping
  • A CMDaemon memory leak when the Slurm placeholders maxnodes value is less than the number of nodes in the queue
  • An issue where the bond primary=name directive is not written for the underlying physical network interface on Ubuntu

== Bright View ==
=Fixed Issues=

  • An issue with the reinstall and sync Bright View actions for software images
  • An issue with assigning the slurmclient role directly to a compute node in order to override the slurmclient role at the overlay or the category level
  • An issue where Bright View may show “No response from the server yet because client quit” upon logout

== Cluster Tools ==
=Improvements=

  • Automatically detect environmental proxies in cm-diagnose
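
As an illustration of what detecting environmental proxies can involve, the sketch below collects the conventional proxy variables from the process environment. It is an assumption-laden example, not the cm-diagnose implementation: the function name detect_env_proxies and the list of variable names are chosen here for illustration only.

```python
import os

# Conventional proxy variable names; this list is an assumption made for
# illustration, not the set that cm-diagnose inspects.
PROXY_VARS = ("http_proxy", "https_proxy", "ftp_proxy", "all_proxy", "no_proxy")


def detect_env_proxies(environ=None):
    """Return proxy settings found in the environment, keyed by lowercase name."""
    env = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        # Both lowercase and uppercase spellings are in common use;
        # the lowercase value wins when both are set.
        value = env.get(name) or env.get(name.upper())
        if value:
            found[name] = value
    return found
```

Calling detect_env_proxies() with no argument inspects the current process environment; passing a dictionary makes the function easy to exercise in isolation.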

=Fixed Issues=

  • An issue where the cm-restore-db-password script may not reset the database user password in mysql for Slurm

== Machine Learning ==
=Fixed Issues=

  • Updated cm-tensorflow2-* to 2.11.0
  • An issue where importing the tensorflow python module with tensorflow2-py39-cuda11.2-gcc9 yields a RequestsDependencyWarning message

== cm-kubernetes-setup ==
=New Features=

  • Added Network Operator to the cm-kubernetes-setup script

=Improvements=

  • Wait explicitly for the Ingress Controller to be up and running, to avoid potential issues later when deploying other operators

=Fixed Issues=

  • Clean up the nginxreverseproxy and nginx.conf configurations when Kubernetes is uninstalled

== cm-scale ==
=Fixed Issues=

  • An issue where cm-scale tries to match the labels of Kubernetes pods or jobs to the node's labels. This matching is now disabled by default
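
To make the term concrete, label matching in the Kubernetes sense usually means that every key/value pair required by a workload is present on a node. The sketch below shows that generic check only; how cm-scale performs or configures the matching is not described in these notes, and the function name labels_match is hypothetical.

```python
def labels_match(required, node_labels):
    """Return True if every required key/value pair is present on the node.

    Generic equality-based matching for illustration; not cm-scale code.
    """
    return all(node_labels.get(key) == value for key, value in required.items())
```

With this definition an empty set of required labels matches any node, while a single missing or differing label is enough to reject a node.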

== cmburn ==
=Improvements=

  • Added cm-gpu-burn package for Rocky9/RHEL9 base distributions

== licensing ==
=Fixed Issues=

  • Remove the license expiration warning on the secondary head node after installing a new license

== openpbs20 ==
=Fixed Issues=

  • An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== openpbs22.05 ==
=Fixed Issues=

  • An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== pbspro2021 ==
=Fixed Issues=

  • An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== pbspro2022 ==
=Fixed Issues=

  • An issue with updating the pbspro/openpbs hooks when a new pbspro/openpbs package is installed

== slurm ==
=New Features=

  • Rebuilt Slurm with CUDA 11.8

== slurm22.05 ==
=Improvements=

  • Updated Slurm to 22.05.8