Release Notes for Nvidia Bright Cluster Manager 9.2-6

== General ==
= New Features =

  • Add support for weave networking in Kubernetes
  • Add support for installing Kubernetes 1.24

= Improvements =

  • Ubuntu 20.04: update to 20.04.5.
  • Update coredns to 1.9.4 for Kubernetes 1.24
  • Update lua to 5.4.4 (CVE-2022-28805)
  • mlnx-ofed57: add packages for installing the Mellanox OFED 5.7 stack.

= Known issues =

  • Upgrading the CM packages on SLES12 can result in a conflict between the cm-dhcp package and the base distro dhcp packages. It is safe to answer “yes” to replace the conflicting dhcp files with files from cm-dhcp.

== CMDaemon ==
= Improvements =

  • Exclude all /snap/.* mount points from the “procmounts” sampler, which otherwise creates unnecessary metrics in CMDaemon
  • Copy the file cluster.csr.new to all head nodes during install-license
  • Increase the default for Kubernetes kubelet’s --max-pods from 50 to 110 for new installations
  • Allow the administrator to configure the proxy-mode in the Kubernetes kube-proxy role via the revision parameter
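
The /snap/.* exclusion noted above can be illustrated with a short Python sketch. This is not CMDaemon's implementation: the function name and the sample data are hypothetical, and only the /snap/.* pattern comes from the note itself.

```python
import re

# Pattern taken from the release note; how the sampler applies it
# internally is not documented here, so this is an assumption.
SNAP_MOUNT = re.compile(r"^/snap/.*")


def filter_mounts(proc_mounts_text):
    """Return mount points from /proc/mounts-style text, skipping /snap/ entries."""
    kept = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # not a valid mount line
        mount_point = fields[1]
        if SNAP_MOUNT.match(mount_point):
            continue  # snap loop mounts would only add noise as metrics
        kept.append(mount_point)
    return kept


# Hypothetical sample resembling /proc/mounts on an Ubuntu node.
sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/loop0 /snap/core20/1623 squashfs ro,nodev,relatime 0 0
/dev/loop1 /snap/lxd/22923 squashfs ro,nodev,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""
result = filter_mounts(sample)
print(result)
```

On Ubuntu nodes every installed snap adds a loop-mounted squashfs entry, so filtering by mount point prefix keeps the metric set limited to real file systems.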

= Fixed Issues =

  • An issue with the JSON whoami API call returning a username instead of a profile
  • An issue with removing OSDs from a Ceph cluster if the corresponding OSD nodes are down
  • An issue where the version config file timestamps (versionconfigfiles=yes) are always set to the Unix epoch (1970)
  • An issue with setting cmjob and cm-scale constraints for LSF, which overwrites other already specified job requirements
  • In some cases, the passive head node may be listed as DOWN due to a race condition between watching for new HPC jobs and CMDaemon loading information for all nodes
  • An issue where a cloud director power off may hang for up to a minute if the node is already off
  • An issue with merging CMDaemon monitoring execution multiplexers into one, which results in only the last multiplexer being taken into account
  • An issue where CMDaemon may not (re)generate the Slurm logrotate files in some cases, such as when the files are modified or deleted outside of CMDaemon

== Bright View ==
= Fixed Issues =

  • An issue with showing the correct queue name of WLM jobs
  • An issue with removing Slurm generic resources (GRES) with Bright View
  • An issue with updating Slurm queue parameters in Bright View

== Machine Learning ==
= New Features =

  • Deprecated ML package cm-chainer-py39-cuda11.2-gcc9
  • Introduced ML package cm-cutensor-cuda11.7
  • Introduced ML package cm-ml-distdeps-cuda11.7
  • Introduced ML package cm-nccl2-cuda11.7-gcc9
  • Introduced ML package cm-cudnn8.5-cuda11.7

== cm-kubernetes-setup ==
= Improvements =

  • Ensure kernel modules required for istio are loaded on the hosts when setting up Kubernetes

= Fixed Issues =

  • An issue with conflicting containerd / containerd.io packages on Ubuntu when installing nvidia-docker2
  • Crash when uninstalling Kubernetes if a user with role bindings exists outside of Bright LDAP

== cm-scale ==
= Fixed Issues =

  • An issue with starting cloned nodes when using the default resources configuration

== cm-setup ==
= Fixed Issues =

  • Make cm-*-setup configuration file permissions more restrictive
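
As a rough illustration of what more restrictive permissions on a configuration file can look like, the Python sketch below limits a file to owner read/write. The mode 0600, the helper name, and the temporary file are all assumptions made for the example; the note above does not state which mode the cm-*-setup tools actually apply.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Set mode 0600 (owner read/write only) and return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Hypothetical stand-in for a setup configuration file.
with tempfile.NamedTemporaryFile(suffix=".conf", delete=False) as handle:
    handle.write(b"example: value\n")
    demo_path = handle.name

mode = restrict_to_owner(demo_path)
print(oct(mode))
os.remove(demo_path)
```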

== cm-wlm-setup ==
= Fixed Issues =

  • An issue with setting up pyxis/enroot in non-default/additional software images

== cmha-setup ==
= Fixed Issues =

  • An issue with generating the disk layout XML file for the primary head node when cloning it to the secondary if there is a commented-out entry for swap in the primary head node fstab file
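
The kind of input behind the fstab issue above can be shown with a small Python sketch that skips commented-out lines before looking for swap. It is not the cmha-setup code: the function names and the sample fstab content are hypothetical.

```python
def active_fstab_entries(fstab_text):
    """Parse fstab-style text, ignoring blank lines and lines starting with '#'."""
    entries = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # a commented-out entry must not count as a real one
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line: device, mount point and type are required
        entries.append(
            {"device": fields[0], "mount_point": fields[1], "fs_type": fields[2]}
        )
    return entries


def has_active_swap(fstab_text):
    """Return True only if an uncommented entry has file system type 'swap'."""
    return any(e["fs_type"] == "swap" for e in active_fstab_entries(fstab_text))


# Hypothetical fstab with the swap entry commented out.
sample_fstab = """\
# /etc/fstab
/dev/sda1  /      ext4  defaults  0 1
#/dev/sda2 swap   swap  defaults  0 0
/dev/sda3  /var   ext4  defaults  0 2
"""
print(has_active_swap(sample_fstab))
print([e["mount_point"] for e in active_fstab_entries(sample_fstab)])
```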

== cmsh ==
= New Features =

  • Allow the --start and --end arguments in a rangequery command to be specified as date/time stamps
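
To illustrate what accepting date/time stamps for a time range involves, the Python sketch below normalizes either a numeric epoch value or a date/time string to epoch seconds. The accepted format ("YYYY-MM-DD HH:MM:SS", interpreted as UTC) and the helper name are assumptions; the formats that cmsh actually accepts are not listed in this note.

```python
from datetime import datetime, timezone


def to_epoch(value):
    """Convert epoch seconds or 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to epoch seconds."""
    if value.isdigit():
        return int(value)  # already a numeric epoch value
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def validate_range(start, end):
    """Return (start, end) as epoch seconds, rejecting an inverted range."""
    start_s, end_s = to_epoch(start), to_epoch(end)
    if end_s < start_s:
        raise ValueError("end must not be earlier than start")
    return start_s, end_s


print(validate_range("2022-09-01 00:00:00", "2022-09-02 00:00:00"))
```

Treating the string as UTC keeps the result independent of the local time zone of the machine running the sketch.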

= Improvements =

  • Print a warning if devices with duplicate IPs are being added or cloned
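
A duplicate-IP check of the kind described above can be sketched in Python as follows. The device list, hostnames, and function name are hypothetical, and this is not how cmsh implements the warning.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(devices):
    """Map each IP used by more than one device to the hostnames sharing it."""
    by_ip = defaultdict(list)
    for hostname, ip in devices:
        # ip_address() normalizes the text form and rejects invalid input.
        by_ip[ipaddress.ip_address(ip)].append(hostname)
    return {str(ip): names for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical devices, for example after cloning node001 to node003.
devices = [
    ("node001", "10.141.0.1"),
    ("node002", "10.141.0.2"),
    ("node003", "10.141.0.1"),
]
duplicates = find_duplicate_ips(devices)
for ip, names in duplicates.items():
    print(f"Warning: IP {ip} is used by {', '.join(names)}")
```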

= Fixed Issues =

  • An issue where the “--user” option of the cmsh “rshell” command does not take effect

== pbspro2022 ==
= Improvements =

  • Add support for PBS Pro 2022