Release Notes for NVIDIA Bright Cluster Manager 9.1-14

== General ==
= New Features

  • Support for SLES15sp4

= Improvements

  • Update lua to 5.4.4 (CVE-2022-28805)
  • Update openssl to version 1.1.1q
  • Update cuda-dcgm to version 2.4.6.1
  • Update mlnx-ofed49 to version 4.9-5.1.0.0
  • Update mlnx-ofed56 to version 5.6-2.0.9.0
  • Add cuda11.7 packages

== CMDaemon ==
= New Features

  • Added CMDaemon advanced configuration options for customizing global nginx.conf values

= Improvements

  • Remove the user home directories asynchronously, which resolves an issue with the delete process appearing hung when the user’s home directory contains many files
  • Performance improvements in CMDaemon to decrease the start-up time of the head node CMDaemon on clusters with many pending WLM jobs
  • Reduce the verbosity of the ‘result for obsolete tracker’ messages, so that they are no longer included by default in the CMDaemon log file
  • CMDaemon will now produce a warning when a new image is added without an underlying directory structure
  • Improved logic when invalidating the nscd hosts cache on the compute nodes, to avoid cases where an outdated cache interferes with hostname lookups
  • CMDaemon certificates are now generated with a start date of 1 calendar day before the issue date, instead of the Unix epoch
  • Fix an issue where cm-manipulate-advanced-config.py is missing a Python import, resulting in a crash when executed
  • Ensure that the Kubernetes NetworkPolicy feature works when the kube-proxy masqueradeAll flag is disabled
  • Fix an issue where monitoring data for labeled entities is not preserved after the entity has been dropped and automatically re-added

= Fixed Issues

  • In some cases, an issue with re-establishing the edge node sessions to the director after a restart of CMDaemon
  • An issue with CMDaemon events delivery to edge nodes, which can result in outdated information about committed entities
  • Fix an issue with setting up Kubernetes where if the passive head node is the active leader according to Etcd, then Kubernetes will not always be able to initialize properly
  • On rare occasions, CMDaemon can hang while stopping due to a blocking SSL read operation
  • An issue where PBS queue options set in CMDaemon may not be set in the PBS server configuration
  • Send the rsyslog log to both head nodes for on-prem compute nodes
  • Fix an issue with generating a valid Kubernetes kubeconfig for users with special characters in their login name
  • Performance improvements of the user manager
  • Fix rare crash in CMDaemon while cloning an image
  • An issue where CMDaemon may not (re)generate the Slurm logrotate files in some cases, such as when the files are modified or deleted outside of CMDaemon
  • Fix possible crash in the provisioning status code
  • Fix a potential buffer overflow
  • An issue where CMDaemon can crash if the Bright View monitoring tree call does not pass a context
  • Add full support for multi-value http request parameters, to resolve an issue where the “CMDaemon ready” service is not able to handle a list of services by name
  • In some cases, terminating spot instances with CMDaemon may fail if the spot request has been canceled outside of CMDaemon
  • In some cases, an issue with upgrading from Bright 8.x to 9.1 due to an invalid SQL statement
  • An issue where CMDaemon may occasionally hang on SSL_read while stopping
  • An issue where the default gateway may not be set on a cluster with an aliased external network interface
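
The multi-value HTTP request parameter item above can be illustrated with a generic sketch. This is not CMDaemon code: the parameter name "service" and the query string are assumptions made up for the example. It only shows the general difference between keeping every value of a repeated parameter and keeping just the last one, which is the kind of behavior that would prevent a list of service names from being handled.

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical query string; "service" is an assumed parameter name,
# not one documented for the "CMDaemon ready" service.
query = "service=slurmctld&service=nginx&timeout=30"

# Single-value handling: a plain dict keeps only the last occurrence.
last_only = dict(parse_qsl(query))

# Multi-value handling: parse_qs maps each name to a list of all its values.
all_values = parse_qs(query)
services = all_values.get("service", [])

print(last_only["service"])  # nginx
print(services)              # ['slurmctld', 'nginx']
```

With single-value handling the first service name is silently lost, whereas the multi-value form preserves the full list in request order.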

== Bright View ==
= Fixed Issues

  • An issue where SLURM/GRES file templates are not read-only in Bright View and can be modified

== Node Installer ==
= Improvements

  • Allow the node-installer to continue with configuring IPMI after encountering a failure to set username and password when the user already exists

= Fixed Issues

  • An issue in the ilo_power.pl script, which can break remote power management for nodes that use an ilo0 interface for power control

== User Portal ==
= Fixed Issues

  • An issue where the user portal can show only 1 core for a compute node, regardless of the actual number of cores

== cm-create-image ==
= Fixed Issues

  • An issue where images created with cm-create-image do not preserve the xattrs of the base tar image
  • An issue where node-installer images created using the cm-create-image tool do not have an updated rsyslog.conf file

== cm-kubernetes ==
= Improvements

  • Make the PersistentVolumeClaims privilege part of the default list of privileges for the users when Kubernetes is set up

== cm-kubernetes-setup ==
= Improvements

  • Enable the selection of newer Kubernetes versions by default in the cm-kubernetes-setup screens
  • Fix an issue with Kubernetes on Edge deployments, where the stage “waiting for Root Service Account” is performed too early and may not complete successfully in some cases
  • Fix an issue where Kubernetes version >= 1.21 is not deployed with masqueradeAll=false for kube-proxy, preventing NetworkPolicies from working
  • In the Kubernetes module files, remove the MANPATH definitions which are no longer used
  • Fix an issue, occurring in some cases, with setting the Kubernetes labels for control-plane, master, and worker

= Fixed Issues

  • Ensure that the cm-kubernetes-setup --default-cni-bin-dir flag updates all relevant roles

== cm-setup ==
= Fixed Issues

  • Make cm-*-setup configuration file permissions more restrictive

== cm-uge ==
= Improvements

  • Update the default settings in cm-uge to allow running OpenMPI jobs without involving ssh

== cm-wlm-setup ==
= Improvements

  • Automatically remove the WLM settings from the Auto Scaler configuration when the WLM is disabled

= Fixed Issues

  • An issue with making the pbs.service file available on the compute nodes with offloaded PBSPro server role, which prevents the PBSPro server from starting during the setup

== cmsh ==
= New Features

  • Allow the --start and --end arguments in rangequery command to be specified as date/time stamps
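
As a rough illustration of what accepting date/time stamps for a time range can involve, the sketch below normalizes a value to epoch seconds. The formats that cmsh actually accepts for --start and --end are not stated in these notes; the "YYYY-MM-DD HH:MM:SS" layout, the UTC interpretation, and the helper name to_epoch are all assumptions for this example.

```python
from datetime import datetime, timezone

def to_epoch(value: str) -> int:
    """Convert a range boundary to epoch seconds.

    Accepts either a plain integer string (taken as epoch seconds) or an
    assumed 'YYYY-MM-DD HH:MM:SS' stamp, interpreted here as UTC.
    """
    if value.isdigit():
        return int(value)
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

start = to_epoch("2022-08-01 00:00:00")
end = to_epoch("2022-08-02 00:00:00")
print(end - start)  # 86400 seconds, i.e. one day
```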

= Improvements

  • Warn that using rshell into an image as a non-root user might not work (as there is no LDAP inside the image)
  • Added --filter option to the environment command, to allow for showing only the entities that match the specified regex
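
The regex-based filtering described for the environment command can be sketched generically as follows. This is not the cmsh implementation: the entity names, the helper name filter_entities, and the choice of substring-style matching (re.search) rather than full-string matching are assumptions for illustration.

```python
import re

def filter_entities(names, pattern):
    """Return the names that match the given regular expression."""
    regex = re.compile(pattern)
    # re.search matches anywhere in the name; anchor the pattern with
    # ^ and $ when a whole-name match is wanted.
    return [name for name in names if regex.search(name)]

# Hypothetical entity names.
names = ["node001", "node002", "gpu001", "head"]
print(filter_entities(names, r"^node\d+$"))  # ['node001', 'node002']
```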

= Fixed Issues

  • An issue where the --user option of the cmsh “rshell” command does not take effect
  • An issue where the monitoringdrop command does not drop the measurables when multiple measurables are specified on the command line
  • An issue where a cmsh device switchport update may not be saved on commit
  • An issue where the XSD validation is not always loaded in cmsh when configuring a disk setup for the compute nodes

== hwloc ==
= Improvements

  • Update cm-hwloc2 to 2.7.1

== ml ==
= New Features

  • Introduced packages cm-cudnn8.2-cuda11.4
  • Introduced packages cm-cudnn8.4-cuda11.4

== openpbs22.05 ==
= Improvements

  • Add OpenPBS 22.05 integration

== pythoncm ==
= Improvements

  • pythoncm now includes periodic checks during the provisioning wait, to ensure that tools such as cm-wlm-setup do not time out while the nodes are being provisioned

== slurm ==
= Improvements

  • Add slurmrestd service file to the Slurm packages

= Fixed Issues

  • Rebuild the Ubuntu Slurm packages with cm-pmix3

== slurm22.05 ==
= Improvements

  • Update to 22.05.3
  • Add Slurm 22.05 support