Release Notes for NVIDIA Bright Cluster Manager 9.2-5

== General ==
= New Features

  • Support for SLES15sp4

= Improvements

  • cuda-dcgm: updated to version 2.4.6.1
  • mlnx-ofed49: updated to version 4.9-5.1.0.0
  • cuda 11.7: added cuda11.7 packages

== CMDaemon ==
= New Features

  • Added CMDaemon advanced configuration options for customizing global nginx.conf values

= Improvements

  • Remove the user home directories asynchronously, which resolves an issue with the delete process appearing hung when the user’s home directory contains many files
  • Reduce verbosity of the result for obsolete tracker messages, so that they are no longer included by default in the CMDaemon log file
  • CMDaemon will now produce a warning when a new image is added without an underlying directory structure

= Fixed Issues

  • An issue where the “GPU settings duplicates” check can give false positives and doesn’t allow commits
  • In some cases, an issue with re-establishing the edge node sessions to the director after a restart of CMDaemon
  • An issue with CMDaemon events delivery to edge nodes, which can result in outdated information about committed entities
  • On rare occasions CMDaemon can hang while stopping due to a blocking SSL read operation
  • An issue where PBS queue options set in CMDaemon may not be set in the PBS server configuration
  • An issue with the WLM job monitoring on compute nodes with multiple WLM client roles, which can prevent the monitoring data sampling on these nodes
  • Fixed rare crash in CMDaemon while cloning an image
  • An issue with listing the software packages in Ubuntu images
  • Fix an issue with extra whitespace at the end of the CMD_NODE_INSTALLER_PATH environment variable passed to the ilo_power.pl script, which affects the power operations against nodes with iLO BMCs
  • Fix possible crash in the provisioning status code
  • Fix possible deadlock in the Prometheus manager
  • Fix a potential buffer overflow
  • An issue with the removal of cgroups of completed pyxis Slurm jobs, where error messages are recorded in the CMDaemon log file when “enroot remove” places processes in the cgroup that CMDaemon is trying to remove

== Bright View ==
= Fixed Issues

  • An issue with showing the list of available WLM job queues in the PBS Pro Client role
  • An issue with adding Slurm WLM job queues to Slurm roles, which results in JSON errors when attempting to update the settings
  • An issue with updating the BMC settings with Bright View, which results in an error message “Operation Failed! No EntityDefinition”
  • An issue with committing partition changes in Bright View, which results in an error message ‘No EntityDefinition for entity type “undefined”’
  • An issue with managing users with Bright View, where attempting to delete a user results in an error message “Unable to make service call”

== Head Node Installer ==
= Fixed Issues

  • An issue where selecting a different timezone during the Bright head node installation prevents CMDaemon from initializing and starting during the first boot of the head node

== User Portal ==
= Fixed Issues

  • An issue where the user portal can show only 1 core for a compute node, regardless of the actual number of cores

== cm-image ==
= Fixed Issues

  • An issue with creating Ubuntu2004 images on RHEL7/CentOS7 head nodes, which results in a crash of the cm-image utility when copying the CM repo files

== cm-kubernetes-setup ==
= Improvements

  • An issue with Kubernetes on Edge deployments, where the stage “waiting for Root Service Account” is performed too early and may not complete successfully in some cases
  • An issue where CNI may not be deployed on the secondary head node of an HA cluster
  • In the Kubernetes module files, remove the MANPATH definitions which are no longer used
  • In some cases, an issue with setting the Kubernetes labels for control-plane, master, and worker

= Fixed Issues

  • Add the NVIDIA GPU Operator generated toolkit files to the rsync excluded lists to prevent unnecessary recreation of the toolkit files

== cm-wlm-setup ==
= Fixed Issues

  • An issue with making the pbs.service file available on the compute nodes with offloaded PBSPro server role, which prevents the PBSPro server from starting during the setup

== cmsh ==
= Improvements

  • Added --filter option to the environment command, to allow for showing only the entities that match the specified regex

= Fixed Issues

  • An issue where the monitoringdrop command does not drop the measurables when multiple measurables are specified on the command line
  • An issue where the cmsh device switchport update may not be saved on commit

== hwloc ==
= Improvements

  • Update cm-hwloc2 to 2.7.1

== pythoncm ==
= Improvements

  • Fix a typo in create-ramdisk-task.py example, which prevents it from running

== slurm22.05 ==
= Improvements

  • Update to 22.05.3
  • Add Slurm 22.05 support