== General ==
== New Features ==
- Added mlnx-ofed23.07 package
- Added cm-pmix4 package
== Improvements ==
- Added drainstatus to cm-diagnose
- Updated cuda-driver package to 535.104.12
- Updated cm-libprometheus package to 0.47.0
- Updated cm-openssl package to 3.1.3
== CMDaemon ==
== New Features ==
- Added advanced config flag DisableRemoteShell to disable all remote shell RPC
- Added events for Cumulus service management operations
== Improvements ==
- Added cmsh clone device option to increment IP addresses by values other than 1
- Allow lite node IP to be set during cmsh device add
- Display an error when setting an invalid software image in cmsh
- Update /etc/resolv.conf via netconfig on SLES15 instead of writing the file directly
- Created the ability to add model/serial number information to new switches (ZTP)
- Kill active ramdisk create process when software image is removed
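
To illustrate the clone increment improvement listed above, the following is a minimal sketch of how a series of clone addresses could be derived with a step other than 1. It is not CMDaemon or cmsh code: the helper name `clone_addresses`, its parameters, and the example address are assumptions made for illustration, and only Python's standard `ipaddress` module is used.

```python
import ipaddress


def clone_addresses(base_ip, count, step=1):
    """Return `count` addresses after base_ip, each `step` apart.

    Hypothetical helper for illustration only; not part of cmsh.
    """
    base = ipaddress.ip_address(base_ip)
    # Adding an integer to an ip_address object yields the offset address.
    return [str(base + step * i) for i in range(1, count + 1)]


# Three clones of a device at 10.141.0.1 (example address), step of 2.
print(clone_addresses("10.141.0.1", 3, step=2))
```

With a step of 2, the clones receive every second address after the source device, which leaves the intermediate addresses free for other interfaces.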
== Fixed Issues ==
- Fixed provisioning trigger when an image name starts with the name of another image
- Allow cm-cmd-ports --get to work without an active cmd
- Prevent “Reboot required: Interfaces have been modified” event from being shown for a node if the node has a VLAN interface on a Bridge interface that includes a bond interface
- Fixed cm-burn completing unsuccessfully when both the pre and post sections are absent
- Image updates on provisioning nodes now wait for provisioning operations on other nodes to complete before proceeding
- Allow appending or skipping adding a Slurm drain reason when healthcheck fails with drain action enabled
- Fixed crash of pythoncm parallel node termination function
- Fixed an edge case that causes hostlist generation failures when there are 3 numeric fields in the hostname
- Fixed service management for cm-lite-daemon
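
The image name fix listed above concerns one image name being a prefix of another. The sketch below shows, in general terms, how a plain prefix comparison can confuse two such names and how a boundary-aware comparison avoids it. This is an assumption about the class of bug, not a description of the actual CMDaemon code; the function `affects_image` and the example paths are hypothetical.

```python
def affects_image(changed_path, image_path):
    """Return True if changed_path is image_path itself or lies inside it.

    Hypothetical helper: the comparison requires a path separator after
    the image directory, so a sibling image sharing the prefix is ignored.
    """
    root = image_path.rstrip("/")
    return changed_path == root or changed_path.startswith(root + "/")


image = "/cm/images/default-image"
sibling_file = "/cm/images/default-image-gpu/etc/hosts"

# A naive prefix test matches the sibling image by mistake.
print(sibling_file.startswith(image))
# The boundary-aware test does not.
print(affects_image(sibling_file, image))
```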
== cm-scale ==
== Fixed Issues ==
- Allow starting terminated cloud nodes whose state is one of the node installer states
- Terminate useless AWS spot instance requests
- Fixed the termination of cloud nodes when multiple clone operations are issued in parallel
- Fixed the startup of nodes by cm-scale when Slurm sets the predicted start time of a job in the future
- Fixed handling of job arrays whose range runs from 1 to a number with more than one digit
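
For context on the job array fix above, the following sketch expands a Slurm-style array specification (as accepted by `sbatch --array`) into individual task IDs, including ranges whose upper bound has more than one digit. It is an illustrative parser only, not the cm-scale implementation; the function name `expand_array_spec` is hypothetical.

```python
def expand_array_spec(spec):
    """Expand a Slurm-style array spec such as "1-12" or "0,6,16-32:4%2".

    Hypothetical helper: supports comma-separated items, "first-last"
    ranges, an optional ":step", and ignores a trailing "%limit".
    """
    spec = spec.split("%", 1)[0]  # drop the simultaneous-task limit
    task_ids = []
    for part in spec.split(","):
        value_range, _, step = part.partition(":")
        first, _, last = value_range.partition("-")
        start = int(first)
        end = int(last) if last else start
        task_ids.extend(range(start, end + 1, int(step) if step else 1))
    return task_ids


# The upper bound is parsed as the integer 12, not as separate characters.
print(expand_array_spec("1-12"))
```

Parsing the bounds as integers, rather than comparing them as strings, is what keeps a range such as 1-12 from being truncated or misordered.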
== Cloud ==
== New Features ==
- Added support for AWS FSx on Ubuntu for cmjob
== Improvements ==
- Improved error message when starting a cloud node with incorrect VPC/subnet configuration
== Fixed Issues ==
- Fixed issue with cm-cloud-storage-setup when using us-east-1 region
- Prevent cloud instances that are terminated while the cloud director is down from being listed as UP+terminated
- Fixed starting spot instances after a no-capacity in availability zone scenario occurs
- Unfulfilled spot instance requests stay in PENDING state until fulfilled or terminated
- Store availability zones for networks created by COD or manually, which enables AutoScaler to distribute loads between availability zones in COD deployments
== Kubernetes ==
== New Features ==
- Added support for NGC token authentication in cm-kubernetes-setup
== Improvements ==
- Improved the wizard so that it fails at the point of error instead of at a later stage (incorrect return code checks previously caused the installer to fail confusingly at later stages)
- Kubernetes wizard errors will now show more context information where possible
- Increased timeouts for kubeadm init and clusterctl init operations to effectively handle slow connections
== Fixed Issues ==
- The add user wizard now uses the BCM user name instead of commonName
== Workload Management ==
== New Features ==
- Added enroot and enroot+caps packages
== Fixed Issues ==
- Update the state of AWS spot instances in Slurm when they are terminated outside of BCM
== Container Engines ==
== Improvements ==
- Improved internal IP detection logic for etcd (similarly to internal IP detection for Kubernetes Calico and Flannel)
== Monitoring ==
== New Features ==
- Added Prometheus /rules, /alert, and /alertmanagers endpoints
- Added operstate metrics (operational state, i.e. UP/DOWN) via cm-lite-daemon for Cumulus switches
== Improvements ==
- Display K/M/G in cmsh for consolidated averages when no unit is set for a metric
== Fixed Issues ==
- Added support for running the healthcheck with storcli software alongside megacli software
== Cluster on Demand ==
== Improvements ==
- Improved the display of the EULA when running from docker image
- Allow CMDaemon to work with cluster-on-demand cluster spanning multiple regions (requires manual setup)
== Base View ==
== Improvements ==
- Provide notifications in Base View if BCM package updates are available
- Visualize used and available licensed GPUs in Base View