Release notes for Bright 9.1-15
== General ==
=Improvements=
- Add CUDA 11.8 packages
- Add mlnx-ofed57 packages for the Mellanox 5.7 OFED stack
- Ubuntu 20.04: update to 20.04.5
- Upgrade pyxis to 0.14.0
- Update mlnx-ofed54 to version 5.4-3.5.8.0
=Fixed Issues=
- Automatically start slurmdbd when the Slurm configuration is frozen in cmd.conf
- An issue where the power script execution environment does not include the CMD_NODE_INSTALLER_PATH variable, which prevents custom power scripts (such as ilo_power.pl) from performing power operations
=Deprecated Features=
- OpenShift integration
== CMDaemon ==
=Improvements=
- Exclude all /snap/.* mount points from the “procmounts” sampler, which otherwise creates unnecessary metrics in CMDaemon
- Copy the cluster.csr.new file to all head nodes during install-license
- Increase the default value of the Kubernetes kubelet --max-pods option from 50 to 110 for new installations
=Fixed Issues=
- An issue with removing job queues when using the JobQueue remove pythoncm call
- An issue with updating the Slurm configuration when the secondary head node is the active head node
- An issue with the JSON whoami API call returning a username instead of a profile
- An issue with removing OSDs from a Ceph cluster if the corresponding OSD nodes are down
- An issue where the version config file timestamps (versionconfigfiles=yes) are always set to the Unix epoch (1970)
- An issue where powering off a cloud director may hang for up to a minute if the node is already off
- An issue with merging CMDaemon monitoring execution multiplexers into one, which resulted in only the last multiplexer being taken into account
== Bright View ==
=Fixed Issues=
- An issue where the main menu is not shown for logged-in users with a read-only profile
== Head Node Installer ==
=Fixed Issues=
- An issue with head node installations using Lmod where the DefaultModules.lua module file is not created by default, resulting in messages about an empty LMOD_SYSTEM_DEFAULT_MODULES environment variable
== Machine Learning ==
=New Features=
- Update cm-cub-* packages to v1.17.2
- Deprecate ML package cm-chainer-py39-cuda11.2-gcc9
- Introduce ML package cm-cutensor-cuda11.7
- Introduce ML package cm-ml-distdeps-cuda11.7
- Introduce ML package cm-nccl2-cuda11.7-gcc9
- Introduce ML package cm-cudnn8.5-cuda11.7
=Improvements=
- Update cm-openmpi4-*-cuda-* packages to v4.1.4
== cm-clone-install ==
=New Features=
- When using cm-clone-install, exclude loop device mounts (if present) when generating the head node disk setup XML for cloning
== cm-scale ==
=Fixed Issues=
- An issue with starting nodes for multi-node jobs that request more memory per node than the available memory divided by the number of requested nodes
== cmsh ==
=Improvements=
- Add --update-containers support to the cmsh device foreach command
== pbspro2022 ==
=Improvements=
- Add support for PBS Pro 2022
== slurm21.08 ==
=Fixed Issues=
- An incorrect path to the failedprejob and allprejob directories, which caused the prolog-prejob script to fail
== slurm22.05 ==
=Improvements=
- Upgrade Slurm to 22.05.6