I need to connect my nodes to an external Lustre server. Every time, I compile the Lustre client separately on multiple nodes, and when the nodes reboot, all the configuration is lost.
- How can I make kernel modules such as the Lustre client persistent on the compute nodes?
- Can we add the Lustre client to “module load xxx”?
- How do I sync or update the boot image on the head node so that next time all nodes come up with the updated modules?
[root@node001 lustre_client-cent7.9]# modinfo lustre
description: Lustre Client File System
author: OpenSFS, Inc. <http://www.lustre.org/>
vermagic: 3.10.0-1160.49.1.el7.x86_64 SMP mod_unload modversions
[root@node001 lustre_client-cent7.9]# module add lustre
ERROR: Unable to locate a modulefile for 'lustre'
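The modinfo output above already shows which kernel the module was built for (the vermagic field). A minimal Python sketch that extracts that kernel release so it can be compared with the kernel a node actually runs; it assumes modinfo output in the "key: value" format shown above, and the function names are hypothetical:

```python
import platform


def vermagic_kernel(modinfo_output: str) -> str:
    """Return the kernel release from the 'vermagic:' line of modinfo output."""
    for line in modinfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vermagic":
            parts = value.split()
            # First token is the kernel release; the rest are build flags
            # such as "SMP mod_unload modversions".
            if parts:
                return parts[0]
    raise ValueError("no vermagic line in modinfo output")


def matches_running_kernel(modinfo_output: str) -> bool:
    """True if the module was built for the kernel this machine is running."""
    # platform.release() reports the same string as `uname -r` on Linux.
    return vermagic_kernel(modinfo_output) == platform.release()
```

On a node, the input can come from `subprocess.run(["modinfo", "lustre"], capture_output=True, text=True, check=True).stdout`. A mismatch means the module will not load on that kernel, no matter how the image is synced.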
After installing the Lustre client packages on the test node, you’ll need to grab the image. Please refer to “5.6 Updating Running Nodes”, “Updating A Stored Image From A Running Node” in the admin manual.
Once the changes have been grabbed into the image, you can reboot the node to check that all the changes you applied to it persist.
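The grab step can also be run from the head node with cmsh. The Python sketch below only builds the command line. It assumes cmsh's `-c` batch mode and that `grabimage` in device mode performs a dry run unless `-w` is given; that is my reading of the admin manual, not something stated in this thread, so check section 5.6 before relying on it. The helper name is hypothetical.

```python
import re

# Conservative name check so a stray ';' cannot inject extra cmsh commands.
_NAME = re.compile(r"[A-Za-z0-9._-]+")


def grabimage_argv(node: str, write: bool = True) -> list[str]:
    """Build the argv that grabs a running node's filesystem into its image.

    Assumption: without -w, grabimage only reports what would change.
    """
    if not _NAME.fullmatch(node):
        raise ValueError(f"unexpected node name: {node!r}")
    command = f"device use {node}; grabimage"
    if write:
        command += " -w"
    return ["cmsh", "-c", command]
```

A cautious workflow is `subprocess.run(grabimage_argv("node001", write=False), check=True)` first to review the list of changes, then the same call with `write=True`.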
Hi Adel, I will look into it. Thanks for sharing.
I wanted to know whether we can add the Lustre client to “module load xxx”.
Kernel modules should be loaded at boot time. The ‘module load’ command is mainly used to set environment variables automatically; it does not load kernel modules. You’ll need to add /etc/modprobe.d/lustre.conf to the software image of the nodes so that the lnet module uses a particular interface to reach the MGS. A line similar to the following should work if you’ll be using InfiniBand (IB) interfaces:
options lnet networks="o2ib2(ib0)"
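A minimal Python sketch of placing that file inside a software image directory rather than on a single node, so every node provisioned from the image gets it. The image root (for example /cm/images/default-image, which I assume is the default image location; check the image's path property) and the function names are assumptions. The network name must match the LNet network the MGS actually uses.

```python
from pathlib import Path


def lnet_options_line(networks: str) -> str:
    """Return the modprobe line that tells LNet which network/interface to use."""
    # Straight ASCII double quotes only: modprobe does not treat curly quotes
    # pasted from a web page as quoting characters.
    return f'options lnet networks="{networks}"\n'


def write_lustre_conf(image_root: str, networks: str = "o2ib2(ib0)") -> Path:
    """Create <image_root>/etc/modprobe.d/lustre.conf inside a software image."""
    conf_dir = Path(image_root) / "etc" / "modprobe.d"
    conf_dir.mkdir(parents=True, exist_ok=True)
    conf = conf_dir / "lustre.conf"
    conf.write_text(lnet_options_line(networks), encoding="ascii")
    return conf
```

Writing with `encoding="ascii"` makes the call fail loudly if a non-ASCII quote sneaks into the `networks` argument.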
I updated the image and then rebooted following section 5.6, but the boot failed.
How do I revert?
I did this by mistake and ran the image update.
Now the delete option is greyed out.
There is no way to revert the changes that you have grabbed from a compute node to the software image unless you have made a clone of the image before grabbing the changes.
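A clone made before grabbing is therefore the only undo path. A minimal Python sketch that builds the cmsh command for it; it assumes cmsh's softwareimage mode provides `clone <source> <new>` followed by `commit`, which matches my understanding of the admin manual but is not confirmed in this thread. The image names and the helper name are hypothetical.

```python
import re

_NAME = re.compile(r"[A-Za-z0-9._-]+")


def clone_image_argv(source: str, clone: str) -> list[str]:
    """Build the argv that clones a software image in cmsh and commits the clone."""
    for name in (source, clone):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected image name: {name!r}")
    if source == clone:
        raise ValueError("clone name must differ from the source image")
    return ["cmsh", "-c", f"softwareimage; clone {source} {clone}; commit"]
```

Cloning copies the whole image directory, so expect it to take a while and to need free space on the head node comparable to the image size.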
You don’t need to add lustre.ko to the list of kernel modules. The lustre kernel module will be loaded automatically once the system attempts to mount the Lustre filesystem.
Also, the error says that the NFS kernel module is missing, which probably has nothing to do with grabbing the image. My guess, from the screenshot, is that you have added kernel modules to the category of the nodes, and these have overridden the kernel modules from the image itself. You need to clear out the kernel modules from the category and keep only the kernel modules in the image.
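The override described above can be modelled in a few lines. This is a simplified Python model of the behaviour as explained in this reply (a non-empty category list replaces the image's list rather than adding to it), not the cluster manager's actual code; the module names in the usage are hypothetical.

```python
def effective_kernel_modules(image_modules: list[str], category_modules: list[str]) -> list[str]:
    """Simplified model: a non-empty category list replaces the image's list."""
    return list(category_modules) if category_modules else list(image_modules)


def dropped_modules(image_modules: list[str], category_modules: list[str]) -> list[str]:
    """Image modules that are no longer loaded because the category overrides them."""
    effective = set(effective_kernel_modules(image_modules, category_modules))
    return [m for m in image_modules if m not in effective]
```

For example, `dropped_modules(["nfs", "sunrpc"], ["lustre"])` returns `["nfs", "sunrpc"]`: adding only lustre at category level silently removes the NFS modules the image relied on, which is consistent with the missing-NFS error. Clearing the category list makes the image's list effective again.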
I cleared the category and the issue seems to be fixed, but provisioning is taking a lot of time.
Hi Adel ,
I used the Auto option and formatted all the storage devices on the node, but it is still failing with this error. This node was working fine until I decommissioned it and started reinstalling it. Please refer to the attached screenshot.
I did a grab to image on the running node, using a cloned image to receive the changes, created another category for the changed image, and rebooted the node, but I could not find the Lustre modules I had installed.
Please have a look at the screenshots.
The screenshots don’t say much about what may have gone wrong. If you installed the Lustre modules for the kernel shown in the screenshot and the grabimage went well, then you should be able to find lustre.ko in the same location inside the image. My guess is that you either did not install the Lustre modules or installed them for a different kernel. If you installed them for a different kernel, you’ll need to make sure that this kernel itself is installed in the image and that the kernelversion of the software image is set to the kernel version for which you installed the Lustre modules.
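A minimal Python sketch of that check, run on the head node against the image directory: list the kernels present in the image and look for lustre.ko under the one the image is set to boot. It assumes the standard lib/modules/&lt;kernel&gt; layout inside the image root; the function names are hypothetical, and the pattern also matches compressed modules such as lustre.ko.xz.

```python
from pathlib import Path


def installed_kernels(image_root: str) -> list[str]:
    """Kernel versions that have a module tree under <image_root>/lib/modules."""
    base = Path(image_root) / "lib" / "modules"
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


def find_lustre_ko(image_root: str, kernel_version: str) -> list[Path]:
    """All lustre.ko files (compressed or not) built for one kernel in the image."""
    module_dir = Path(image_root) / "lib" / "modules" / kernel_version
    if not module_dir.is_dir():
        return []
    return sorted(module_dir.rglob("lustre.ko*"))
```

If `find_lustre_ko` comes back empty for the image's kernelversion but a lustre.ko exists under another entry from `installed_kernels`, the modules were built for a different kernel than the one the image boots, which is the second case described above.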