- Mounting /scratch (first attempt after a fresh boot):
[abc@dtn1 ~]$ sudo mount -t lustre csmds1.ib@o2ib:csmds2.ib@o2ib:/scratch /scratch
[sudo] password for abc:
mount.lustre: mount csmds1.ib@o2ib:csmds2.ib@o2ib:/scratch at /scratch failed: Input/output error
Is the MGS running?
Apr 4 10:22:30 dtn1 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 16, npartitions: 2
Apr 4 10:22:30 dtn1 kernel: alg: No test for adler32 (adler32-zlib)
Apr 4 10:22:30 dtn1 kernel: alg: No test for crc32 (crc32-table)
Apr 4 10:22:30 dtn1 kernel: alg: No test for crc32 (crc32-pclmul)
Apr 4 10:16:14 dtn1 kernel: Lustre: Lustre: Build Version: 2.10.7
Apr 4 10:16:14 dtn1 kernel: LNet: Added LNI 172.16.3.19@o2ib [8/256/0/180]
Apr 4 10:16:16 dtn1 kernel: LNet: 4565:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 172.16.3.219@o2ib: 4294746 seconds
Apr 4 10:16:16 dtn1 kernel: Lustre: 4575:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1554398174/real 1554398176] req@ffff88086df21c80 x1629904619700240/t0(0) o250->MGC172.16.3.219@o2ib@172.16.3.219@o2ib:26/25 lens 520/544 e 0 to 1 dl 1554398179 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
Apr 4 10:16:20 dtn1 kernel: LustreError: 4534:0:(mgc_request.c:251:do_config_log_add()) MGC172.16.3.219@o2ib: failed processing log, type 1: rc = -5
Apr 4 10:16:29 dtn1 kernel: LustreError: 4604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
Apr 4 10:16:42 dtn1 kernel: LNet: 4565:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 172.16.3.220@o2ib: 4294772 seconds
Apr 4 10:16:42 dtn1 kernel: Lustre: 4575:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1554398199/real 1554398202] req@ffff88105dba8cc0 x1629904619700304/t0(0) o250->MGC172.16.3.219@o2ib@172.16.3.220@o2ib:26/25 lens 520/544 e 0 to 1 dl 1554398204 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
Apr 4 10:16:51 dtn1 kernel: LustreError: 15c-8: MGC172.16.3.219@o2ib: The configuration from log 'scratch-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Apr 4 10:16:51 dtn1 kernel: Lustre: Unmounted scratch-client
Apr 4 10:16:51 dtn1 kernel: LustreError: 4534:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-5)
Then we retried the mount twice:
Apr 4 10:31:03 dtn1 kernel: LustreError: 4660:0:(obd_config.c:1361:class_process_proc_param()) scratch-clilov-ffff88086e728800: unknown config parameter 'lov.qos_threshold_rr=100'
Apr 4 10:31:03 dtn1 kernel: Lustre: Mounted scratch-client
Apr 4 10:31:23 dtn1 kernel: Lustre: 4618:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1554399063/real 0] req@ffff880864b0a680 x1629905458561424/t0(0) o8->scratch-OST0001-osc-ffff88086e728800@172.16.3.222@o2ib:28/4 lens 520/544 e 0 to 1 dl 1554399083 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Apr 4 10:31:33 dtn1 kernel: LustreError: 4664:0:(llite_lib.c:1773:ll_statfs_internal()) obd_statfs fails: rc = -5
Apr 4 10:32:35 dtn1 kernel: LustreError: 4665:0:(llite_lib.c:1773:ll_statfs_internal()) obd_statfs fails: rc = -5
Apr 4 10:32:36 dtn1 kernel: LustreError: 4666:0:(llite_lib.c:1773:ll_statfs_internal()) obd_statfs fails: rc = -5
Apr 4 10:32:50 dtn1 kernel: Lustre: Unmounted scratch-client
Apr 4 10:32:56 dtn1 kernel: LustreError: 4687:0:(obd_config.c:1361:class_process_proc_param()) scratch-clilov-ffff880865264c00: unknown config parameter 'lov.qos_threshold_rr=100'
Apr 4 10:32:56 dtn1 kernel: Lustre: Mounted scratch-client
- The mount works as normal now
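As a stopgap, the retry-until-it-mounts step above could be scripted. This is only a sketch: retry() is a hypothetical helper, not anything shipped with Lustre, and the mount line is the one from the transcript (it needs root and a reachable MGS):

```shell
#!/bin/sh
# Sketch only: retry a command up to N times with a short pause,
# since the mount here succeeded on a later attempt after a fresh boot.
# retry() is a made-up helper for illustration.
retry() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# e.g. (needs root; target as in the transcript above):
# retry 3 mount -t lustre csmds1.ib@o2ib:csmds2.ib@o2ib:/scratch /scratch
```

Obviously this just papers over whatever makes the first attempt fail; it doesn't explain it.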
Extra info, before attempting the mount on a fresh boot:
[root@dtn1 ~]# mount -t lustre csmds1.ib@o2ib:csmds2.ib@o2ib:/scratch /scratch
mount.lustre: mount csmds1.ib@o2ib:csmds2.ib@o2ib:/scratch at /scratch failed: Input/output error
Is the MGS running?
[root@dtn1 ~]# lctl ping 172.16.3.219@o2ib
12345-0@lo
12345-172.16.3.219@o2ib
[root@dtn1 ~]# lctl ping 172.16.3.220@o2ib
failed to ping 172.16.3.220@o2ib: Input/output error
[root@dtn1 ~]# lctl ping 172.16.3.220@o2ib
12345-0@lo
12345-172.16.3.220@o2ib
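Since the second lctl ping succeeds where the first fails, another workaround we could script is to wait for both MGS NIDs to answer before attempting the mount. Again just a sketch: wait_for_nid() is a made-up helper, and it assumes lctl is in PATH:

```shell
#!/bin/sh
# Sketch only: poll a NID with lctl ping until it answers, mirroring
# the behavior above where the first ping after boot fails and the
# second succeeds. wait_for_nid() is a hypothetical helper.
wait_for_nid() {
    nid=$1
    tries=0
    until lctl ping "$nid" > /dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge 30 ]; then
            # Give up after ~30 s rather than looping forever.
            return 1
        fi
        sleep 1
    done
    return 0
}

# e.g., before mounting (NIDs are the MGS pair from above):
# wait_for_nid 172.16.3.219@o2ib && wait_for_nid 172.16.3.220@o2ib
```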
FWIW, this is on a client with an EDR adapter. But we’ve also seen this behavior on a host with FDR.
Chris