I am seeing odd behavior with tunefs.lustre. After changing the failover node of an OST and trying to mount it, I get the following error:

The target service's index is already in use. (/dev/sdd)

After this error, performing --writeconf once lets me repeat the steps below any number of times on any OSS without needing --writeconf again. The goal is to mount an OST on a new OSS. I have simplified the steps and can reproduce the behavior consistently. Could anyone help me understand what is happening?

[root@OSS-2 opc]# lctl list_nids
10.99.101.18@tcp1
[root@OSS-2 opc]#
[root@OSS-2 opc]# mkfs.lustre --reformat --ost --fsname="testfs" --index="64" --mgsnode "10.99.101.6@tcp1" --mgsnode "10.99.101.7@tcp1" --servicenode "10.99.101.18@tcp1" "/dev/sdd"

   Permanent disk data:
Target:     testfs:OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062 (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

device size = 51200MB
formatting backing filesystem ldiskfs on /dev/sdd
        target name   testfs:OST0040
        kilobytes     52428800
        options       -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0040 -J size=1024 -I 512 -i 69905 -q -O extents,uninit_bg,mmp,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdd 52428800k
Writing CONFIGS/mountdata

[root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062 (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs:OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062 (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

exiting before disk write.
[root@OSS-2 opc]#
[root@OSS-2 opc]# tunefs.lustre --erase-param failover.node --servicenode 10.99.101.18@tcp1 /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062 (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs:OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062 (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

Writing CONFIGS/mountdata

[root@OSS-2 opc]# mkdir /testfs-OST0040
[root@OSS-2 opc]# mount -t lustre /dev/sdd /testfs-OST0040
mount.lustre: increased '/sys/devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd/queue/max_sectors_kb' from 1024 to 16384
[root@OSS-2 opc]#
[root@OSS-2 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002 (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002 (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

exiting before disk write.
[root@OSS-2 opc]#

Now going over to OSS-3 and trying to mount the OST there. At this point OSS-2 is completely powered off.

[root@OSS-3 opc]# lctl list_nids
10.99.101.19@tcp1
[root@OSS-3 opc]#

The parameters look the same as on OSS-2:

[root@OSS-3 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002 (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002 (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

exiting before disk write.
[root@OSS-3 opc]#

Changing the failover node to the current node:

[root@OSS-3 opc]# tunefs.lustre --erase-param failover.node --servicenode 10.99.101.19@tcp1 /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1002 (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.18@tcp1

   Permanent disk data:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1042 (OST update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.19@tcp1

After the write completes, for some reason the OST is marked with the 'first_time' flag (0x1062), as the next command shows.
[root@OSS-3 opc]# tunefs.lustre --dryrun /dev/sdd
checking for existing Lustre data: found

   Read previous values:
Target:     testfs-OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062 (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.19@tcp1

   Permanent disk data:
Target:     testfs:OST0040
Index:      64
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x1062 (OST first_time update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.101.6@tcp1:10.99.101.7@tcp1 failover.node=10.99.101.19@tcp1

exiting before disk write.
[root@OSS-3 opc]#

The mount fails here because the target is marked 'first_time', even though this OST is not new: it was already mounted from OSS-2, and the MGS knows about it.

[root@OSS-3 opc]# mkdir /testfs-OST0040
[root@OSS-3 opc]# mount -t lustre /dev/sdd /testfs-OST0040
mount.lustre: mount /dev/sdd at /testfs-OST0040 failed: Address already in use
The target service's index is already in use. (/dev/sdd)
[root@OSS-3 opc]#

From here, if I run tunefs.lustre with --writeconf, the mount works. Once that is done, repeating the above experiment any number of times on any server works as expected, without --writeconf. (FYI: --writeconf is documented as a dangerous option.)
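For completeness, the recovery step I used looked roughly like the sketch below (the mount point name matches my setup above; run this only with the target unmounted, since --writeconf causes the target's configuration log on the MGS to be erased and regenerated at the next mount):

```shell
# On the OSS currently holding the device, with the OST unmounted.
# --writeconf marks the target so its config log is regenerated on the
# MGS at the next mount; the MGS must be reachable when remounting.
tunefs.lustre --writeconf /dev/sdd
mount -t lustre /dev/sdd /testfs-OST0040
```

After this one-time step, moving the OST between OSS nodes with --erase-param failover.node / --servicenode worked repeatedly for me without further --writeconf runs.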