initramfs.vger.kernel.org archive mirror
From: Cathy Zhou <cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Si-Wei <si-wei.liu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Vasanth Vemula
	<vasanth.vemula-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Yun Zhou <cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Subject: iscsi netboot failure in case of multiple interfaces
Date: Wed, 8 Nov 2017 10:55:45 -0800
Message-ID: <3d6d6a40-692f-6c7f-fadd-ef6ccaab510d@oracle.com>

Hi,

First, I apologize for the length of this email. I hope my description
of the problem is clear. Questions and suggestions are welcome. Please
reply-all, as some of us are not on the alias.

-----

We are running into a failure with iscsi netboot with the following boot 
options:

       "... rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 ip=dhcp 
netroot=iscsi:169.254.0.2::::iqn.2015-02.oracle.boot:uefi 
iscsi_param=node.session.timeo.replacement_timeout=6000"

On the system, there are two interfaces (say eth0 and eth1) which are 
able to get dhcp offers successfully, but only one of them (eth0) is 
able to reach the specified iscsi target. After some debugging, we 
believe we've found the root cause. Here is what happened:

1. eth0 successfully got its dhcp offer. The iscsiroot script was run
for eth0 and eventually ran "systemd-run" to start the "oneshot"
iscsistart service.
2. Before step 1 completed, the iscsiroot script was also run for eth1.
It checked the status of the first iscsistart service instance, which
was still "activating", so the iscsistart service was restarted, which
killed the first instance. The second instance then failed as well,
with the "existing session" error.
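
To make the sequence in steps 1 and 2 concrete, here is a minimal sketch of
the decision we believe the script makes. It is not the actual iscsiroot.sh
code: next_action is a hypothetical helper, and the mapping from unit state
to action is our assumption, based only on the behavior we observed. The
state strings are the ones "systemctl is-active" prints.

```shell
#!/bin/sh
# Hypothetical helper (not from iscsiroot.sh): map a unit state, as printed
# by `systemctl is-active UNIT`, to the action described in steps 1 and 2.
next_action() {
    case "$1" in
        active)            echo "none" ;;     # login already completed
        activating|failed) echo "restart" ;;  # restart kills a login still in flight
        *)                 echo "start" ;;    # no instance yet (first interface)
    esac
}

# eth0 arrives first and no iscsistart instance exists yet.
next_action inactive
# eth1 arrives while the instance started for eth0 is still logging in.
next_action activating
```

The second call is the problematic case: the instance started for eth0 is
still "activating", so the run for eth1 restarts it.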

Here are the questions we have:

a. We found that the iscsiroot script starts the iscsid service before
the iscsistart service. Because of this, iscsistart failed to create
the mgmt ipc socket, and the iscsi login session request was handled by
iscsid instead. In step 2 above, the first iscsistart instance was
killed but the iscsid daemon was not, hence the "existing session"
error: the first login session still existed in iscsid.

The question is whether the iscsid service is really required by the
iscsiroot script. Because iscsid was running, we saw iscsid become
unexpectedly unavailable after the second iscsistart instance ran:
iscsid was stopped by that iscsistart instance (iscsistart.c calls
stop_event_loop() to stop the event loop in order to exit, but since
the MGMT_IPC_IMMEDIATE_STOP request was handled by iscsid, iscsid's
event loop was stopped and iscsid exited instead).

The related error messages we saw in the log:

    "iscsistart[836]: iscsistart: Can not bind IPC socket
     iscsistart[836]: iscsistart: Could not setup mgmt ipc
     ...
     iscsiadm[970]: iscsiadm: can not connect to iSCSI daemon (111)!
     iscsiadm[970]: iscsiadm: initiator reported error (20 - could not connect to iscsid)
     ..."

I tried changing the iscsiroot.sh script to stop the iscsid service
instead of restarting it, and that seems to fix our problem. But I am
not sure whether this fix has any side effects.

b. If stopping the iscsid service is not the ideal fix, I am wondering
whether we can mimic the legacy way (prior to systemd) and start the
iscsistart service independently for each interface. That is, the
iscsistart service would be started for each interface without
affecting the iscsistart instances already running for other
interfaces. This may mean the service name needs to include $netif to
uniquely identify each instance.
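
A rough sketch of what option b could look like, assuming the unit name is
simply derived from the interface name. The helpers iscsistart_unit and
start_cmd and the "iscsistart-<netif>.service" naming are hypothetical, not
taken from iscsiroot.sh. The command is only printed (dry run), and the
iscsistart arguments are intentionally left out.

```shell
#!/bin/sh
# Hypothetical helper: derive a per-interface unit name so that a run for a
# second interface never restarts the first interface's instance.
iscsistart_unit() {
    # $1 = network interface name (the $netif mentioned above)
    echo "iscsistart-$1.service"
}

# Dry run: print the systemd-run command instead of executing it.
# The iscsistart arguments are elided on purpose.
start_cmd() {
    echo "systemd-run --no-block --service-type=oneshot --unit=$(iscsistart_unit "$1") iscsistart ..."
}

start_cmd eth0
start_cmd eth1
```

With distinct unit names, the status check for eth1 would look only at
eth1's own instance, so an "activating" instance for eth0 would be left
alone.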

Thanks very much! Looking forward to your suggestions!

- Cathy
