From: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
To: martin.petersen@oracle.com, michael.christie@oracle.com,
mlombard@redhat.com
Cc: target-devel@vger.kernel.org, Paul Dagnelie <pcd@delphix.com>
Subject: Hang in iscsit_access_np() related to tpg->np_login_sem
Date: Fri, 3 Feb 2023 09:16:15 -0800
Message-ID: <CAMbhmBwe7KU8sHPLRgjGOrKPt44HMytaTbavBeFk1+uVvGzVmQ@mail.gmail.com>
Hi folks,
I'd like to ask about the following patch:
https://www.spinics.net/lists/target-devel/msg18875.html
We've been hitting a similar issue, with the same symptoms, in our
customers' production environments. We get constant
"iSCSI Login timeout on Network Portal 0.0.0.0:3260" messages
because the iSCSI target login thread waits on the np_login_sem
semaphore until it is interrupted by the login timer. Here is the
stack trace of the waiting thread:
0xffff8bdf62f2ac80 INTERRUPTIBLE 1
__schedule+0x2c1
schedule+0x33
schedule_timeout+0x205
__down_interruptible+0xbb
down_interruptible+0x4b
iscsit_access_np+0x5a
iscsi_target_locate_portal+0x429
__iscsi_target_login_thread+0x332
iscsi_target_login_thread+0x6f3
kthread+0x120
ret_from_fork+0x1f
During that time there is no other login or login-related thread running,
which leads us to believe that another thread acquired the semaphore
but never released it.
Looking through the login code it seems like there are two functions that
are expected to call up() on that semaphore by calling iscsit_deaccess_np():
[A] __iscsi_target_login_thread(): This is the same thread that acquired
the semaphore (by calling iscsit_access_np()).
[B] iscsi_target_do_login_rx(): This is a delayed worker thread spawned
by the thread in [A].
Looking at both code paths, it seems that each has one case in which
iscsit_deaccess_np() is never called to release the semaphore.
For [A], that is when iscsi_target_start_negotiation() returns 0 towards the
end of that function.
For [B], that is when iscsi_target_do_login() returns 0 AND
iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_WRITE_ACTIVE)
returns 0.
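The two cases above can be sketched as a small userspace model. This is
not kernel code and only encodes the reading of the code paths given in
this message: struct model_sem and every model_*() function are
hypothetical stand-ins, the semaphore never actually blocks, and whether a
later run of the worker is meant to release the semaphore is left open.

```c
#include <assert.h>

/* Hypothetical stand-in for tpg->np_login_sem, initialised to count = 1.
 * It never blocks: a failed acquire models a thread that would be stuck
 * in down_interruptible(). */
struct model_sem {
    int count;
};

/* Models iscsit_access_np(): returns 0 on success, -1 if it would block. */
static int model_access_np(struct model_sem *s)
{
    if (s->count == 0)
        return -1;
    s->count--;
    return 0;
}

/* Models iscsit_deaccess_np(): releases the semaphore. */
static void model_deaccess_np(struct model_sem *s)
{
    assert(s->count == 0); /* only a held semaphore may be released */
    s->count++;
}

/* Path [A]. nego_rc models the return value of
 * iscsi_target_start_negotiation(). Returns -1 if the semaphore could not
 * be acquired, 0 if it was released, 1 if it is still held on return. */
static int model_login_thread(struct model_sem *s, int nego_rc)
{
    if (model_access_np(s) != 0)
        return -1;
    if (nego_rc == 0)
        return 1; /* the case described for [A]: no release here */
    model_deaccess_np(s);
    return 0;
}

/* Path [B]. login_rc models iscsi_target_do_login(), sk_check_rc models
 * iscsi_target_sk_check_and_clear(). Must only be called while the
 * semaphore is held. Returns 1 if it is still held on return, 0 if not. */
static int model_login_rx(struct model_sem *s, int login_rc, int sk_check_rc)
{
    if (login_rc == 0 && sk_check_rc == 0)
        return 1; /* the case described for [B]: no release here */
    model_deaccess_np(s);
    return 0;
}
```

Starting from count = 1, calling model_login_thread(&s, 0) followed by
model_login_rx(&s, 0, 0) leaves count at 0, so the next
model_login_thread() call returns -1. In this model that corresponds to
the login timeouts described above; any other return value in either path
puts count back to 1.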
Since we have no expertise in this part of the kernel, I wanted to ask you
all: do the two scenarios above intentionally leave the semaphore held,
or is either of them a bug? If they are not bugs, where is the
semaphore expected to be released?
Any explanation or insight would be much appreciated.
Regards,
Serapheim