ConnMan network manager
From: Grant Erickson <gerickson@nuovations.com>
To: connman@lists.linux.dev
Subject: Request for Comment: Implementing Non-link-level Failure Monitoring / Failover
Date: Mon, 16 Oct 2023 08:46:42 -0700
Message-ID: <1B47C370-F3D5-4F60-9753-46D39CAB6728@nuovations.com>

ConnMan Community:

I spent most of Friday doing integration testing between connmand and my application infrastructure, inflicting various communications failures and observing how the event chain responds.

Perhaps to no one's surprise here, connmand appears to respond only to local link failures. That is, with Ethernet, only a first-hop link loss or cable pull at the device under test (DUT) seems to cause a failover to Wi-Fi. Likewise, with Wi-Fi, only a complete loss of association with the access point seems to cause a failover to Cellular.

So, for example, removing an Ethernet link several hops away from the DUT or from the Wi-Fi access point, inducing DNS failures, and so on do not seem to cause a default service failover. However, I can clearly see the WiSPR liveness probes failing with 4xx status, so the right signals seem to be present to induce failover.

If the community is supportive, I would propose some upstream changes (precise details TBD) such that:

    1. After some configurable number of WiSPR liveness probes fail on the currently “online” link, connmand declares the service no longer “online”.
        a. The default service does not “flap” between “online” and “ready” as it does now.
        b. Once a service is “online”, it stays “online” until that configurable number of WiSPR liveness probes fail.
    2. In response to that configurable number of liveness failures, connmand moves the default route to the next available service (for example, from Ethernet to Wi-Fi, or from Wi-Fi to Cellular).
        a. For the no-longer-“online” service, connmand would then go into a TBD (Fibonacci?) back-off for WiSPR probes and would promote the service back to default only when some configurable number of WiSPR liveness probes succeed.
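
To make the proposal above a little more concrete, here is a minimal Python sketch of the intended behavior. It is illustrative only and is not ConnMan code: the names (ServiceMonitor, failure_threshold, success_threshold, fibonacci_backoff, pick_default) are hypothetical, and it assumes that the configurable counts mean consecutive probe results and that the back-off restarts after a successful probe. All of those details remain TBD.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class State(Enum):
    READY = "ready"
    ONLINE = "online"


def fibonacci_backoff(step: int, base_seconds: int = 1) -> int:
    """Delay for a 0-based back-off step: 1, 1, 2, 3, 5, 8, ... times base."""
    a, b = 1, 1
    for _ in range(step):
        a, b = b, a + b
    return a * base_seconds


@dataclass
class ServiceMonitor:
    """Hypothetical per-service liveness tracker (not a ConnMan type)."""
    failure_threshold: int = 3   # assumed: consecutive failures to demote
    success_threshold: int = 3   # assumed: consecutive successes to promote
    state: State = State.READY
    failures: int = 0
    successes: int = 0
    backoff_step: int = 0

    def on_probe(self, ok: bool) -> None:
        """Record the result of one liveness probe."""
        if self.state is State.ONLINE:
            if ok:
                self.failures = 0          # one success clears the streak
                return
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Demote only after the whole threshold is reached (1.b).
                self.state = State.READY
                self.failures = self.successes = self.backoff_step = 0
            return
        # Not "online": count successes toward promotion, back off on failure.
        if ok:
            self.successes += 1
            self.backoff_step = 0
            if self.successes >= self.success_threshold:
                self.state = State.ONLINE
                self.successes = 0
        else:
            self.successes = 0
            self.backoff_step += 1

    def next_probe_delay(self, online_interval: int = 1) -> int:
        """Seconds until the next probe: fixed when online, backed off otherwise."""
        if self.state is State.ONLINE:
            return online_interval
        return fibonacci_backoff(self.backoff_step)


def pick_default(preference: List[str],
                 monitors: Dict[str, ServiceMonitor]) -> Optional[str]:
    """Return the first "online" service in preference order, if any (2.)."""
    for name in preference:
        if monitors[name].state is State.ONLINE:
            return name
    return None
```

With a preference order of Ethernet, Wi-Fi, Cellular, pick_default() would return Wi-Fi once the Ethernet monitor has been demoted, and would return Ethernet again only after success_threshold probes succeed on it.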

Thoughts?

Best,

Grant

-- 
Principal
Nuovations

gerickson@nuovations.com
http://www.nuovations.com/

