* [RFC 0/2] fuse: introduce fuse server recovery mechanism
@ 2024-05-24  6:40 Jingbo Xu
  2024-05-24  6:40 ` [RFC 1/2] fuse: introduce recovery mechanism for fuse server Jingbo Xu
  ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jingbo Xu @ 2024-05-24  6:40 UTC (permalink / raw)
To: miklos, linux-fsdevel; +Cc: linux-kernel, winters.zc

Background
==========

The fd of '/dev/fuse' serves as a message transmission channel between the
FUSE filesystem (kernel space) and the fuse server (user space). Once the fd
gets closed (intentionally or unintentionally), the FUSE filesystem gets
aborted, and any attempt at filesystem access gets an -ECONNABORTED error
until the FUSE filesystem is finally unmounted.

Providing uninterruptible filesystem service is one of the requisites in
production environments. The most straightforward way, and maybe the most
widely used one, is to make another dedicated user daemon (similar to the
systemd fdstore) keep the device fd open. When the fuse daemon recovers from
a crash, it can retrieve the device fd from the fdstore daemon through the
socket takeover (Unix domain socket) method [1] or the pidfd_getfd() syscall
[2]. In this way, as long as the fdstore daemon doesn't exit, the FUSE
filesystem won't get aborted once the fuse daemon crashes, though the
filesystem service may hang for a while when the fuse daemon gets restarted
and has not yet completely recovered.

This picture indeed works and has been deployed in our internal production
environment, until the following issues were encountered:

1. The fdstore daemon may be killed by mistake, in which case the FUSE
   filesystem gets aborted and becomes irrecoverable.

2. In containerized deployments, the fuse daemon is deployed inside a
   container POD, and a dedicated fdstore daemon needs to be deployed for
   each fuse daemon. Each fdstore daemon consumes a certain amount of
   resources (e.g. memory footprint), which is not conducive to dense
   container deployment.

3. Each fuse daemon implementation needs to implement its own fdstore
   daemon. If we implement the recovery mechanism on the kernel side, all
   fuse daemon implementations could reuse it.

What we do
==========

Basic Recovery Mechanism
------------------------

We introduce a recovery mechanism for the fuse server on the kernel side.
To do this:

1. Introduce a new "tag=" mount option, with which users can identify a
   fuse connection with a unique name.

2. Introduce a new FUSE_DEV_IOC_ATTACH ioctl, with which the fuse server
   can reconnect to the fuse connection corresponding to the given tag.

3. Introduce a new FUSE_HAS_RECOVERY init flag. The fuse server should
   advertise this feature if it supports server recovery.

With the above recovery mechanism, the whole time sequence is as follows:

- At the initial mount, the fuse filesystem is mounted with the "tag="
  option.
- The fuse server advertises the FUSE_HAS_RECOVERY flag when replying to
  FUSE_INIT.
- When the fuse server crashes and the (/dev/fuse) device fd is closed,
  the fuse connection won't be aborted.
- The requests submitted after the server crash will stay in the iqueue;
  the processes submitting these requests will hang.
- The fuse server gets restarted and recovers the state prior to the crash
  (including the negotiation results of the last FUSE_INIT).
- The fuse server opens /dev/fuse to get a new device fd, and then runs
  the FUSE_DEV_IOC_ATTACH ioctl on the new device fd to retrieve the fuse
  connection with the tag previously used to mount the fuse filesystem.
- The fuse server issues a FUSE_NOTIFY_RESEND notification to request that
  the kernel resend those in-flight requests that had been sent to the
  fuse server before the crash but not yet been replied to.
- The fuse server starts to process requests normally (those queued in the
  iqueue and those resent via FUSE_NOTIFY_RESEND).

In summary, requests submitted after the server crash stay in the iqueue
and get serviced once the fuse server recovers from the crash and retrieves
the previous fuse connection. As for the in-flight requests that had been
sent to the fuse server before the crash but not yet been replied to, the
fuse server can ask the kernel to resend them through the
FUSE_NOTIFY_RESEND notification type.

Security Enhancement
--------------------

Besides, we offer a uid-based security enhancement for the fuse server
recovery mechanism; otherwise any malicious attacker could kill the fuse
server and take over the filesystem service through the recovery mechanism.
To implement this, we introduce a new "rescue_uid=" mount option specifying
the expected uid of the legitimate process running the fuse server. Only a
process with the matching uid is then permitted to retrieve the fuse
connection with the server recovery mechanism.

Limitation
==========

1. The current mechanism won't resend a new FUSE_INIT request to the fuse
   server and start a new negotiation when the fuse server attempts to
   re-attach to the fuse connection through the FUSE_DEV_IOC_ATTACH ioctl.
   Thus the fuse server needs to recover the state prior to the crash
   (including the negotiation results of the last FUSE_INIT) by itself.
   (PS: hence I had to apply some hacks to the libfuse passthrough_ll
   daemon when testing the recovery feature.)

2. With the current recovery mechanism, the fuse filesystem won't get
   aborted when the fuse server crashes, so a subsequent umount will hang.
   The call stack shows the hung task waiting for FUSE_GETATTR on the
   mountpoint:

   [<0>] request_wait_answer+0xe1/0x200
   [<0>] fuse_simple_request+0x18e/0x2a0
   [<0>] fuse_do_getattr+0xc9/0x180
   [<0>] vfs_statx+0x92/0x170
   [<0>] vfs_fstatat+0x7c/0xb0
   [<0>] __do_sys_newstat+0x1d/0x40
   [<0>] do_syscall_64+0x60/0x170
   [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

   It's not fixed yet in this RFC version.

3. I don't know if a kernel based recovery mechanism is welcome on the
   community side. Any comment is welcome. Thanks!

[1] https://copyconstruct.medium.com/file-descriptor-transfer-over-unix-domain-sockets-dcbbf5b3b6ec
[2] https://copyconstruct.medium.com/seamless-file-descriptor-transfer-between-processes-with-pidfd-and-pidfd-getfd-816afcd19ed4

Jingbo Xu (2):
  fuse: introduce recovery mechanism for fuse server
  fuse: uid-based security enhancement for the recovery mechanism

 fs/fuse/dev.c             | 55 ++++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h          | 15 +++++++++++
 fs/fuse/inode.c           | 46 +++++++++++++++++++++++++++++++-
 include/uapi/linux/fuse.h |  7 +++++
 4 files changed, 121 insertions(+), 2 deletions(-)

-- 
2.19.1.6.gb485710b

^ permalink raw reply	[flat|nested] 15+ messages in thread
* [RFC 1/2] fuse: introduce recovery mechanism for fuse server
  2024-05-24  6:40 [RFC 0/2] fuse: introduce fuse server recovery mechanism Jingbo Xu
@ 2024-05-24  6:40 ` Jingbo Xu
  2024-05-24  6:40 ` [RFC 2/2] fuse: uid-based security enhancement for the recovery mechanism Jingbo Xu
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Jingbo Xu @ 2024-05-24  6:40 UTC (permalink / raw)
To: miklos, linux-fsdevel; +Cc: linux-kernel, winters.zc

Introduce a failover mechanism for the fuse server, with which the fuse
connection can stay alive while the fuse server crashes. The fuse server
can re-attach to the fuse connection after the crash and recover the
filesystem service.

The requests submitted after the server crash will stay in the iqueue and
get serviced once the fuse server recovers from the crash and retrieves
the previous fuse connection. As for the in-flight requests that had been
sent to the fuse server before the crash but not yet been replied to, the
fuse server can request the kernel to resend them through the
FUSE_NOTIFY_RESEND notification type.

To implement the above mechanism:

1. Introduce a new "tag=" mount option, with which users can identify a
   fuse connection with a unique name.

2. Introduce a new FUSE_DEV_IOC_ATTACH ioctl, with which the fuse server
   can reconnect to the fuse connection corresponding to the given tag.

3. Introduce a new FUSE_HAS_RECOVERY init flag. The fuse server should
   advertise this feature if it supports server recovery.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dev.c             | 43 ++++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h          |  7 +++++++
 fs/fuse/inode.c           | 35 ++++++++++++++++++++++++++++++-
 include/uapi/linux/fuse.h |  7 +++++++
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 3ec8bb5e68ff..7599138baac0 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2271,7 +2271,7 @@ int fuse_dev_release(struct inode *inode, struct file *file)
 		end_requests(&to_end);

 	/* Are we the last open device? */
-	if (atomic_dec_and_test(&fc->dev_count)) {
+	if (atomic_dec_and_test(&fc->dev_count) && !fc->recovery) {
 		WARN_ON(fc->iq.fasync != NULL);
 		fuse_abort_conn(fc);
 	}
@@ -2376,6 +2376,44 @@ static long fuse_dev_ioctl_backing_close(struct file *file, __u32 __user *argp)
 	return fuse_backing_close(fud->fc, backing_id);
 }

+static inline bool fuse_device_attach_match(struct fuse_conn *fc,
+					    const char *tag)
+{
+	if (!fc->recovery)
+		return false;
+
+	return !strncmp(fc->tag, tag, FUSE_TAG_NAME_MAX);
+}
+
+static int fuse_device_attach(struct file *file, const char *tag)
+{
+	struct fuse_conn *fc;
+
+	list_for_each_entry(fc, &fuse_conn_list, entry) {
+		if (!fuse_device_attach_match(fc, tag))
+			continue;
+		return fuse_device_clone(fc, file);
+	}
+	return -ENOTTY;
+}
+
+static long fuse_dev_ioctl_attach(struct file *file, __u32 __user *argp)
+{
+	struct fuse_ioctl_attach attach;
+	int res;
+
+	if (copy_from_user(&attach, argp, sizeof(attach)))
+		return -EFAULT;
+
+	if (attach.tag[0] == '\0')
+		return -EINVAL;
+
+	mutex_lock(&fuse_mutex);
+	res = fuse_device_attach(file, attach.tag);
+	mutex_unlock(&fuse_mutex);
+	return res;
+}
+
 static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -2391,6 +2429,9 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd,
 	case FUSE_DEV_IOC_BACKING_CLOSE:
 		return fuse_dev_ioctl_backing_close(file, argp);

+	case FUSE_DEV_IOC_ATTACH:
+		return fuse_dev_ioctl_attach(file, argp);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..e9832186f84f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -575,6 +575,7 @@ struct fuse_fs_context {
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
+	const char *tag;

 	/* DAX device, may be NULL */
 	struct dax_device *dax_dev;
@@ -860,6 +861,9 @@ struct fuse_conn {
 	/** Passthrough support for read/write IO */
 	unsigned int passthrough:1;

+	/** Support for fuse server recovery */
+	unsigned int recovery:1;
+
 	/** Maximum stack depth for passthrough backing files */
 	int max_stack_depth;

@@ -917,6 +921,9 @@ struct fuse_conn {
 	/** IDR for backing files ids */
 	struct idr backing_files_map;
 #endif
+
+	/* Tag of the connection used by fuse server recovery */
+	const char *tag;
 };

 /*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..1ab245d6ade3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -733,6 +733,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_TAG,
 	OPT_ERR
 };

@@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",	OPT_MAX_READ),
 	fsparam_u32	("blksize",	OPT_BLKSIZE),
 	fsparam_string	("subtype",	OPT_SUBTYPE),
+	fsparam_string	("tag",		OPT_TAG),
 	{}
 };

@@ -830,6 +832,15 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;

+	case OPT_TAG:
+		if (ctx->tag)
+			return invalfc(fsc, "Multiple tags specified");
+		if (strlen(param->string) > FUSE_TAG_NAME_MAX)
+			return invalfc(fsc, "Tag name too long");
+		ctx->tag = param->string;
+		param->string = NULL;
+		return 0;
+
 	default:
 		return -EINVAL;
 	}
@@ -843,6 +854,7 @@ static void fuse_free_fsc(struct fs_context *fsc)

 	if (ctx) {
 		kfree(ctx->subtype);
+		kfree(ctx->tag);
 		kfree(ctx);
 	}
 }
@@ -969,6 +981,7 @@ void fuse_conn_put(struct fuse_conn *fc)
 		}
 		if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
 			fuse_backing_files_free(fc);
+		kfree(fc->tag);
 		call_rcu(&fc->rcu, delayed_release);
 	}
 }
@@ -1331,6 +1344,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			}
 			if (flags & FUSE_NO_EXPORT_SUPPORT)
 				fm->sb->s_export_op = &fuse_export_fid_operations;
+			if (flags & FUSE_HAS_RECOVERY)
+				fc->recovery = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
@@ -1378,7 +1393,7 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
 		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
 		FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP |
-		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND;
+		FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND | FUSE_HAS_RECOVERY;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
@@ -1520,6 +1535,17 @@ void fuse_dev_free(struct fuse_dev *fud)
 }
 EXPORT_SYMBOL_GPL(fuse_dev_free);

+static bool fuse_find_conn_tag(const char *tag)
+{
+	struct fuse_conn *fc;
+
+	list_for_each_entry(fc, &fuse_conn_list, entry) {
+		if (!strcmp(fc->tag, tag))
+			return true;
+	}
+	return false;
+}
+
 static void fuse_fill_attr_from_inode(struct fuse_attr *attr,
 				      const struct fuse_inode *fi)
 {
@@ -1727,6 +1753,8 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
+	fc->tag = ctx->tag;
+	ctx->tag = NULL;

 	err = -ENOMEM;
 	root = fuse_get_root_inode(sb, ctx->rootmode);
@@ -1742,6 +1770,11 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	if (ctx->fudptr && *ctx->fudptr)
 		goto err_unlock;

+	if (fc->tag && fuse_find_conn_tag(fc->tag)) {
+		pr_err("tag %s already exist\n", fc->tag);
+		goto err_unlock;
+	}
+
 	err = fuse_ctl_add_conn(fc);
 	if (err)
 		goto err_unlock;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d08b99d60f6f..054d6789b2fc 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -463,6 +463,7 @@ struct fuse_file_lock {
 #define FUSE_PASSTHROUGH	(1ULL << 37)
 #define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
 #define FUSE_HAS_RESEND		(1ULL << 39)
+#define FUSE_HAS_RECOVERY	(1ULL << 40)

 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
@@ -1079,12 +1080,18 @@ struct fuse_backing_map {
 	uint64_t	padding;
 };

+struct fuse_ioctl_attach {
+#define FUSE_TAG_NAME_MAX 128
+	char tag[FUSE_TAG_NAME_MAX];
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
 #define FUSE_DEV_IOC_BACKING_OPEN	_IOW(FUSE_DEV_IOC_MAGIC, 1, \
 					     struct fuse_backing_map)
 #define FUSE_DEV_IOC_BACKING_CLOSE	_IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t)
+#define FUSE_DEV_IOC_ATTACH		_IOW(FUSE_DEV_IOC_MAGIC, 3, struct fuse_ioctl_attach)

 struct fuse_lseek_in {
 	uint64_t	fh;
-- 
2.19.1.6.gb485710b

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [RFC 2/2] fuse: uid-based security enhancement for the recovery mechanism
  2024-05-24  6:40 [RFC 0/2] fuse: introduce fuse server recovery mechanism Jingbo Xu
  2024-05-24  6:40 ` [RFC 1/2] fuse: introduce recovery mechanism for fuse server Jingbo Xu
@ 2024-05-24  6:40 ` Jingbo Xu
  2024-05-27 15:16 ` [RFC 0/2] fuse: introduce fuse server " Miklos Szeredi
  2024-05-28  8:38 ` Christian Brauner
  3 siblings, 0 replies; 15+ messages in thread
From: Jingbo Xu @ 2024-05-24  6:40 UTC (permalink / raw)
To: miklos, linux-fsdevel; +Cc: linux-kernel, winters.zc

Offer a uid-based security enhancement for the fuse server recovery
mechanism; otherwise any malicious attacker could kill the fuse server and
take over the filesystem service through the recovery mechanism.

Introduce a new "rescue_uid=" mount option specifying the expected uid of
the legitimate process running the fuse server. Only a process with the
matching uid is then permitted to retrieve the fuse connection with the
server recovery mechanism.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dev.c    | 12 ++++++++++++
 fs/fuse/fuse_i.h |  8 ++++++++
 fs/fuse/inode.c  | 13 ++++++++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 7599138baac0..9db35a2bbd85 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2376,12 +2376,24 @@ static long fuse_dev_ioctl_backing_close(struct file *file, __u32 __user *argp)
 	return fuse_backing_close(fud->fc, backing_id);
 }

+static inline bool fuse_device_attach_permissible(struct fuse_conn *fc)
+{
+	const struct cred *cred = current_cred();
+
+	return (uid_eq(cred->euid, fc->rescue_uid) &&
+		uid_eq(cred->suid, fc->rescue_uid) &&
+		uid_eq(cred->uid, fc->rescue_uid));
+}
+
 static inline bool fuse_device_attach_match(struct fuse_conn *fc,
 					    const char *tag)
 {
 	if (!fc->recovery)
 		return false;

+	if (!fuse_device_attach_permissible(fc))
+		return false;
+
 	return !strncmp(fc->tag, tag, FUSE_TAG_NAME_MAX);
 }

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e9832186f84f..c43026d7229c 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -560,6 +560,7 @@ struct fuse_fs_context {
 	unsigned int rootmode;
 	kuid_t user_id;
 	kgid_t group_id;
+	kuid_t rescue_uid;
 	bool is_bdev:1;
 	bool fd_present:1;
 	bool rootmode_present:1;
@@ -571,6 +572,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
+	bool rescue_uid_present:1;
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
@@ -616,6 +618,9 @@ struct fuse_conn {
 	/** The group id for this mount */
 	kgid_t group_id;

+	/* The expected user id of the fuse server */
+	kuid_t rescue_uid;
+
 	/** The pid namespace for this mount */
 	struct pid_namespace *pid_ns;

@@ -864,6 +869,9 @@ struct fuse_conn {
 	/** Support for fuse server recovery */
 	unsigned int recovery:1;

+	/** Is rescue_uid specified? */
+	unsigned int rescue_uid_present:1;
+
 	/** Maximum stack depth for passthrough backing files */
 	int max_stack_depth;

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 1ab245d6ade3..3b00482293b6 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -734,6 +734,7 @@ enum {
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
 	OPT_TAG,
+	OPT_RESCUE_UID,
 	OPT_ERR
 };

@@ -749,6 +750,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("blksize",	OPT_BLKSIZE),
 	fsparam_string	("subtype",	OPT_SUBTYPE),
 	fsparam_string	("tag",		OPT_TAG),
+	fsparam_u32	("rescue_uid",	OPT_RESCUE_UID),
 	{}
 };

@@ -841,6 +843,13 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		param->string = NULL;
 		return 0;

+	case OPT_RESCUE_UID:
+		ctx->rescue_uid = make_kuid(fsc->user_ns, result.uint_32);
+		if (!uid_valid(ctx->rescue_uid))
+			return invalfc(fsc, "Invalid rescue_uid");
+		ctx->rescue_uid_present = true;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1344,7 +1353,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			}
 			if (flags & FUSE_NO_EXPORT_SUPPORT)
 				fm->sb->s_export_op = &fuse_export_fid_operations;
-			if (flags & FUSE_HAS_RECOVERY)
+			if (flags & FUSE_HAS_RECOVERY && fc->rescue_uid_present)
 				fc->recovery = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
@@ -1753,6 +1762,8 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
+	fc->rescue_uid = ctx->rescue_uid;
+	fc->rescue_uid_present = ctx->rescue_uid_present;
 	fc->tag = ctx->tag;
 	ctx->tag = NULL;

-- 
2.19.1.6.gb485710b

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-24  6:40 [RFC 0/2] fuse: introduce fuse server recovery mechanism Jingbo Xu
  2024-05-24  6:40 ` [RFC 1/2] fuse: introduce recovery mechanism for fuse server Jingbo Xu
  2024-05-24  6:40 ` [RFC 2/2] fuse: uid-based security enhancement for the recovery mechanism Jingbo Xu
@ 2024-05-27 15:16 ` Miklos Szeredi
  2024-05-28  2:45   ` Jingbo Xu
  2024-05-28  8:38 ` Christian Brauner
  3 siblings, 1 reply; 15+ messages in thread
From: Miklos Szeredi @ 2024-05-27 15:16 UTC (permalink / raw)
To: Jingbo Xu; +Cc: linux-fsdevel, linux-kernel, winters.zc

On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:

> 3. I don't know if a kernel based recovery mechanism is welcome on the
> community side. Any comment is welcome. Thanks!

I'd prefer something external to fuse.

Maybe a kernel based fdstore (lifetime connected to that of the
container) would be a useful service more generally?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-27 15:16 ` [RFC 0/2] fuse: introduce fuse server " Miklos Szeredi
@ 2024-05-28  2:45   ` Jingbo Xu
  2024-05-28  3:08     ` Jingbo Xu
  2024-05-28  7:45     ` Miklos Szeredi
  0 siblings, 2 replies; 15+ messages in thread
From: Jingbo Xu @ 2024-05-28  2:45 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, winters.zc

On 5/27/24 11:16 PM, Miklos Szeredi wrote:
> On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
>> 3. I don't know if a kernel based recovery mechanism is welcome on the
>> community side. Any comment is welcome. Thanks!
>
> I'd prefer something external to fuse.

Okay, understood.

>
> Maybe a kernel based fdstore (lifetime connected to that of the
> container) would be a useful service more generally?

Yeah, I had indeed considered this, but I'm afraid the VFS guys would be
concerned about why we do this on the kernel side rather than in user
space.

I'm not sure what the VFS guys think about this and whether the kernel
side shall care about this.

Many thanks!

-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  2:45   ` Jingbo Xu
@ 2024-05-28  3:08     ` Jingbo Xu
  2024-05-28  4:02       ` Gao Xiang
  2024-05-28  7:46       ` Miklos Szeredi
  2024-05-28  7:45     ` Miklos Szeredi
  1 sibling, 2 replies; 15+ messages in thread
From: Jingbo Xu @ 2024-05-28  3:08 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, winters.zc

On 5/28/24 10:45 AM, Jingbo Xu wrote:
>
> On 5/27/24 11:16 PM, Miklos Szeredi wrote:
>> On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>>
>>> 3. I don't know if a kernel based recovery mechanism is welcome on the
>>> community side. Any comment is welcome. Thanks!
>>
>> I'd prefer something external to fuse.
>
> Okay, understood.
>
>>
>> Maybe a kernel based fdstore (lifetime connected to that of the
>> container) would be a useful service more generally?
>
> Yeah, I had indeed considered this, but I'm afraid the VFS guys would be
> concerned about why we do this on the kernel side rather than in user
> space.
>
> I'm not sure what the VFS guys think about this and whether the kernel
> side shall care about this.

There was an RFC for a kernel-side fdstore [1], though it's also
implemented upon FUSE.

[1] https://lore.kernel.org/all/CA+a=Yy5rnqLqH2iR-ZY6AUkNJy48mroVV3Exmhmt-pfTi82kXA@mail.gmail.com/T/

-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  3:08     ` Jingbo Xu
@ 2024-05-28  4:02       ` Gao Xiang
  2024-05-28  8:43         ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Gao Xiang @ 2024-05-28  4:02 UTC (permalink / raw)
To: Jingbo Xu, Miklos Szeredi, Christian Brauner
Cc: linux-fsdevel, linux-kernel, winters.zc

On 2024/5/28 11:08, Jingbo Xu wrote:
>
> On 5/28/24 10:45 AM, Jingbo Xu wrote:
>>
>> On 5/27/24 11:16 PM, Miklos Szeredi wrote:
>>> On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>>>
>>>> 3. I don't know if a kernel based recovery mechanism is welcome on the
>>>> community side. Any comment is welcome. Thanks!
>>>
>>> I'd prefer something external to fuse.
>>
>> Okay, understood.
>>
>>>
>>> Maybe a kernel based fdstore (lifetime connected to that of the
>>> container) would be a useful service more generally?
>>
>> Yeah, I had indeed considered this, but I'm afraid the VFS guys would be
>> concerned about why we do this on the kernel side rather than in user space.

Just from my own perspective, even if it's in FUSE, the concern is
almost the same.

I wonder if on-demand cachefiles could keep fds too in the future (thus
e.g. a daemonless feature could even be implemented entirely with a
kernel fdstore), but it still has the same concern, or it's a source of
duplication.

Thanks,
Gao Xiang

>>
>> I'm not sure what the VFS guys think about this and if the kernel side
>> shall care about this.
>
> There was an RFC for a kernel-side fdstore [1], though it's also
> implemented upon FUSE.
>
> [1]
> https://lore.kernel.org/all/CA+a=Yy5rnqLqH2iR-ZY6AUkNJy48mroVV3Exmhmt-pfTi82kXA@mail.gmail.com/T/

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  4:02       ` Gao Xiang
@ 2024-05-28  8:43         ` Christian Brauner
  2024-05-28  9:13           ` Gao Xiang
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2024-05-28  8:43 UTC (permalink / raw)
To: Gao Xiang
Cc: Jingbo Xu, Miklos Szeredi, linux-fsdevel, linux-kernel, winters.zc

On Tue, May 28, 2024 at 12:02:46PM +0800, Gao Xiang wrote:
>
> On 2024/5/28 11:08, Jingbo Xu wrote:
> >
> > On 5/28/24 10:45 AM, Jingbo Xu wrote:
> > >
> > > On 5/27/24 11:16 PM, Miklos Szeredi wrote:
> > > > On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
> > > >
> > > > > 3. I don't know if a kernel based recovery mechanism is welcome on the
> > > > > community side. Any comment is welcome. Thanks!
> > > >
> > > > I'd prefer something external to fuse.
> > >
> > > Okay, understood.
> > >
> > > >
> > > > Maybe a kernel based fdstore (lifetime connected to that of the
> > > > container) would be a useful service more generally?
> > >
> > > Yeah, I had indeed considered this, but I'm afraid the VFS guys would be
> > > concerned about why we do this on the kernel side rather than in user space.
>
> Just from my own perspective, even if it's in FUSE, the concern is
> almost the same.
>
> I wonder if on-demand cachefiles could keep fds too in the future (thus
> e.g. a daemonless feature could even be implemented entirely with a
> kernel fdstore), but it still has the same concern, or it's a source of
> duplication.
>
> Thanks,
> Gao Xiang
>
> > >
> > > I'm not sure what the VFS guys think about this and if the kernel side
> > > shall care about this.

Fwiw, I'm not convinced, and I think that's a big can of worms both
security wise and semantics wise. I have discussed multiple times
whether a kernel-side fdstore would be something that systemd would use
if available, and they wouldn't use it because it provides them with no
benefits over having it in userspace.

Especially since it implements a lot of special semantics and policy
that we really don't want in the kernel. I think that's just not
something we should do. We should give userspace all the means to
implement fdstores in userspace but not hold fds ourselves.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  8:43         ` Christian Brauner
@ 2024-05-28  9:13           ` Gao Xiang
  2024-05-28  9:32             ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Gao Xiang @ 2024-05-28  9:13 UTC (permalink / raw)
To: Christian Brauner
Cc: Jingbo Xu, Miklos Szeredi, linux-fsdevel, linux-kernel, winters.zc

Hi Christian,

On 2024/5/28 16:43, Christian Brauner wrote:
> On Tue, May 28, 2024 at 12:02:46PM +0800, Gao Xiang wrote:
>>
>> On 2024/5/28 11:08, Jingbo Xu wrote:
>>>
>>> On 5/28/24 10:45 AM, Jingbo Xu wrote:
>>>>
>>>> On 5/27/24 11:16 PM, Miklos Szeredi wrote:
>>>>> On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>
>>>>>> 3. I don't know if a kernel based recovery mechanism is welcome on the
>>>>>> community side. Any comment is welcome. Thanks!
>>>>>
>>>>> I'd prefer something external to fuse.
>>>>
>>>> Okay, understood.
>>>>
>>>>>
>>>>> Maybe a kernel based fdstore (lifetime connected to that of the
>>>>> container) would be a useful service more generally?
>>>>
>>>> Yeah, I had indeed considered this, but I'm afraid the VFS guys would be
>>>> concerned about why we do this on the kernel side rather than in user space.
>>
>> Just from my own perspective, even if it's in FUSE, the concern is
>> almost the same.
>>
>> I wonder if on-demand cachefiles could keep fds too in the future (thus
>> e.g. a daemonless feature could even be implemented entirely with a
>> kernel fdstore), but it still has the same concern, or it's a source of
>> duplication.
>>
>> Thanks,
>> Gao Xiang
>>
>>>>
>>>> I'm not sure what the VFS guys think about this and if the kernel side
>>>> shall care about this.
>
> Fwiw, I'm not convinced, and I think that's a big can of worms both
> security wise and semantics wise. I have discussed multiple times
> whether a kernel-side fdstore would be something that systemd would use
> if available, and they wouldn't use it because it provides them with no
> benefits over having it in userspace.

As far as I know, there are currently approximately two ways to do
failover mechanisms in the kernel.

The first model is much like the fuse model: in this mode, we keep and
pass an fd to maintain the active state, and currently userspace is
responsible for the permission/security issues when doing something like
passing fds.

The second model is a one-device-one-instance model, for example ublk
(if I understand correctly): each active instance (/dev/ublkbX) has its
own unique control device (/dev/ublkcX). Users can assign/change DAC/MAC
for each control device, and failover recovery just needs to reopen the
control device with the proper permission and do the recovery.

So, just my own thought: a kernel-side fdstore pseudo filesystem could
provide a DAC/MAC mechanism for the first model. That would be a much
cleaner way than doing a similar thing independently in each subsystem
that needs a DAC/MAC-like mechanism. But that is just my own thought.

Thanks,
Gao Xiang

>
> Especially since it implements a lot of special semantics and policy
> that we really don't want in the kernel. I think that's just not
> something we should do. We should give userspace all the means to
> implement fdstores in userspace but not hold fds ourselves.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  9:13           ` Gao Xiang
@ 2024-05-28  9:32             ` Christian Brauner
  2024-05-28  9:58               ` Gao Xiang
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2024-05-28  9:32 UTC (permalink / raw)
To: Gao Xiang
Cc: Jingbo Xu, Miklos Szeredi, linux-fsdevel, linux-kernel, winters.zc

On Tue, May 28, 2024 at 05:13:04PM +0800, Gao Xiang wrote:
> Hi Christian,
>
> On 2024/5/28 16:43, Christian Brauner wrote:
> > On Tue, May 28, 2024 at 12:02:46PM +0800, Gao Xiang wrote:
> > >
> > > On 2024/5/28 11:08, Jingbo Xu wrote:
> > > >
> > > > On 5/28/24 10:45 AM, Jingbo Xu wrote:
> > > > >
> > > > > On 5/27/24 11:16 PM, Miklos Szeredi wrote:
> > > > > > On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
> > > > > >
> > > > > > > 3. I don't know if a kernel based recovery mechanism is welcome on the
> > > > > > > community side. Any comment is welcome. Thanks!
> > > > > >
> > > > > > I'd prefer something external to fuse.
> > > > >
> > > > > Okay, understood.
> > > > >
> > > > > >
> > > > > > Maybe a kernel based fdstore (lifetime connected to that of the
> > > > > > container) would be a useful service more generally?
> > > > >
> > > > > Yeah, I had indeed considered this, but I'm afraid the VFS guys would be
> > > > > concerned about why we do this on the kernel side rather than in user space.
> > >
> > > Just from my own perspective, even if it's in FUSE, the concern is
> > > almost the same.
> > >
> > > I wonder if on-demand cachefiles could keep fds too in the future (thus
> > > e.g. a daemonless feature could even be implemented entirely with a
> > > kernel fdstore), but it still has the same concern, or it's a source of
> > > duplication.
> > >
> > > Thanks,
> > > Gao Xiang
> > >
> > > > >
> > > > > I'm not sure what the VFS guys think about this and if the kernel side
> > > > > shall care about this.
> >
> > Fwiw, I'm not convinced, and I think that's a big can of worms both
> > security wise and semantics wise. I have discussed multiple times
> > whether a kernel-side fdstore would be something that systemd would use
> > if available, and they wouldn't use it because it provides them with no
> > benefits over having it in userspace.
>
> As far as I know, there are currently approximately two ways to do
> failover mechanisms in the kernel.
>
> The first model is much like the fuse model: in this mode, we keep and
> pass an fd to maintain the active state, and currently userspace is
> responsible for the permission/security issues when doing something like
> passing fds.
>
> The second model is a one-device-one-instance model, for example ublk
> (if I understand correctly): each active instance (/dev/ublkbX) has its
> own unique control device (/dev/ublkcX). Users can assign/change DAC/MAC
> for each control device, and failover recovery just needs to reopen the
> control device with the proper permission and do the recovery.
>
> So, just my own thought: a kernel-side fdstore pseudo filesystem could
> provide a DAC/MAC mechanism for the first model. That would be a much
> cleaner way than doing a similar thing independently in each subsystem
> that needs a DAC/MAC-like mechanism. But that is just my own thought.

The failover mechanism for /dev/ublkcX could easily be implemented using
the fdstore. The fact that they rolled their own thing is orthogonal to
this imho.

Implementing retrieval policies like this in the kernel is slowly
advancing into /proc/$pid/fd/ levels of complexity. That's all better
handled with appropriate policies in userspace. And cachefilesd can
similarly just stash their fds in the fdstore.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  9:32 ` Christian Brauner
@ 2024-05-28  9:58 ` Gao Xiang
  0 siblings, 0 replies; 15+ messages in thread
From: Gao Xiang @ 2024-05-28  9:58 UTC (permalink / raw)
To: Christian Brauner
Cc: Jingbo Xu, Miklos Szeredi, linux-fsdevel, linux-kernel, winters.zc

On 2024/5/28 17:32, Christian Brauner wrote:
> On Tue, May 28, 2024 at 05:13:04PM +0800, Gao Xiang wrote:
>> Hi Christian,
>>
>> On 2024/5/28 16:43, Christian Brauner wrote:
>>> On Tue, May 28, 2024 at 12:02:46PM +0800, Gao Xiang wrote:
>>>>
>>>> On 2024/5/28 11:08, Jingbo Xu wrote:
>>>>>
>>>>> On 5/28/24 10:45 AM, Jingbo Xu wrote:
>>>>>>
>>>>>> On 5/27/24 11:16 PM, Miklos Szeredi wrote:
>>>>>>> On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>>>
>>>>>>>> 3. I don't know if a kernel based recovery mechanism is welcome on the
>>>>>>>> community side. Any comment is welcome. Thanks!
>>>>>>>
>>>>>>> I'd prefer something external to fuse.
>>>>>>
>>>>>> Okay, understood.
>>>>>>
>>>>>>> Maybe a kernel based fdstore (lifetime connected to that of the
>>>>>>> container) would be a useful service more generally?
>>>>>>
>>>>>> Yeah I indeed had considered this, but I'm afraid VFS guys would be
>>>>>> concerned about why we do this on kernel side rather than in user space.
>>>>
>>>> Just from my own perspective, even if it's in FUSE, the concern is
>>>> almost the same.
>>>>
>>>> I wonder if on-demand cachefiles can keep fds too in the future
>>>> (thus e.g. daemonless feature could even be implemented entirely
>>>> with kernel fdstore) but it still has the same concern or it's
>>>> a source of duplication.
>>>>
>>>> Thanks,
>>>> Gao Xiang
>>>>
>>>>>> I'm not sure what the VFS guys think about this and if the kernel side
>>>>>> shall care about this.
>>>
>>> Fwiw, I'm not convinced and I think that's a big can of worms security
>>> wise and semantics wise. I have discussed whether a kernel-side fdstore
>>> would be something that systemd would use if available multiple times
>>> and they wouldn't use it because it provides them with no benefits over
>>> having it in userspace.
>>
>> As far as I know, currently there are approximately two ways to do
>> failover mechanisms in kernel.
>>
>> The first model is a fuse-like one: in this mode, an fd has to be kept
>> and passed around to maintain the active state. And currently,
>> userspace is responsible for the permission/security issues when doing
>> something like passing fds.
>>
>> The second model is a one-device-one-instance model, for example ublk
>> (if I understand correctly): each active instance (/dev/ublkbX) has its
>> own unique control device (/dev/ublkcX). Users can assign/change
>> DAC/MAC for each control device, and failover recovery just needs to
>> reopen the control device with the proper permissions and do recovery.
>>
>> So, just my own thought: a kernel-side fdstore pseudo filesystem could
>> provide a DAC/MAC mechanism for the first model. That would be much
>> cleaner than doing something similar independently in each subsystem
>> that needs a DAC/MAC-like mechanism. But that is just my own thought.
>
> The failover mechanism for /dev/ublkcX could easily be implemented using
> the fdstore. The fact that they rolled their own thing is orthogonal to
> this imho. Implementing retrieval policies like this in the kernel is
> slowly advancing into /proc/$pid/fd/ levels of complexity. That's all
> better handled with appropriate policies in userspace. And cachefilesd
> can similarly just stash their fds in the fdstore.

Ok, got it. I just wanted to know what a kernel fdstore would look like
(since Miklos mentioned it, I wondered whether it's feasible, as it
could benefit non-fuse cases too). I think a userspace fdstore works
for me (unless some other interesting use cases come up for evaluation
later).

Jingbo has an internal requirement for fuse; that is pure fuse stuff
and out of my scope, though.

Thanks,
Gao Xiang

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  3:08 ` Jingbo Xu
  2024-05-28  4:02 ` Gao Xiang
@ 2024-05-28  7:46 ` Miklos Szeredi
  1 sibling, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2024-05-28  7:46 UTC (permalink / raw)
To: Jingbo Xu; +Cc: linux-fsdevel, linux-kernel, winters.zc

On Tue, 28 May 2024 at 05:08, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:

> There was an RFC for kernel-side fdstore [1], though it's also
> implemented upon FUSE.

I strongly believe that this needs to be disassociated from fuse. It
could be a pseudo filesystem, though.

Thanks,
Miklos

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  2:45 ` Jingbo Xu
  2024-05-28  3:08 ` Jingbo Xu
@ 2024-05-28  7:45 ` Miklos Szeredi
  1 sibling, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2024-05-28  7:45 UTC (permalink / raw)
To: Jingbo Xu; +Cc: linux-fsdevel, linux-kernel, winters.zc

On Tue, 28 May 2024 at 04:45, Jingbo Xu <jefflexu@linux.alibaba.com> wrote:

> Yeah I indeed had considered this, but I'm afraid VFS guys would be
> concerned about why we do this on kernel side rather than in user space.
>
> I'm not sure what the VFS guys think about this and if the kernel side
> shall care about this.

Yes, that is indeed something that needs to be discussed. I often find
that, when discussing something like this, a lot of good ideas can come
from different directions, so it can help move things forward.

Try something really simple first, and post a patch. Don't overthink
the first version.

Thanks,
Miklos

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-24  6:40 [RFC 0/2] fuse: introduce fuse server recovery mechanism Jingbo Xu
  ` (2 preceding siblings ...)
  2024-05-27 15:16 ` [RFC 0/2] fuse: introduce fuse server " Miklos Szeredi
@ 2024-05-28  8:38 ` Christian Brauner
  2024-05-28  9:45 ` Jingbo Xu
  3 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2024-05-28  8:38 UTC (permalink / raw)
To: Jingbo Xu; +Cc: miklos, linux-fsdevel, linux-kernel, winters.zc

On Fri, May 24, 2024 at 02:40:28PM +0800, Jingbo Xu wrote:
> Background
> ==========
> The fd of '/dev/fuse' serves as a message transmission channel between
> FUSE filesystem (kernel space) and fuse server (user space). Once the
> fd gets closed (intentionally or unintentionally), the FUSE filesystem
> gets aborted, and any attempt of filesystem access gets -ECONNABORTED
> error until the FUSE filesystem finally umounted.
>
> It is one of the requisites in production environment to provide
> uninterruptible filesystem service. The most straightforward way, and
> maybe the most widely used way, is that make another dedicated user
> daemon (similar to systemd fdstore) keep the device fd open. When the
> fuse daemon recovers from a crash, it can retrieve the device fd from the
> fdstore daemon through socket takeover (Unix domain socket) method [1]
> or pidfd_getfd() syscall [2]. In this way, as long as the fdstore
> daemon doesn't exit, the FUSE filesystem won't get aborted once the fuse
> daemon crashes, though the filesystem service may hang there for a while
> when the fuse daemon gets restarted and has not been completely
> recovered yet.
>
> This picture indeed works and has been deployed in our internal
> production environment until the following issues are encountered:
>
> 1. The fdstore daemon may be killed by mistake, in which case the FUSE
> filesystem gets aborted and irrecoverable.

That's only a problem if you use the fdstore of the per-user instance.
The main fdstore is part of PID 1 and you can't kill that. So really,
systemd needs to hand the fds from the per-user instance to the main
fdstore.

> 2. In scenarios of containerized deployment, the fuse daemon is deployed
> in a container POD, and a dedicated fdstore daemon needs to be deployed
> for each fuse daemon. The fdstore daemon could consume a amount of
> resources (e.g. memory footprint), which is not conducive to the dense
> container deployment.
>
> 3. Each fuse daemon implementation needs to implement its own fdstore
> daemon. If we implement the fuse recovery mechanism on the kernel side,
> all fuse daemon implementations could reuse this mechanism.

You can just use the global fdstore. That is a design limitation, not
an inherent limitation.

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC 0/2] fuse: introduce fuse server recovery mechanism
  2024-05-28  8:38 ` Christian Brauner
@ 2024-05-28  9:45 ` Jingbo Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Jingbo Xu @ 2024-05-28  9:45 UTC (permalink / raw)
To: Christian Brauner; +Cc: miklos, linux-fsdevel, linux-kernel, winters.zc

Hi, Christian,

Thanks for the review.

On 5/28/24 4:38 PM, Christian Brauner wrote:
> On Fri, May 24, 2024 at 02:40:28PM +0800, Jingbo Xu wrote:
>> Background
>> ==========
>> The fd of '/dev/fuse' serves as a message transmission channel between
>> FUSE filesystem (kernel space) and fuse server (user space). Once the
>> fd gets closed (intentionally or unintentionally), the FUSE filesystem
>> gets aborted, and any attempt of filesystem access gets -ECONNABORTED
>> error until the FUSE filesystem finally umounted.
>>
>> It is one of the requisites in production environment to provide
>> uninterruptible filesystem service. The most straightforward way, and
>> maybe the most widely used way, is that make another dedicated user
>> daemon (similar to systemd fdstore) keep the device fd open. When the
>> fuse daemon recovers from a crash, it can retrieve the device fd from the
>> fdstore daemon through socket takeover (Unix domain socket) method [1]
>> or pidfd_getfd() syscall [2]. In this way, as long as the fdstore
>> daemon doesn't exit, the FUSE filesystem won't get aborted once the fuse
>> daemon crashes, though the filesystem service may hang there for a while
>> when the fuse daemon gets restarted and has not been completely
>> recovered yet.
>>
>> This picture indeed works and has been deployed in our internal
>> production environment until the following issues are encountered:
>>
>> 1. The fdstore daemon may be killed by mistake, in which case the FUSE
>> filesystem gets aborted and irrecoverable.
>
> That's only a problem if you use the fdstore of the per-user instance.
> The main fdstore is part of PID 1 and you can't kill that. So really,
> systemd needs to hand the fds from the per-user instance to the main
> fdstore.

Systemd indeed has implemented its own fdstore mechanism in user space.
Nowadays more and more fuse daemons are running inside containers, but
a container generally has no systemd inside it.

>> 2. In scenarios of containerized deployment, the fuse daemon is deployed
>> in a container POD, and a dedicated fdstore daemon needs to be deployed
>> for each fuse daemon. The fdstore daemon could consume a amount of
>> resources (e.g. memory footprint), which is not conducive to the dense
>> container deployment.
>>
>> 3. Each fuse daemon implementation needs to implement its own fdstore
>> daemon. If we implement the fuse recovery mechanism on the kernel side,
>> all fuse daemon implementations could reuse this mechanism.
>
> You can just use the global fdstore. That is a design limitation, not
> an inherent limitation.

What I initially meant is that each fuse daemon implementation (e.g.
s3fs, ossfs, and other vendors) needs to build its own, similar
mechanism for daemon failover. There is no common fdstore component
for container scenarios comparable to the systemd fdstore.

I'd admit that it's controversial to implement a kernel-side fdstore.
Thus I only implement a failover mechanism for the fuse server in this
RFC patch. But I also understand Miklos's concern, as what we really
need to support daemon failover is just something like an fdstore to
keep the device fd alive.

-- 
Thanks,
Jingbo

^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-05-28  9:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-24  6:40 [RFC 0/2] fuse: introduce fuse server recovery mechanism Jingbo Xu
2024-05-24  6:40 ` [RFC 1/2] fuse: introduce recovery mechanism for fuse server Jingbo Xu
2024-05-24  6:40 ` [RFC 2/2] fuse: uid-based security enhancement for the recovery mechanism Jingbo Xu
2024-05-27 15:16 ` [RFC 0/2] fuse: introduce fuse server " Miklos Szeredi
2024-05-28  2:45   ` Jingbo Xu
2024-05-28  3:08     ` Jingbo Xu
2024-05-28  4:02       ` Gao Xiang
2024-05-28  8:43         ` Christian Brauner
2024-05-28  9:13           ` Gao Xiang
2024-05-28  9:32             ` Christian Brauner
2024-05-28  9:58               ` Gao Xiang
2024-05-28  7:46     ` Miklos Szeredi
2024-05-28  7:45   ` Miklos Szeredi
2024-05-28  8:38 ` Christian Brauner
2024-05-28  9:45   ` Jingbo Xu