From: ChenQi <Qi.Chen@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>,
bitbake-devel@lists.openembedded.org
Subject: Re: [bitbake-devel][PATCH] fetch2/git.py: fix the try_premirror logic
Date: Wed, 31 Jan 2024 11:41:37 +0800 [thread overview]
Message-ID: <ce3c2fd0-50d1-87de-8981-6817aaad5abc@windriver.com> (raw)
In-Reply-To: <915d6f72d58debc70a36fd82a03c2ccdf5be7239.camel@linuxfoundation.org>
On 1/30/24 22:56, Richard Purdie wrote:
> On Tue, 2024-01-30 at 19:11 +0800, Chen Qi via lists.openembedded.org
> wrote:
>> From: Chen Qi <Qi.Chen@windriver.com>
>>
>> For gitsm recipes, it's possible that some URL is used more than once.
>> e.g.,
>> A -> B:rev1 (B is a submodule of A)
>> A -> C (C is a submodule of A)
>> C -> B:rev2 (B is a submodule of C)
>> A anc C are both using B as submodules, but on different revs.
>> Now if we have:
>> B:rev1 -> D
>> B:rev2 -> E
>> Then, the mirror will not be fully used.
>> Say we have all repo mirrors for A, B, C, D, E, then in theory it's not
>> necessary to reach out to any network for downloading. But it's not the
>> case. After downloading B(rev1) and its submodule D from mirrors, the fetch
>> process continues to download C, thus B(rev2) and E. Now it finds that B
>> needs an update because its submodule E needs an update. Of course this is
>> true because E is not downloaded yet. Now the problem comes to whether to
>> use mirror or not. The git.py defines try_premirror to return 'False' when
>> the ud.clonedir exists. As B has been cloned, the ud.clonedir exists and
>> try_mirror returns False, resulting in not using mirror and going to upstream
>> directly.
>>
>> We can see that the mirrors are not fully used. This is usually not problem,
>> as the cost is only some network download. But in case the following two
>> settings are there, we get errors.
>> BB_NO_NETWORK = "0"
>> BB_ALLOWED_NETWORKS = "*.some.allowed.domain"
>> In such case, the gitsm recipe A will fail to fetch. Note that all contents
>> that A needs are in mirrors and now it's failing to fetch. This is unexpected.
>>
>> Note that the different revs of the same repo in gitsm recipe is not the only
>> way to reveal this problem. For example, there might be a recipe call B that
>> uses B:rev3. Check the protobuf and grpc recipes as an example.
>>
>> For now, we can use the following steps to reproduce this issue. To be clear,
>> the grpc recipe in meta-oe is now 1.60.0.
>> 1. Add in local.conf:
>> DL_DIR = "${TOPDIR}/downloads-premirror"
>> bitbake grpc -c fetch
>> 2. Comment out the DL_DIR setting in local.conf and add the following lines:
>> PREMIRRORS:append = " \
>> git://.*/.* git://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
>> gitsm://.*/.* gitsm://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
>> "
>> 3. Set BB_NO_NETWORK = "1" and then 'bitbake grpc -c fetch'.
>> This command succeeds and this shows that the premirror holds everything we need.
>> 4. Add the following lines and then 'bitbake grpc -c fetch'.
>> BB_NO_NETWORK = "0"
>> BB_ALLOWED_NETWORKS = "*.some.domain"
>>
>> After step 4, the error message is as below:
>> ERROR: grpc-1.60.0-r0 do_fetch: The URL: 'gitsm://github.com/protocolbuffers/protobuf.git;protocol=https;name=third_party/protobuf;subpath=third_party/protobuf;nobranch=1;lfs=True;bareclone=1;nobranch=1' is not trusted and cannot be used
>>
>> I guess that the try_premirror codes in git fetcher (fetch2/git.py) are assuming
>> that the clonedir syncs with the mirror repo. That's only reason I can think of
>> for why it returns 'False' if the clonedir exists. The assumption is wrong. The
>> mirror could well be updated via automation tools or some project-level setup
>> tools. Yet, it's returning True when BB_FETCH_PREMIRRORONLY is set, regardless
>> of whether the clonedir exists. This means it does know that the mirror could be
>> updated. It seems that it treats using mirror as some kind of 'best effort'. Anyway,
>> the codes' logic is just contradicting with itself.
>>
>> If the mirror can be possibly updated, then it should be tried regardless of whether
>> the clonedir exists. So we should remove the try_premirror codes here and use the
>> default one (in __init__.py), which always returns True.
>>
>> Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
>> ---
>> lib/bb/fetch2/git.py | 9 ---------
>> 1 file changed, 9 deletions(-)
>>
>> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
>> index 7aa3fd9e5..62618012b 100644
>> --- a/lib/bb/fetch2/git.py
>> +++ b/lib/bb/fetch2/git.py
>> @@ -353,15 +353,6 @@ class Git(FetchMethod):
>> def tarball_need_update(self, ud):
>> return ud.write_tarballs and not os.path.exists(ud.fullmirror)
>>
>> - def try_premirror(self, ud, d):
>> - # If we don't do this, updating an existing checkout with only premirrors
>> - # is not possible
>> - if bb.utils.to_boolean(d.getVar("BB_FETCH_PREMIRRORONLY")):
>> - return True
>> - if os.path.exists(ud.clonedir):
>> - return False
>> - return True
>> -
>> def download(self, ud, d):
>> """Fetch url"""
> I'm not sure this is a good idea. The problem is that the default
> fetcher would not try and incrementally fetch into an existing clone
> without this code and that is a common workflow.
'Incremental fetch'. That's the reason! Thanks.
I'll come up with another solution which will not break this workflow.
> Did the existing test cases pass after this change?
Is the following command all I need to ensure no regression in bitbake
fetcher?
bitbake-selftest bb.tests.fetch
>
> We probably should have a test case for this issue too.
Yes. I'll add one.
> Things have
> changes now mirrors/premirrors are non-destructive to existing git
> repos so we may be able to improve the core logic to fix this corner
> case but it can't be at the expense of some of the other workflows.
Got it.
I just found a related bug here:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=15369
Though it's not the same problem, it's also related to try_premirror logic.
I'll add some comments in the try_premirror function to avoid confusion
in the future. I'll also check to see if I can add a test case to cover
this 'incremental fetch' workflow.
Regards,
Qi
>
> Cheers,
>
> Richard
prev parent reply other threads:[~2024-01-31 3:41 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-30 11:11 [bitbake-devel][PATCH] fetch2/git.py: fix the try_premirror logic Qi.Chen
2024-01-30 14:56 ` Richard Purdie
2024-01-31 3:41 ` ChenQi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ce3c2fd0-50d1-87de-8981-6817aaad5abc@windriver.com \
--to=qi.chen@windriver.com \
--cc=bitbake-devel@lists.openembedded.org \
--cc=richard.purdie@linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).