($INBOX_DIR/description missing)
 help / color / mirror / Atom feed
From: ChenQi <Qi.Chen@windriver.com>
To: Richard Purdie <richard.purdie@linuxfoundation.org>,
	bitbake-devel@lists.openembedded.org
Subject: Re: [bitbake-devel][PATCH] fetch2/git.py: fix the try_premirror logic
Date: Wed, 31 Jan 2024 11:41:37 +0800	[thread overview]
Message-ID: <ce3c2fd0-50d1-87de-8981-6817aaad5abc@windriver.com> (raw)
In-Reply-To: <915d6f72d58debc70a36fd82a03c2ccdf5be7239.camel@linuxfoundation.org>

On 1/30/24 22:56, Richard Purdie wrote:
> On Tue, 2024-01-30 at 19:11 +0800, Chen Qi via lists.openembedded.org
> wrote:
>> From: Chen Qi <Qi.Chen@windriver.com>
>>
>> For gitsm recipes, it's possible that some URL is used more than once.
>> e.g.,
>> A -> B:rev1 (B is a submodule of A)
>> A -> C (C is a submodule of A)
>> C -> B:rev2 (B is a submodule of C)
>> A anc C are both using B as submodules, but on different revs.
>> Now if we have:
>> B:rev1 -> D
>> B:rev2 -> E
>> Then, the mirror will not be fully used.
>> Say we have all repo mirrors for A, B, C, D, E, then in theory it's not
>> necessary to reach out to any network for downloading. But it's not the
>> case. After downloading B(rev1) and its submodule D from mirrors, the fetch
>> process continues to download C, thus B(rev2) and E. Now it finds that B
>> needs an update because its submodule E needs an update. Of course this is
>> true because E is not downloaded yet. Now the problem comes to whether to
>> use mirror or not. The git.py defines try_premirror to return 'False' when
>> the ud.clonedir exists. As B has been cloned, the ud.clonedir exists and
>> try_mirror returns False, resulting in not using mirror and going to upstream
>> directly.
>>
>> We can see that the mirrors are not fully used. This is usually not problem,
>> as the cost is only some network download. But in case the following two
>> settings are there, we get errors.
>> BB_NO_NETWORK = "0"
>> BB_ALLOWED_NETWORKS = "*.some.allowed.domain"
>> In such case, the gitsm recipe A will fail to fetch. Note that all contents
>> that A needs are in mirrors and now it's failing to fetch. This is unexpected.
>>
>> Note that the different revs of the same repo in gitsm recipe is not the only
>> way to reveal this problem. For example, there might be a recipe call B that
>> uses B:rev3. Check the protobuf and grpc recipes as an example.
>>
>> For now, we can use the following steps to reproduce this issue. To be clear,
>> the grpc recipe in meta-oe is now 1.60.0.
>> 1. Add in local.conf:
>>     DL_DIR = "${TOPDIR}/downloads-premirror"
>>     bitbake grpc -c fetch
>> 2. Comment out the DL_DIR setting in local.conf and add the following lines:
>>     PREMIRRORS:append = " \
>>       git://.*/.* git://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
>>       gitsm://.*/.* gitsm://${TOPDIR}/downloads-premirror/git2/MIRRORNAME;protocol=file \n \
>>     "
>> 3. Set BB_NO_NETWORK = "1" and then 'bitbake grpc -c fetch'.
>>     This command succeeds and this shows that the premirror holds everything we need.
>> 4. Add the following lines and then 'bitbake grpc -c fetch'.
>>     BB_NO_NETWORK = "0"
>>     BB_ALLOWED_NETWORKS = "*.some.domain"
>>
>> After step 4, the error message is as below:
>> ERROR: grpc-1.60.0-r0 do_fetch: The URL: 'gitsm://github.com/protocolbuffers/protobuf.git;protocol=https;name=third_party/protobuf;subpath=third_party/protobuf;nobranch=1;lfs=True;bareclone=1;nobranch=1' is not trusted and cannot be used
>>
>> I guess that the try_premirror codes in git fetcher (fetch2/git.py) are assuming
>> that the clonedir syncs with the mirror repo. That's only reason I can think of
>> for why it returns 'False' if the clonedir exists. The assumption is wrong. The
>> mirror could well be updated via automation tools or some project-level setup
>> tools. Yet, it's returning True when BB_FETCH_PREMIRRORONLY is set, regardless
>> of whether the clonedir exists. This means it does know that the mirror could be
>> updated. It seems that it treats using mirror as some kind of 'best effort'. Anyway,
>> the codes' logic is just contradicting with itself.
>>
>> If the mirror can be possibly updated, then it should be tried regardless of whether
>> the clonedir exists. So we should remove the try_premirror codes here and use the
>> default one (in __init__.py), which always returns True.
>>
>> Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
>> ---
>>   lib/bb/fetch2/git.py | 9 ---------
>>   1 file changed, 9 deletions(-)
>>
>> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
>> index 7aa3fd9e5..62618012b 100644
>> --- a/lib/bb/fetch2/git.py
>> +++ b/lib/bb/fetch2/git.py
>> @@ -353,15 +353,6 @@ class Git(FetchMethod):
>>       def tarball_need_update(self, ud):
>>           return ud.write_tarballs and not os.path.exists(ud.fullmirror)
>>   
>> -    def try_premirror(self, ud, d):
>> -        # If we don't do this, updating an existing checkout with only premirrors
>> -        # is not possible
>> -        if bb.utils.to_boolean(d.getVar("BB_FETCH_PREMIRRORONLY")):
>> -            return True
>> -        if os.path.exists(ud.clonedir):
>> -            return False
>> -        return True
>> -
>>       def download(self, ud, d):
>>           """Fetch url"""
> I'm not sure this is a good idea. The problem is that the default
> fetcher would not try and incrementally fetch into an existing clone
> without this code and that is a common workflow.

'Incremental fetch'. That's the reason! Thanks.

I'll come up with another solution which will not break this workflow.

> Did the existing test cases pass after this change?

Is the following command all I need to ensure no regression in bitbake 
fetcher?

bitbake-selftest bb.tests.fetch


>
> We probably should have a test case for this issue too.
Yes. I'll add one.
>   Things have
> changes now mirrors/premirrors are non-destructive to existing git
> repos so we may be able to improve the core logic to fix this corner
> case but it can't be at the expense of some of the other workflows.

Got it.

I just found a related bug here: 
https://bugzilla.yoctoproject.org/show_bug.cgi?id=15369

Though it's not the same problem, it's also related to try_premirror logic.

I'll add some comments in the try_premirror function to avoid confusion 
in the future. I'll also check to see if I can add a test case to cover 
this 'incremental fetch' workflow.

Regards,

Qi


>
> Cheers,
>
> Richard




      reply	other threads:[~2024-01-31  3:41 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30 11:11 [bitbake-devel][PATCH] fetch2/git.py: fix the try_premirror logic Qi.Chen
2024-01-30 14:56 ` Richard Purdie
2024-01-31  3:41   ` ChenQi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ce3c2fd0-50d1-87de-8981-6817aaad5abc@windriver.com \
    --to=qi.chen@windriver.com \
    --cc=bitbake-devel@lists.openembedded.org \
    --cc=richard.purdie@linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).