From: "Sam James" <sam@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Subject: [gentoo-commits] proj/portage:master commit in: /, lib/portage/sync/modules/git/
Date: Wed, 21 Dec 2022 01:28:10 +0000 (UTC) [thread overview]
Message-ID: <1671586083.92f8d27e034558edd749a08e5d61d579dc7172a8.sam@gentoo> (raw)
commit: 92f8d27e034558edd749a08e5d61d579dc7172a8
Author: Sam James <sam <AT> gentoo <DOT> org>
AuthorDate: Sat Dec 17 05:51:02 2022 +0000
Commit: Sam James <sam <AT> gentoo <DOT> org>
CommitDate: Wed Dec 21 01:28:03 2022 +0000
URL: https://gitweb.gentoo.org/proj/portage.git/commit/?id=92f8d27e
sync: git: only clobber changes if repository is marked as non-volatile
The fundamental issue with regard to Portage handling sync-type=git
repositories is follows: whenever Portage is syncing something, our
view was that we should prioritise successful completion of syncs,
especially given it may be performed unattended.
The rationale for this is that most people who use sync-type=git are doing so
because they want quicker sync (to complete), a more reliable one
(no Manifest race condition issues), and to get changes from upstream faster.
Recently, Portage started doing two things:
1. Clobbering changes to repositories with sync-type=git
If git is pulling from a CDN, Portage may on one sync receive state X, but
on a subsequent sync receive state X-1. This can cause an odd state
where git wants to resolve conflicts and manual intervention is required,
hence needing git reset.
This situation was often worse with sync-depth=1 and would lead
to orphaned files hence needing git clean.
2. Defaulted to sync-depth=clone-depth=1.
The motivation here was because of disk space growing unbounded
otherwise. It makes sense when considered within the school of thought / motivation
listed above (prioritising a successful sync and then making an optimisation
based on how we achieve that).
Further, there was additional motivation to avoid disk usage growing
unbounded (which additionally could lead to a sync failing at some point).
--
To allow users to opt-out of these destructive changes, we now
have a repository setting called 'volatile'.
* If volatile=yes, the repository is assumed to be user-modifiable
and Portage will NOT prioritise successful sync over preserving
repository state.
* If volatile=no, the repository is assumed to only be modified
by Portage, and Portage will try its best to ensure syncs continue
to work.
This is a followup to 7a336e85052530c62acf25b5df10e2d02a17e779
and 1b23d15e4c71d95c6c2520dcb85f8bf90ddf430c.
Bug: https://bugs.gentoo.org/887025
See: 7a336e85052530c62acf25b5df10e2d02a17e779
See: 1b23d15e4c71d95c6c2520dcb85f8bf90ddf430c
See: https://github.com/gentoo/portage/pull/801
See: https://github.com/gentoo/portage/pull/931
Closes: https://github.com/gentoo/portage/pull/939
Closes: https://github.com/gentoo/portage/pull/960
Signed-off-by: Sam James <sam <AT> gentoo.org>
NEWS | 53 +++++++++++++++++++++-
lib/portage/sync/modules/git/git.py | 89 ++++++++++++++++++++++++++++++-------
2 files changed, 124 insertions(+), 18 deletions(-)
diff --git a/NEWS b/NEWS
index 98f9c71a8..0f5625f00 100644
--- a/NEWS
+++ b/NEWS
@@ -1,10 +1,61 @@
portage-3.0.42 (UNRELEASED)
--------------
+Breaking changes:
+* Portage changed its git sync behaviour for repositories with sync-type=git.
+
+ Recently in Portage 3.0.39 (for sync depth) and Portage 3.0.40 (git reset/clobbering),
+ Portage started doing two things:
+
+ 1. Clobbering changes to repositories with sync-type=git
+
+ If git is pulling from a CDN, Portage may on one sync receive state X, but
+ on a subsequent sync receive state X-1. This can cause an odd state
+ where git wants to resolve conflicts and manual intervention is required,
+ hence needing git reset.
+
+ This situation was often worse with sync-depth=1 and would lead
+ to orphaned files hence needing git clean.
+
+ 2. Defaulted to sync-depth=clone-depth=1.
+
+ The motivation here was because of disk space growing unbounded
+ otherwise. It makes sense when considered within the school of thought / motivation
+ listed above (prioritising a successful sync and then making an optimisation
+ based on how we achieve that).
+
+ Further, there was additional motivation to avoid disk usage growing
+ unbounded (which additionally could lead to a sync failing at some point).
+
+ Portage 3.0.42 will now only make a repository shallow if:
+ 1. volatile=yes and it is a new sync (i.e. it was not deep before), or
+ 2. volatile=no and sync-depth is unset in repos.conf.
+
+ --
+
+ To allow users to opt-out of these destructive changes, we now
+ have a repository setting called 'volatile'.
+
+ * If volatile=yes, the repository is assumed to be user-modifiable
+ and Portage will NOT prioritise successful sync over preserving
+ repository state.
+
+ * If volatile=no, the repository is assumed to only be modified
+ by Portage, and Portage will try its best to ensure syncs continue
+ to work.
+
Features:
* cnf: make.conf.example.loong: add for the loong arch (bug #884135).
-* git: only perform shallow updates if the repository is a shallow one
+* sync: git: only perform destructive operations like 'git reset' to keep
+ sync working and the repository in a predictable state for sync-type=git
+ repositories if the repository is not marked volatile.
+
+* sync: git: only perform shallow updates if the repository is a shallow one
+ or if the repository is not marked volatile.
+
+* sync: git: run 'git clean' in git repositories if they are marked as
+ non-volatile.
Bug fixes:
* glsa: Abort if a GLSA's arch list doesn't match the expected format (bug #882797).
diff --git a/lib/portage/sync/modules/git/git.py b/lib/portage/sync/modules/git/git.py
index 7af665e5f..fe8a2c995 100644
--- a/lib/portage/sync/modules/git/git.py
+++ b/lib/portage/sync/modules/git/git.py
@@ -172,6 +172,54 @@ class GitSync(NewBase):
if self.settings.get("PORTAGE_QUIET") == "1":
git_cmd_opts += " --quiet"
+ # The logic here is a bit delicate. We need to balance two things:
+ # 1. Having a robust sync mechanism which works unattended.
+ # 2. Allowing users to have the flexibility they might expect when using
+ # a git repository in repos.conf for syncing.
+ #
+ # For sync-type=git repositories, we've seen a problem in the wild
+ # where shallow clones end up "breaking themselves" especially when
+ # the origin is behing a CDN. 'git pull' might return state X,
+ # but on a subsequent pull, return state X-1. git will then (sometimes)
+ # leave orphaned untracked files in the repository. On a subsequent pull,
+ # when state >= X is returned where those files exist in the origin,
+ # git then refuses to write over them and aborts to avoid clobbering
+ # local work.
+ #
+ # To mitigate this, Portage will aggressively clobber any changes
+ # in the local directory, as its priority is to keep syncing working,
+ # by running 'git clean' and 'git reset --hard'.
+ #
+ # Portage performs this clobbering if:
+ # 1. sync-type=git
+ # 2.
+ # - volatile=no (explicitly set to no), OR
+ # - volatile is unset AND the repository owner is neither root or portage
+ # 3. Portage is syncing the respository (rather than e.g. auto-sync=no
+ # and never running 'emaint sync -r foo')
+ #
+ # Portage will not clobber if:
+ # 1. volatile=yes (explicitly set in the config), OR
+ # 2. volatile is unset and the repository owner is root or portage.
+ #
+ # 'volatile' refers to whether the repository is volatile and may
+ # only be safely changed by Portage itself, i.e. whether Portage
+ # should expect the user to change it or not.
+ #
+ # - volatile=yes:
+ # The repository is volatile and may be changed at any time by the user.
+ # Portage will not perform destructive operations on the repository.
+ # - volatile=no
+ # The repository is not volatile. Only Portage may modify the
+ # repository. User changes may be lost.
+ # Portage may perform destructive operations on the repository
+ # to keep sync working.
+ #
+ # References:
+ # bug #887025
+ # bug #824782
+ # https://archives.gentoo.org/gentoo-dev/message/f58a97027252458ad0a44090a2602897
+
# Default: Perform shallow updates (but only if the target is
# already a shallow repository).
sync_depth = 1
@@ -273,23 +321,28 @@ class GitSync(NewBase):
if not self.verify_head(revision="refs/remotes/%s" % remote_branch):
return (1, False)
- # Clean up the repo before trying to sync to upstream's
- clean_cmd = [self.bin_command, "clean", "--force", "-d", "-x"]
+ if not self.repo.volatile:
+ # Clean up the repo before trying to sync to upstream.
+ # - Only done for volatile=false repositories to avoid losing
+ # data.
+ # - This is needed to avoid orphaned files preventing further syncs
+ # on shallow clones.
+ clean_cmd = [self.bin_command, "clean", "--force", "-d", "-x"]
- if quiet:
- clean_cmd.append("--quiet")
+ if quiet:
+ clean_cmd.append("--quiet")
- portage.process.spawn(
- clean_cmd,
- cwd=portage._unicode_encode(self.repo.location),
- **self.spawn_kwargs,
- )
+ exitcode = portage.process.spawn(
+ clean_cmd,
+ cwd=portage._unicode_encode(self.repo.location),
+ **self.spawn_kwargs,
+ )
- if exitcode != os.EX_OK:
- msg = "!!! git clean error in %s" % self.repo.location
- self.logger(self.xterm_titles, msg)
- writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
- return (exitcode, False)
+ if exitcode != os.EX_OK:
+ msg = "!!! git clean error in %s" % self.repo.location
+ self.logger(self.xterm_titles, msg)
+ writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
+ return (exitcode, False)
# `git diff --quiet` returns 0 on a clean tree and 1 otherwise
is_clean = (
@@ -301,9 +354,9 @@ class GitSync(NewBase):
== 0
)
- if not is_clean:
+ if not is_clean and not self.repo.volatile:
# If the repo isn't clean, clobber any changes for parity
- # with rsync
+ # with rsync. Only do this for non-volatile repositories.
merge_cmd = [self.bin_command, "reset", "--hard"]
elif shallow:
# Since the default merge strategy typically fails when
@@ -311,16 +364,18 @@ class GitSync(NewBase):
merge_cmd = [self.bin_command, "reset", "--merge"]
else:
merge_cmd = [self.bin_command, "merge"]
+
merge_cmd.append("refs/remotes/%s" % remote_branch)
if quiet:
merge_cmd.append("--quiet")
+
exitcode = portage.process.spawn(
merge_cmd,
cwd=portage._unicode_encode(self.repo.location),
**self.spawn_kwargs,
)
- if exitcode != os.EX_OK:
+ if exitcode != os.EX_OK and not self.repo.volatile:
# HACK - sometimes merging results in a tree diverged from
# upstream, so try to hack around it
# https://stackoverflow.com/questions/41075972/how-to-update-a-git-shallow-clone/41081908#41081908
next reply other threads:[~2022-12-21 1:28 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-21 1:28 Sam James [this message]
-- strict thread matches above, loose matches on Subject: below --
2023-10-22 19:03 [gentoo-commits] proj/portage:master commit in: /, lib/portage/sync/modules/git/ Sam James
2023-02-22 17:33 Sam James
2022-12-31 14:11 Sam James
2022-12-21 1:28 Sam James
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1671586083.92f8d27e034558edd749a08e5d61d579dc7172a8.sam@gentoo \
--to=sam@gentoo.org \
--cc=gentoo-commits@lists.gentoo.org \
--cc=gentoo-dev@lists.gentoo.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox