From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) by finch.gentoo.org (Postfix) with ESMTP id 1C50B138A6C for ; Sat, 18 Apr 2015 17:45:56 +0000 (UTC) Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id 9A156E0908; Sat, 18 Apr 2015 17:45:53 +0000 (UTC) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pigeon.gentoo.org (Postfix) with ESMTPS id CC248E0904 for ; Sat, 18 Apr 2015 17:45:52 +0000 (UTC) Received: from pomiot.lan (77-253-156-177.adsl.inetia.pl [77.253.156.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: mgorny) by smtp.gentoo.org (Postfix) with ESMTPSA id 88803340CD0; Sat, 18 Apr 2015 17:45:50 +0000 (UTC) Date: Sat, 18 Apr 2015 19:45:34 +0200 From: =?UTF-8?B?TWljaGHFgiBHw7Nybnk=?= To: Brian Dolbec Cc: gentoo-portage-dev@lists.gentoo.org Subject: Re: [gentoo-portage-dev] [PATCH] Contribute squashdelta syncing module Message-ID: <20150418194534.01f43b72@pomiot.lan> In-Reply-To: <20150416103822.69555cb3.dolsen@gentoo.org> References: <1428228511-22133-1-git-send-email-mgorny@gentoo.org> <20150416103822.69555cb3.dolsen@gentoo.org> Organization: Gentoo X-Mailer: Claws Mail 3.11.1 (GTK+ 2.24.27; x86_64-pc-linux-gnu) Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-portage-dev@lists.gentoo.org Reply-to: gentoo-portage-dev@lists.gentoo.org MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; boundary="Sig_/YV+P9gjbmzEqP0ASwg2gI09"; protocol="application/pgp-signature" X-Archives-Salt: 29cd3bad-581c-4e48-8218-4fd4b49a2197 X-Archives-Hash: 9d1a1828cb7b09f5770595c991136409 --Sig_/YV+P9gjbmzEqP0ASwg2gI09 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Dnia 2015-04-16, o godz. 10:38:22 Brian Dolbec napisa=C5=82(a): > On Sun, 5 Apr 2015 12:08:31 +0200 > Micha=C5=82 G=C3=B3rny wrote: >=20 > > The squashdelta module provides syncing via SquashFS snapshots. For > > the initial sync, a complete snapshot is fetched and placed in > > /var/cache/portage/squashfs. On subsequent sync operations, deltas are > > fetched from the mirror and used to reconstruct the newest snapshot. > >=20 > > The distfile fetching logic is reused to fetch the remote files > > and verify their checksums. Additionally, the sha512sum.txt file > > should be OpenPGP-verified after fetching but this is currently > > unimplemented. > >=20 > > After fetching, Portage tries to (re-)mount the SquashFS in repository > > location. > > --- > > cnf/repos.conf | 4 + > > pym/portage/sync/modules/squashdelta/README | 124 > > +++++++++++++ pym/portage/sync/modules/squashdelta/__init__.py | > > 37 ++++ .../sync/modules/squashdelta/squashdelta.py | 192 > > +++++++++++++++++++++ 4 files changed, 357 insertions(+) > > create mode 100644 pym/portage/sync/modules/squashdelta/README > > create mode 100644 pym/portage/sync/modules/squashdelta/__init__.py > > create mode 100644 > > pym/portage/sync/modules/squashdelta/squashdelta.py > >=20 > > diff --git a/cnf/repos.conf b/cnf/repos.conf > > index 1ca98ca..062fc0d 100644 > > --- a/cnf/repos.conf > > +++ b/cnf/repos.conf > > @@ -6,3 +6,7 @@ location =3D /usr/portage > > sync-type =3D rsync > > sync-uri =3D rsync://rsync.gentoo.org/gentoo-portage > > auto-sync =3D yes > > + > > +# for daily squashfs snapshots > > +#sync-type =3D squashdelta > > +#sync-uri =3D mirror://gentoo/../snapshots/squashfs > > >=20 > >=20 > > diff --git a/pym/portage/sync/modules/squashdelta/__init__.py > > b/pym/portage/sync/modules/squashdelta/__init__.py new file mode > > 100644 index 0000000..1a17dea > > --- /dev/null > > +++ b/pym/portage/sync/modules/squashdelta/__init__.py > > @@ -0,0 +1,37 @@ > > +# vim:fileencoding=3Dutf-8:noet > > +# (c) 2015 Micha=C5=82 G=C3=B3rny > > +# Distributed under the terms of the GNU General Public License v2 > > + > > +from portage.sync.config_checks import CheckSyncConfig > > + > > + > > +DEFAULT_CACHE_LOCATION =3D '/var/cache/portage/squashfs' > > + > > + > > +class CheckSquashDeltaConfig(CheckSyncConfig): > > + def __init__(self, repo, logger): > > + CheckSyncConfig.__init__(self, repo, logger) > > + self.checks.append('check_cache_location') > > + > > + def check_cache_location(self): > > + # TODO: make it configurable when Portage is fixed > > to support > > + # arbitrary config variables > > + pass > > + > > + > > +module_spec =3D { > > + 'name': 'squashdelta', > > + 'description': 'Syncing SquashFS images using SquashDeltas', > > + 'provides': { > > + 'squashdelta-module': { > > + 'name': "squashdelta", > > + 'class': "SquashDeltaSync", > > + 'description': 'Syncing SquashFS images > > using SquashDeltas', > > + 'functions': ['sync', 'new', 'exists'], > > + 'func_desc': { > > + 'sync': 'Performs the sync of the > > repository', > > + }, > > + 'validate_config': CheckSquashDeltaConfig, > > + } > > + } > > +} > > diff --git a/pym/portage/sync/modules/squashdelta/squashdelta.py > > b/pym/portage/sync/modules/squashdelta/squashdelta.py new file mode > > 100644 index 0000000..a0dfc46 > > --- /dev/null > > +++ b/pym/portage/sync/modules/squashdelta/squashdelta.py > > @@ -0,0 +1,192 @@ > > +# vim:fileencoding=3Dutf-8:noet > > +# (c) 2015 Micha=C5=82 G=C3=B3rny > > +# Distributed under the terms of the GNU General Public License v2 > > + > > +import errno > > +import io > > +import logging > > +import os > > +import os.path > > +import re > > + > > +import portage > > +from portage.package.ebuild.fetch import fetch > > +from portage.sync.syncbase import SyncBase > > + > > +from . import DEFAULT_CACHE_LOCATION > > + > > + > > +class SquashDeltaSync(SyncBase): >=20 >=20 > OK, I see a small mistake. You are subclassing SyncBase which does not > stub out a new() and you do not define one here. But you export a new() > in the module-spec above. Fixed. I removed them from [func_desc], and apparently forgot to do so from [functions]. > > + short_desc =3D "Repository syncing using SquashFS deltas" > > + > > + @staticmethod > > + def name(): > > + return "SquashDeltaSync" > > + > > + def __init__(self): > > + super(SquashDeltaSync, self).__init__( > > + 'squashmerge', > > 'dev-util/squashmerge') + > > + def sync(self, **kwargs): > > + self._kwargs(kwargs) > > + my_settings =3D portage.config(clone =3D self.settings) > > + cache_location =3D DEFAULT_CACHE_LOCATION > > + > > + # override fetching location > > + my_settings['DISTDIR'] =3D cache_location > > + > > + # make sure we append paths correctly > > + base_uri =3D self.repo.sync_uri > > + if not base_uri.endswith('/'): > > + base_uri +=3D '/' > > + > > + def my_fetch(fn, **kwargs): > > + kwargs['try_mirrors'] =3D 0 > > + return fetch([base_uri + fn], my_settings, > > **kwargs) + > > + # fetch sha512sum.txt > > + sha512_path =3D os.path.join(cache_location, > > 'sha512sum.txt') > > + try: > > + os.unlink(sha512_path) > > + except OSError: > > + pass > > + if not my_fetch('sha512sum.txt'): > > + return (1, False) > > + > > + if 'webrsync-gpg' in my_settings.features: > > + # TODO: GPG signature verification > > + pass > > + > > + # sha512sum.txt parsing > > + with io.open(sha512_path, 'r', encoding=3D'utf8') as f: > > + data =3D f.readlines() > > + > > + repo_re =3D re.compile(self.repo.name + '-(.*)$') > > + # current tag > > + current_re =3D re.compile('current:', re.IGNORECASE) > > + # checksum > > + checksum_re =3D re.compile('^([a-f0-9]{128})\s+(.*)$', > > re.IGNORECASE) + > > + def iter_snapshots(lines): > > + for l in lines: > > + m =3D current_re.search(l) > > + if m: > > + for s in l[m.end():].split(): > > + yield s > > + > > + def iter_checksums(lines): > > + for l in lines: > > + m =3D checksum_re.match(l) > > + if m: > > + yield (m.group(2), { > > + 'size': None, > > + 'SHA512': m.group(1), > > + }) > > + > > + # look for current indicator > > + for s in iter_snapshots(data): > > + m =3D repo_re.match(s) > > + if m: > > + new_snapshot =3D m.group(0) + '.sqfs' > > + new_version =3D m.group(1) > > + break > > + else: > > + logging.error('Unable to find current > > snapshot in sha512sum.txt') > > + return (1, False) > > + new_path =3D os.path.join(cache_location, new_snapshot) > > + > > + # get digests > > + my_digests =3D dict(iter_checksums(data)) > > + > > + # try to find a local snapshot > > + old_version =3D None > > + current_path =3D os.path.join(cache_location, > > + self.repo.name + '-current.sqfs') > > + try: > > + old_snapshot =3D os.readlink(current_path) > > + except OSError: > > + pass > > + else: > > + m =3D repo_re.match(old_snapshot) > > + if m and old_snapshot.endswith('.sqfs'): > > + old_version =3D m.group(1)[:-5] > > + old_path =3D > > os.path.join(cache_location, old_snapshot) + > > + if old_version is not None: > > + if old_version =3D=3D new_version: > > + logging.info('Snapshot up-to-date, > > verifying integrity.') > > + else: > > + # attempt to update > > + delta_path =3D None > > + expected_delta =3D '%s-%s-%s.sqdelta' > > % ( > > + self.repo.name, > > old_version, new_version) > > + if expected_delta not in my_digests: > > + logging.warning('No delta > > for %s->%s, fetching new snapshot.' > > + % > > (old_version, new_version)) > > + else: > > + delta_path =3D > > os.path.join(cache_location, expected_delta) + > > + if not > > my_fetch(expected_delta, digests =3D my_digests): > > + return (4, False) > > + if not self.has_bin: > > + return (5, False) > > + > > + ret =3D > > portage.process.spawn([self.bin_command, > > + old_path, > > delta_path, new_path], **self.spawn_kwargs) > > + if ret !=3D os.EX_OK: > > + > > logging.error('Merging the delta failed') > > + return (6, False) > > + > > + # pass-through to > > verification and cleanup + > > + # fetch full snapshot or verify the one we have > > + if not my_fetch(new_snapshot, digests =3D my_digests): > > + return (2, False) > > + > > + # create/update -current symlink > > + # using external ln for two reasons: > > + # 1. clean --force (unlike python's unlink+symlink) > > + # 2. easy userpriv (otherwise we'd have to lchown()) > > + ret =3D portage.process.spawn(['ln', '-s', '-f', > > new_snapshot, current_path], > > + **self.spawn_kwargs) > > + if ret !=3D os.EX_OK: > > + logging.error('Unable to set -current > > symlink') > > + retrurn (3, False) > > + > > + # remove old snapshot > > + if old_version is not None and old_version !=3D > > new_version: > > + try: > > + os.unlink(old_path) > > + except OSError as e: > > + logging.warning('Unable to unlink > > old snapshot: ' + str(e)) > > + if delta_path is not None: > > + try: > > + os.unlink(delta_path) > > + except OSError as e: > > + logging.warning('Unable to > > unlink old delta: ' + str(e)) > > + try: > > + os.unlink(sha512_path) > > + except OSError as e: > > + logging.warning('Unable to unlink > > sha512sum.txt: ' + str(e)) + > > + mount_cmd =3D ['mount', current_path, > > self.repo.location] > > + can_mount =3D True > > + if os.path.ismount(self.repo.location): > > + # need to umount old snapshot > > + ret =3D portage.process.spawn(['umount', '-l', > > self.repo.location]) > > + if ret !=3D os.EX_OK: > > + logging.warning('Unable to unmount > > old SquashFS after update') > > + can_mount =3D False > > + else: > > + try: > > + os.makedirs(self.repo.location) > > + except OSError as e: > > + if e.errno !=3D errno.EEXIST: > > + raise > > + > > + if can_mount: > > + ret =3D portage.process.spawn(mount_cmd) > > + if ret !=3D os.EX_OK: > > + logging.warning('Unable to > > (re-)mount SquashFS after update') + > > + return (0, True) >=20 > Overall the code itself looks decent. Aside from the small mistake > mentioned inline, my only concern is the sheer size of the sync(). It > is 162 lines and embeds 2 private functions. This code could easily be > broken up into several smaller task functions. It would make reading > the main sync() logic easier as well as the smaller task sections. I > am not a fan of the long winded functions and scripts present in > portage (this by no means is in the same category as many of those). > But I certainly don't want to let more of that in if I can help it. And > aim to reduce it while I'm the lead. Will try. > Ok, so the only data variable you wanted to add to the repos.conf was > the cache location? Yes, right now just that, I think. Maybe in the future we'd use per-repo OpenPGP verification switch. Something like: openpgp-verify =3D [no|yes|required] With 'yes' meaning 'verify if signed', and 'required' meaning 'refuse if not signed'. > I'll work on adding the gkeys integration in the gkeys branch I started > for the gpg verification. I see no point in porting the code from > emerge-webrsync's bash to python only to be replaced by gkeys in the > very near future. Please stub out a function & call for it when you > address the above issues. I'll fill in the code for it. Ok. --=20 Best regards, Micha=C5=82 G=C3=B3rny --Sig_/YV+P9gjbmzEqP0ASwg2gI09 Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQJ8BAEBCgBmBQJVMpg/XxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQ2REJCMDdDQzRGMERBRDA2RUEwQUZFNDFC MDdBMUFFQUVGQjQ0NjRFAAoJELB6GurvtEZOdugP/A/5V/r0Bqr+p8jMbXTfmUgK hB9zjdYeCxYzrHaoaOaDh6TcSSgio2VAsjknuoHbwAWzx/twXvGA29lFBtWnmGpB IfQ9I+EqCtl7L2KI5ZLnTV2WgVh3lzSP6fcAqLAcBENjvHMi8spPzp+lPXMsdxAf 3zrXFswFt2w437wFlzHO/oCDLMzRASXV+2xmWTAiRUUuqZ6iogswd610G9AbGOwh 3GxCPByJKCSbuTucTYFg5sVcG29bEeXawAVyU+BaHExIfApA2R+emiS0iS8akcuS 5Q+AnqEfhTcC8VQbqNCr3uJAtTCZoz3nnBWhmCTNKgV9MdKMWYo1q4hYUO6NTsSE hus7S3ID8fQzrldQ0x9LhN7GXneAU6Bf8mDUqUm/m13tYwWC/+pvd1o0CEmf9lj1 bZGN5RcJMYSGdWl94gID/x07BM1StkIlyd52OaMx84GJDFwGWiDfzws229QxQQ6s 6i/H9b9/W2vC3FrN+p1BhSLx9upJgU0/oEPxFpWT14vHj3pHb2VQswbYuBuwNb7P 68crq68xKAawOn8kZdNnzLFK6Mr2uqrIiE9jb9OExZbJlnL6Aizc+wZmmbMaB+bo L6bB7HGHryuC84iDZmJUGm4NvvcE4b9k9lwnYjYfKRj3VOcsyGeeFsYXn5yfI2Pw DWHtnY7pYKrn4b4ftrg7 =N4C0 -----END PGP SIGNATURE----- --Sig_/YV+P9gjbmzEqP0ASwg2gI09--