From: Michał Górny
To: gentoo-portage-dev@lists.gentoo.org
Cc: Michał Górny
Subject: [gentoo-portage-dev] [PATCH v2] Contribute squashdelta syncing module
Date: Sat, 18 Apr 2015 20:29:32 +0200
Message-Id: <1429381772-9901-1-git-send-email-mgorny@gentoo.org>
In-Reply-To: <20150416103822.69555cb3.dolsen@gentoo.org>
References: <20150416103822.69555cb3.dolsen@gentoo.org>

The squashdelta module provides syncing via SquashFS snapshots. For the
initial sync, a complete snapshot is fetched and placed in
/var/cache/portage/squashfs.
On subsequent sync operations, deltas are fetched from the mirror and
used to reconstruct the newest snapshot.

The distfile fetching logic is reused to fetch the remote files and
verify their checksums. Additionally, the sha512sum.txt file should be
OpenPGP-verified after fetching, but this is currently unimplemented.

After fetching, Portage tries to (re-)mount the SquashFS at the
repository location.
---
 cnf/repos.conf                                     |   4 +
 pym/portage/sync/modules/squashdelta/README        | 124 +++++++++++
 pym/portage/sync/modules/squashdelta/__init__.py   |  37 ++++
 .../sync/modules/squashdelta/squashdelta.py        | 231 +++++++++++++++++++++
 4 files changed, 396 insertions(+)
 create mode 100644 pym/portage/sync/modules/squashdelta/README
 create mode 100644 pym/portage/sync/modules/squashdelta/__init__.py
 create mode 100644 pym/portage/sync/modules/squashdelta/squashdelta.py

diff --git a/cnf/repos.conf b/cnf/repos.conf
index 1ca98ca..062fc0d 100644
--- a/cnf/repos.conf
+++ b/cnf/repos.conf
@@ -6,3 +6,7 @@ location = /usr/portage
 sync-type = rsync
 sync-uri = rsync://rsync.gentoo.org/gentoo-portage
 auto-sync = yes
+
+# for daily squashfs snapshots
+#sync-type = squashdelta
+#sync-uri = mirror://gentoo/../snapshots/squashfs
diff --git a/pym/portage/sync/modules/squashdelta/README b/pym/portage/sync/modules/squashdelta/README
new file mode 100644
index 0000000..994ae6d
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/README
@@ -0,0 +1,124 @@
+==================
+ squashdelta-sync
+==================
+
+Introduction
+============
+
+Squashdelta-sync provides the squashfs syncing module for Portage.
+When used as sync-type for the repository, it fetches the complete
+repository snapshot on initial sync, and then uses squashdeltas to
+efficiently update it.
+
+While initially intended for the daily snapshot of the Gentoo
+repository, the module is designed with flexibility in mind. It can be
+used to sync any repository, without enforcing any specific snapshotting
+interval or versioning rules. However, each snapshot version identifier
+must be unique within the repository.
+
+
+Technical hosting details
+=========================
+
+The snapshot hosting needs to provide the following files:
+
+1. the current (newest) full SquashFS snapshot of the repository,
+   and optionally M past snapshots,
+
+2. the deltas from N past snapshots to the current snapshot,
+
+3. a ``sha512sum.txt`` file containing SHA-512 checksums of all hosted
+   files, optionally OpenPGP-signed.
+
+The following naming schemes are used for the snapshots and deltas,
+respectively::
+
+    ${repo_name}-${version}.sqfs
+    ${repo_name}-${old_version}-${new_version}.sqdelta
+
+where:
+
+* ``${repo_name}`` is the repository name (as specified
+  in ``repos.conf``),
+* ``${version}`` specifies the snapshot version,
+* ``${old_version}`` specifies the snapshot version which the delta
+  updates from,
+* ``${new_version}`` specifies the snapshot version which the delta
+  updates to.
+
+Version can be an arbitrary string. It does not need to be incremental;
+however, each version must be unique in the repository scope.
+For example, the version can be a date, a revision number or a commit
+hash.
+
+The ``sha512sum.txt`` uses the format used by the GNU coreutils
+``sha512sum`` program. That is, it contains one or more lines consisting
+of a hexadecimal SHA-512 checksum followed by whitespace, followed by
+a filename. Lines not matching that format should be ignored.
+
+Optionally, the ``sha512sum.txt`` may be OpenPGP-signed. In that case,
+the file conforms to the ASCII-armored OpenPGP message format, with
+the checksums being stored in the message body.
+
+Additionally, the ``sha512sum.txt`` needs to contain an additional line
+containing the following string::
+
+    Current: ${repo_name}-${version}
+
+It states the current (newest) snapshot version. If snapshots for multiple
+repositories are provided in the same directory (using the same
+``sha512sum.txt`` file), this line can occur multiple times or list
+multiple snapshots, whitespace-separated. In order not to introduce
+stray lines in the file, it is recommended to embed this information
+in the OpenPGP comment field.
+
+An example script generating daily deltas for a repository can be found
+in the squashdelta-daily-gen_ repository.
+
+.. _squashdelta-daily-gen: https://bitbucket.org/mgorny/squashdelta-daily-gen
+
+
+Technical syncing details
+=========================
+
+When performing a sync, the script first fetches the ``sha512sum.txt``
+and processes it in order to determine the list of files available
+on the mirror. It should be noted that the script will never use
+a snapshot or delta that is not listed there. If the file is
+OpenPGP-signed, the signature is verified.
+
+The script scans the ``sha512sum.txt`` for a line containing
+the following string (case-insensitive)::
+
+    Current:
+
+The text following this string is split on whitespace, and the resulting
+tokens are parsed as snapshot names. The one matching the current
+repository name is used to determine the current (newest) snapshot
+version.
+
+Afterwards, the script scans the local cache directory for the following
+symlink::
+
+    ${repo_name}-current.sqfs
+
+If the symlink exists, the file it points to is assumed to be
+the current (newest) local snapshot. Otherwise, the script assumes
+an initial sync.
+
+On initial sync, the script fetches the newest snapshot from the mirror
+and places it inside the cache directory. The snapshot checksum is verified
+using ``sha512sum.txt`` and the ``${repo_name}-current.sqfs`` symlink is
+created.
+
+On update, the script scans the file list for a delta transforming
+the current local snapshot to the newest remote snapshot. If such
+a delta is found, it is fetched, verified and applied to obtain
+the new snapshot. Afterwards, the resulting snapshot checksum is
+verified and the ``${repo_name}-current.sqfs`` symlink is updated.
+
+If no delta matches the version pair, it is assumed that the system is
+outdated beyond available deltas and a new snapshot is fetched instead
+(as on initial sync).
+
+.. vim:ft=rst
diff --git a/pym/portage/sync/modules/squashdelta/__init__.py b/pym/portage/sync/modules/squashdelta/__init__.py
new file mode 100644
index 0000000..680835c
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/__init__.py
@@ -0,0 +1,37 @@
+# vim:fileencoding=utf-8:noet
+# (c) 2015 Michał Górny
+# Distributed under the terms of the GNU General Public License v2
+
+from portage.sync.config_checks import CheckSyncConfig
+
+
+DEFAULT_CACHE_LOCATION = '/var/cache/portage/squashfs'
+
+
+class CheckSquashDeltaConfig(CheckSyncConfig):
+	def __init__(self, repo, logger):
+		CheckSyncConfig.__init__(self, repo, logger)
+		self.checks.append('check_cache_location')
+
+	def check_cache_location(self):
+		# TODO: make it configurable when Portage is fixed to support
+		# arbitrary config variables
+		pass
+
+
+module_spec = {
+	'name': 'squashdelta',
+	'description': 'Syncing SquashFS images using SquashDeltas',
+	'provides': {
+		'squashdelta-module': {
+			'name': "squashdelta",
+			'class': "SquashDeltaSync",
+			'description': 'Syncing SquashFS images using SquashDeltas',
+			'functions': ['sync'],
+			'func_desc': {
+				'sync': 'Performs the sync of the repository',
+			},
+			'validate_config': CheckSquashDeltaConfig,
+		}
+	}
+}
diff --git a/pym/portage/sync/modules/squashdelta/squashdelta.py b/pym/portage/sync/modules/squashdelta/squashdelta.py
new file mode 100644
index 0000000..796a5f0
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/squashdelta.py
@@ -0,0 +1,231 @@
+# vim:fileencoding=utf-8:noet
+# (c) 2015 Michał Górny
+# Distributed under the terms of the GNU General Public License v2
+
+import errno
+import io
+import logging
+import os
+import os.path
+import re
+
+import portage
+from portage.package.ebuild.fetch import fetch
+from portage.sync.syncbase import SyncBase
+
+from . import DEFAULT_CACHE_LOCATION
+
+
+class SquashDeltaError(Exception):
+	pass
+
+
+class SquashDeltaSync(SyncBase):
+	short_desc = "Repository syncing using SquashFS deltas"
+
+	@staticmethod
+	def name():
+		return "SquashDeltaSync"
+
+	def __init__(self):
+		super(SquashDeltaSync, self).__init__(
+				'squashmerge', 'dev-util/squashmerge')
+		self.repo_re = re.compile(self.repo.name + '-(.*)$')
+
+	def _configure(self):
+		self.my_settings = portage.config(clone = self.settings)
+		self.cache_location = DEFAULT_CACHE_LOCATION
+
+		# override fetching location
+		self.my_settings['DISTDIR'] = self.cache_location
+
+		# make sure we append paths correctly
+		self.base_uri = self.repo.sync_uri
+		if not self.base_uri.endswith('/'):
+			self.base_uri += '/'
+
+	def _fetch(self, fn, **kwargs):
+		# disable implicit mirrors support since it relies on file
+		# being in distfiles/
+		kwargs['try_mirrors'] = 0
+		if not fetch([self.base_uri + fn], self.my_settings, **kwargs):
+			raise SquashDeltaError()
+
+	def _openpgp_verify(self, data):
+		if 'webrsync-gpg' in self.my_settings.features:
+			# TODO: OpenPGP signature verification, return False on failure
+			pass
+		return True
+
+	def _parse_sha512sum(self, path):
+		# sha512sum.txt parsing
+		with io.open(path, 'r', encoding='utf8') as f:
+			data = f.readlines()
+
+		if not self._openpgp_verify(data):
+			logging.error('OpenPGP verification failed for sha512sum.txt')
+			raise SquashDeltaError()
+
+		# current tag
+		current_re = re.compile('current:', re.IGNORECASE)
+		# checksum
+		checksum_re = re.compile(r'^([a-f0-9]{128})\s+(.*)$', re.IGNORECASE)
+
+		def iter_snapshots(lines):
+			for l in lines:
+				m = current_re.search(l)
+				if m:
+					for s in l[m.end():].split():
+						yield s
+
+		def iter_checksums(lines):
+			for l in lines:
+				m = checksum_re.match(l)
+				if m:
+					yield (m.group(2), {
+						'size': None,
+						'SHA512': m.group(1),
+					})
+
+		return (iter_snapshots(data), dict(iter_checksums(data)))
+
+	def _find_newest_snapshot(self, snapshots):
+		# look for current indicator
+		for s in snapshots:
+			m = self.repo_re.match(s)
+			if m:
+				new_snapshot = m.group(0) + '.sqfs'
+				new_version = m.group(1)
+				break
+		else:
+			logging.error('Unable to find current snapshot in sha512sum.txt')
+			raise SquashDeltaError()
+
+		new_path = os.path.join(self.cache_location, new_snapshot)
+		return (new_snapshot, new_version, new_path)
+
+	def _find_local_snapshot(self, current_path):
+		# try to find a local snapshot; empty values mean initial sync
+		try:
+			old_snapshot = os.readlink(current_path)
+		except OSError:
+			return ('', '', '')
+		m = self.repo_re.match(old_snapshot)
+		if not m or not old_snapshot.endswith('.sqfs'):
+			return ('', '', '')
+		old_version = m.group(1)[:-5]
+		old_path = os.path.join(self.cache_location, old_snapshot)
+
+		return (old_snapshot, old_version, old_path)
+
+	def _try_delta(self, old_version, new_version, old_path, new_path, my_digests):
+		# attempt to update
+		delta_path = None
+		expected_delta = '%s-%s-%s.sqdelta' % (
+				self.repo.name, old_version, new_version)
+		if expected_delta not in my_digests:
+			logging.warning('No delta for %s->%s, fetching new snapshot.'
+					% (old_version, new_version))
+		else:
+			delta_path = os.path.join(self.cache_location, expected_delta)
+
+			# _fetch() raises SquashDeltaError on failure
+			self._fetch(expected_delta, digests = my_digests)
+			if not self.has_bin:
+				raise SquashDeltaError()
+
+			ret = portage.process.spawn([self.bin_command,
+					old_path, delta_path, new_path], **self.spawn_kwargs)
+			if ret != os.EX_OK:
+				logging.error('Merging the delta failed')
+				raise SquashDeltaError()
+		return delta_path
+
+	def _update_symlink(self, new_snapshot, current_path):
+		# using external ln for two reasons:
+		# 1. clean --force (unlike python's unlink+symlink)
+		# 2. easy userpriv (otherwise we'd have to lchown())
+		ret = portage.process.spawn(['ln', '-s', '-f', new_snapshot, current_path],
+				**self.spawn_kwargs)
+		if ret != os.EX_OK:
+			logging.error('Unable to set -current symlink')
+			raise SquashDeltaError()
+
+	def _cleanup(self, path):
+		try:
+			os.unlink(path)
+		except OSError as e:
+			logging.warning('Unable to clean up ' + path + ': ' + str(e))
+
+	def _update_mount(self, current_path):
+		mount_cmd = ['mount', current_path, self.repo.location]
+		can_mount = True
+		if os.path.ismount(self.repo.location):
+			# need to umount old snapshot
+			ret = portage.process.spawn(['umount', '-l', self.repo.location])
+			if ret != os.EX_OK:
+				logging.warning('Unable to unmount old SquashFS after update')
+				can_mount = False
+		else:
+			try:
+				os.makedirs(self.repo.location)
+			except OSError as e:
+				if e.errno != errno.EEXIST:
+					raise
+
+		if can_mount:
+			ret = portage.process.spawn(mount_cmd)
+			if ret != os.EX_OK:
+				logging.warning('Unable to (re-)mount SquashFS after update')
+
+	def sync(self, **kwargs):
+		self._kwargs(kwargs)
+
+		try:
+			self._configure()
+
+			# fetch sha512sum.txt
+			sha512_path = os.path.join(self.cache_location, 'sha512sum.txt')
+			try:
+				os.unlink(sha512_path)
+			except OSError as e:
+				if e.errno != errno.ENOENT:
+					logging.error('Unable to unlink sha512sum.txt')
+					return (1, False)
+			self._fetch('sha512sum.txt')
+
+			snapshots, my_digests = self._parse_sha512sum(sha512_path)
+
+			current_path = os.path.join(self.cache_location,
+					self.repo.name + '-current.sqfs')
+			new_snapshot, new_version, new_path = (
+					self._find_newest_snapshot(snapshots))
+			old_snapshot, old_version, old_path = (
+					self._find_local_snapshot(current_path))
+
+			if old_version:
+				if old_version == new_version:
+					logging.info('Snapshot up-to-date, verifying integrity.')
+				else:
+					delta_path = self._try_delta(old_version, new_version,
+							old_path, new_path, my_digests)
+					# pass-through to verification and cleanup
+
+			# fetch full snapshot or verify the one we have
+			self._fetch(new_snapshot, digests = my_digests)
+
+			# create/update -current symlink
+			self._update_symlink(new_snapshot, current_path)
+
+			# remove old snapshot
+			if old_version and old_version != new_version:
+				self._cleanup(old_path)
+				if delta_path is not None:
+					self._cleanup(delta_path)
+			self._cleanup(sha512_path)
+
+			self._update_mount(current_path)
+
+			return (0, True)
+		except SquashDeltaError:
+			return (1, False)
-- 
2.3.5
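
To try the sha512sum.txt handling outside of Portage, the parsing described in the README can be sketched standalone, roughly as follows. This is only an illustration and not part of the patch: parse_sha512sum(), its arguments and the sample file names are made up for the example, OpenPGP handling is left out entirely, and, unlike the module, the repository name is passed through re.escape().

```python
import re

# Patterns mirroring the ones used in the module: a case-insensitive
# "Current:" tag, and a 128-digit hexadecimal SHA-512 followed by
# whitespace and a file name.
CURRENT_RE = re.compile(r'current:', re.IGNORECASE)
CHECKSUM_RE = re.compile(r'^([a-f0-9]{128})\s+(.*)$', re.IGNORECASE)


def parse_sha512sum(lines, repo_name):
    """Return (newest_version, {filename: sha512}) for repo_name.

    Hypothetical helper for illustration only. newest_version is None
    when no 'Current:' token matches the repository name.
    """
    repo_re = re.compile(re.escape(repo_name) + r'-(.*)$')
    newest = None
    digests = {}
    for line in lines:
        m = CHECKSUM_RE.match(line)
        if m:
            digests[m.group(2)] = m.group(1).lower()
            continue
        m = CURRENT_RE.search(line)
        if m:
            # tokens after the tag are whitespace-separated snapshot names
            for token in line[m.end():].split():
                t = repo_re.match(token)
                if t and newest is None:
                    newest = t.group(1)
    return newest, digests


# made-up sample input; the checksums are placeholders, not real digests
sample = [
    'Comment: Current: gentoo-20150418 overlay-r42\n',
    'a' * 128 + '  gentoo-20150418.sqfs\n',
    'b' * 128 + '  gentoo-20150417-20150418.sqdelta\n',
    'not a checksum line\n',
]
version, digests = parse_sha512sum(sample, 'gentoo')
delta = 'gentoo-%s-%s.sqdelta' % ('20150417', version)
print(version, delta in digests)
```

As in the module, a delta is only considered if its file name is listed in the checksum file, so a local snapshot with no matching delta entry falls back to fetching the full snapshot.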