public inbox for gentoo-portage-dev@lists.gentoo.org
* [gentoo-portage-dev] [PATCH] Contribute squashdelta syncing module
@ 2015-04-05 10:08 Michał Górny
  2015-04-16 17:38 ` Brian Dolbec
From: Michał Górny @ 2015-04-05 10:08 UTC
  To: gentoo-portage-dev; +Cc: Michał Górny

The squashdelta module provides syncing via SquashFS snapshots. For the
initial sync, a complete snapshot is fetched and placed in
/var/cache/portage/squashfs. On subsequent sync operations, deltas are
fetched from the mirror and used to reconstruct the newest snapshot.

The distfile fetching logic is reused to fetch the remote files
and verify their checksums. Additionally, the sha512sum.txt file should
be OpenPGP-verified after fetching, but this is currently unimplemented.

After fetching, Portage tries to (re-)mount the SquashFS at the
repository location.
---
 cnf/repos.conf                                     |   4 +
 pym/portage/sync/modules/squashdelta/README        | 124 +++++++++++++
 pym/portage/sync/modules/squashdelta/__init__.py   |  37 ++++
 .../sync/modules/squashdelta/squashdelta.py        | 192 +++++++++++++++++++++
 4 files changed, 357 insertions(+)
 create mode 100644 pym/portage/sync/modules/squashdelta/README
 create mode 100644 pym/portage/sync/modules/squashdelta/__init__.py
 create mode 100644 pym/portage/sync/modules/squashdelta/squashdelta.py
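
The checksum verification mentioned in the description is delegated to Portage's distfile fetch() through a digests dictionary. As a rough, standalone illustration of what that check amounts to for a single file, here is a sketch using only hashlib; the helper name verify_sha512 is hypothetical and is not part of this patch or of Portage.

```python
import hashlib


def verify_sha512(path, expected_hex, chunk_size=1 << 20):
    """Return True if the SHA-512 digest of *path* matches *expected_hex*.

    Hypothetical helper for illustration only; the patch itself relies
    on portage.package.ebuild.fetch.fetch() with a digests dict.
    """
    h = hashlib.sha512()
    with open(path, 'rb') as f:
        # Hash in chunks so large snapshots need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    # hexdigest() is lowercase; sha512sum.txt entries may use either case.
    return h.hexdigest() == expected_hex.lower()
```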

diff --git a/cnf/repos.conf b/cnf/repos.conf
index 1ca98ca..062fc0d 100644
--- a/cnf/repos.conf
+++ b/cnf/repos.conf
@@ -6,3 +6,7 @@ location = /usr/portage
 sync-type = rsync
 sync-uri = rsync://rsync.gentoo.org/gentoo-portage
 auto-sync = yes
+
+# for daily squashfs snapshots
+#sync-type = squashdelta
+#sync-uri = mirror://gentoo/../snapshots/squashfs
diff --git a/pym/portage/sync/modules/squashdelta/README b/pym/portage/sync/modules/squashdelta/README
new file mode 100644
index 0000000..994ae6d
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/README
@@ -0,0 +1,124 @@
+==================
+ squashdelta-sync
+==================
+
+Introduction
+============
+
+Squashdelta-sync provides the squashfs syncing module for Portage.
+When used as the sync-type for a repository, it fetches the complete
+repository snapshot on initial sync, and then uses squashdeltas to
+efficiently update it.
+
+While initially intended for the daily snapshot of the Gentoo
+repository, the module is designed with flexibility in mind. It can be
+used to sync any repository, without enforcing any specific snapshotting
+interval or versioning rules. However, each snapshot version identifier
+must be unique within the scope of the repository.
+
+
+Technical hosting details
+=========================
+
+The snapshot hosting needs to provide the following files:
+
+1. the current (newest) full SquashFS snapshot of the repository,
+   and optionally M past snapshots,
+
+2. the deltas from N past snapshots to the current snapshot,
+
+3. a ``sha512sum.txt`` file containing SHA-512 checksums of all hosted
+   files, optionally OpenPGP-signed.
+
+The following naming schemes are used for the snapshots and deltas,
+respectively::
+
+    ${repo_name}-${version}.sqfs
+    ${repo_name}-${old_version}-${new_version}.sqdelta
+
+where:
+
+* ``${repo_name}`` is the repository name (as specified
+  in ``repos.conf``),
+* ``${version}`` specifies the snapshot version,
+* ``${old_version}`` specifies the snapshot version which the delta
+  updates from,
+* ``${new_version}`` specifies the snapshot version which the delta
+  updates to.
+
+Version can be an arbitrary string. It does not need to be incremental;
+however, each version must be unique within the repository scope.
+For example, the version can be a date, a revision number or a commit
+hash.
+
+The ``sha512sum.txt`` file uses the format produced by the GNU coreutils
+``sha512sum`` program. That is, it contains one or more lines consisting
+of a hexadecimal SHA-512 checksum, followed by whitespace, followed by
+a filename. Lines not matching that format should be ignored.
+
+The ``sha512sum.txt`` file may optionally be OpenPGP-signed. In that case,
+the file conforms to the ASCII-armored OpenPGP message format, with
+the checksums stored in the message body.
+
+The ``sha512sum.txt`` file also needs to contain a line stating
+the current (newest) snapshot version, in the following form::
+
+    Current: ${repo_name}-${version}
+
+If snapshots for multiple repositories are provided in the same
+directory (using the same ``sha512sum.txt`` file), this line can
+occur multiple times or list multiple snapshots separated by
+whitespace. In order not to introduce stray lines in the file,
+it is recommended to embed this information in the OpenPGP
+comment field.
+
+An example script generating daily deltas for a repository can be found
+in the squashdelta-daily-gen_ repository.
+
+.. _squashdelta-daily-gen: https://bitbucket.org/mgorny/squashdelta-daily-gen
+
+
+Technical syncing details
+=========================
+
+When performing a sync, the script first fetches the ``sha512sum.txt``
+and processes it in order to determine the list of files available
+on the mirror. The script will never use a snapshot or delta that is
+not listed there. If the file is OpenPGP-signed, the signature is meant
+to be verified (this is not implemented yet).
+
+The script scans the ``sha512sum.txt`` for a line containing
+the following string (case-insensitive)::
+
+    Current:
+
+The text following this string is split on whitespace, and the resulting
+tokens are parsed as snapshot names. The one matching the current
+repository name is used to determine the current (newest) snapshot
+version.
+
+Afterwards, the script scans the local cache directory for the following
+symlink::
+
+    ${repo_name}-current.sqfs
+
+If the symlink exists, the file it points to is assumed to be
+the current (newest) local snapshot. Otherwise, the script assumes
+an initial sync.
+
+On initial sync, the script fetches the newest snapshot from the mirror
+and places it in the cache directory. The snapshot checksum is verified
+using ``sha512sum.txt`` and the ``${repo_name}-current.sqfs`` symlink is
+created.
+
+On update, the script scans the file list for a delta transforming
+the current local snapshot to the newest remote snapshot. If such
+a delta is found, it is fetched, verified and applied to obtain
+the new snapshot. Afterwards, the resulting snapshot checksum is
+verified and the ``${repo_name}-current.sqfs`` symlink is updated.
+
+If no delta matches the version pair, it is assumed that the system is
+outdated beyond the available deltas, and a new snapshot is fetched
+instead (as in an initial sync).
+
+.. vim:ft=rst
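
The sha512sum.txt rules in the README above (checksum lines, ignored non-matching lines, a case-insensitive "Current:" marker followed by whitespace-separated snapshot names) can be illustrated with a small standalone parser. This is a sketch under the README's stated format, not the module's code; the function name parse_sha512sum is hypothetical.

```python
import re

# A checksum line: 128 hex digits, whitespace, then the filename.
CHECKSUM_RE = re.compile(r'^([a-fA-F0-9]{128})\s+(.*)$')
# The "Current:" marker may appear anywhere in a line, in any case.
CURRENT_RE = re.compile(r'current:', re.IGNORECASE)


def parse_sha512sum(text, repo_name):
    """Return (digests, current_version) for *repo_name*.

    digests maps filename -> lowercase hex SHA-512; current_version is
    None if no "Current:" token names this repository.
    Hypothetical helper for illustration only.
    """
    digests = {}
    current_version = None
    prefix = repo_name + '-'
    for line in text.splitlines():
        m = CHECKSUM_RE.match(line)
        if m:
            digests[m.group(2)] = m.group(1).lower()
            continue
        m = CURRENT_RE.search(line)
        if m and current_version is None:
            # Tokens after the marker are snapshot names, e.g. gentoo-20150405.
            for token in line[m.end():].split():
                if token.startswith(prefix):
                    current_version = token[len(prefix):]
                    break
    return digests, current_version
```

Like the module's own regular expression, a plain prefix match cannot tell a repository named "gentoo" apart from one named, say, "gentoo-x" listed in the same sha512sum.txt; both assume repository names do not overlap that way.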
diff --git a/pym/portage/sync/modules/squashdelta/__init__.py b/pym/portage/sync/modules/squashdelta/__init__.py
new file mode 100644
index 0000000..1a17dea
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/__init__.py
@@ -0,0 +1,37 @@
+#	vim:fileencoding=utf-8:noet
+# (c) 2015 Michał Górny <mgorny@gentoo.org>
+# Distributed under the terms of the GNU General Public License v2
+
+from portage.sync.config_checks import CheckSyncConfig
+
+
+DEFAULT_CACHE_LOCATION = '/var/cache/portage/squashfs'
+
+
+class CheckSquashDeltaConfig(CheckSyncConfig):
+	def __init__(self, repo, logger):
+		CheckSyncConfig.__init__(self, repo, logger)
+		self.checks.append('check_cache_location')
+
+	def check_cache_location(self):
+		# TODO: make it configurable when Portage is fixed to support
+		# arbitrary config variables
+		pass
+
+
+module_spec = {
+	'name': 'squashdelta',
+	'description': 'Syncing SquashFS images using SquashDeltas',
+	'provides': {
+		'squashdelta-module': {
+			'name': "squashdelta",
+			'class': "SquashDeltaSync",
+			'description': 'Syncing SquashFS images using SquashDeltas',
+			'functions': ['sync', 'new', 'exists'],
+			'func_desc': {
+				'sync': 'Performs the sync of the repository',
+			},
+			'validate_config': CheckSquashDeltaConfig,
+		}
+	}
+}
diff --git a/pym/portage/sync/modules/squashdelta/squashdelta.py b/pym/portage/sync/modules/squashdelta/squashdelta.py
new file mode 100644
index 0000000..a0dfc46
--- /dev/null
+++ b/pym/portage/sync/modules/squashdelta/squashdelta.py
@@ -0,0 +1,192 @@
+#	vim:fileencoding=utf-8:noet
+# (c) 2015 Michał Górny <mgorny@gentoo.org>
+# Distributed under the terms of the GNU General Public License v2
+
+import errno
+import io
+import logging
+import os
+import os.path
+import re
+
+import portage
+from portage.package.ebuild.fetch import fetch
+from portage.sync.syncbase import SyncBase
+
+from . import DEFAULT_CACHE_LOCATION
+
+
+class SquashDeltaSync(SyncBase):
+	short_desc = "Repository syncing using SquashFS deltas"
+
+	@staticmethod
+	def name():
+		return "SquashDeltaSync"
+
+	def __init__(self):
+		super(SquashDeltaSync, self).__init__(
+				'squashmerge', 'dev-util/squashmerge')
+
+	def sync(self, **kwargs):
+		self._kwargs(kwargs)
+		my_settings = portage.config(clone = self.settings)
+		cache_location = DEFAULT_CACHE_LOCATION
+
+		# override fetching location
+		my_settings['DISTDIR'] = cache_location
+
+		# make sure we append paths correctly
+		base_uri = self.repo.sync_uri
+		if not base_uri.endswith('/'):
+			base_uri += '/'
+
+		def my_fetch(fn, **kwargs):
+			kwargs['try_mirrors'] = 0
+			return fetch([base_uri + fn], my_settings, **kwargs)
+
+		# fetch sha512sum.txt
+		sha512_path = os.path.join(cache_location, 'sha512sum.txt')
+		try:
+			os.unlink(sha512_path)
+		except OSError:
+			pass
+		if not my_fetch('sha512sum.txt'):
+			return (1, False)
+
+		if 'webrsync-gpg' in my_settings.features:
+			# TODO: GPG signature verification
+			pass
+
+		# sha512sum.txt parsing
+		with io.open(sha512_path, 'r', encoding='utf8') as f:
+			data = f.readlines()
+
+		repo_re = re.compile(re.escape(self.repo.name) + '-(.*)$')
+		# current tag
+		current_re = re.compile('current:', re.IGNORECASE)
+		# checksum
+		checksum_re = re.compile(r'^([a-f0-9]{128})\s+(.*)$', re.IGNORECASE)
+
+		def iter_snapshots(lines):
+			for l in lines:
+				m = current_re.search(l)
+				if m:
+					for s in l[m.end():].split():
+						yield s
+
+		def iter_checksums(lines):
+			for l in lines:
+				m = checksum_re.match(l)
+				if m:
+					yield (m.group(2), {
+						'size': None,
+						'SHA512': m.group(1),
+					})
+
+		# look for current indicator
+		for s in iter_snapshots(data):
+			m = repo_re.match(s)
+			if m:
+				new_snapshot = m.group(0) + '.sqfs'
+				new_version = m.group(1)
+				break
+		else:
+			logging.error('Unable to find current snapshot in sha512sum.txt')
+			return (1, False)
+		new_path = os.path.join(cache_location, new_snapshot)
+
+		# get digests
+		my_digests = dict(iter_checksums(data))
+
+		# try to find a local snapshot
+		old_version = None
+		current_path = os.path.join(cache_location,
+				self.repo.name + '-current.sqfs')
+		try:
+			old_snapshot = os.readlink(current_path)
+		except OSError:
+			pass
+		else:
+			m = repo_re.match(old_snapshot)
+			if m and old_snapshot.endswith('.sqfs'):
+				old_version = m.group(1)[:-5]
+				old_path = os.path.join(cache_location, old_snapshot)
+
+		if old_version is not None:
+			if old_version == new_version:
+				logging.info('Snapshot up-to-date, verifying integrity.')
+			else:
+				# attempt to update
+				delta_path = None
+				expected_delta = '%s-%s-%s.sqdelta' % (
+						self.repo.name, old_version, new_version)
+				if expected_delta not in my_digests:
+					logging.warning('No delta for %s->%s, fetching new snapshot.'
+							% (old_version, new_version))
+				else:
+					delta_path = os.path.join(cache_location, expected_delta)
+
+					if not my_fetch(expected_delta, digests = my_digests):
+						return (4, False)
+					if not self.has_bin:
+						return (5, False)
+
+					ret = portage.process.spawn([self.bin_command,
+							old_path, delta_path, new_path], **self.spawn_kwargs)
+					if ret != os.EX_OK:
+						logging.error('Merging the delta failed')
+						return (6, False)
+
+					# pass-through to verification and cleanup
+
+		# fetch full snapshot or verify the one we have
+		if not my_fetch(new_snapshot, digests = my_digests):
+			return (2, False)
+
+		# create/update -current symlink
+		# using external ln for two reasons:
+		# 1. clean --force (unlike python's unlink+symlink)
+		# 2. easy userpriv (otherwise we'd have to lchown())
+		ret = portage.process.spawn(['ln', '-s', '-f', new_snapshot, current_path],
+				**self.spawn_kwargs)
+		if ret != os.EX_OK:
+			logging.error('Unable to set -current symlink')
+			return (3, False)
+
+		# remove old snapshot
+		if old_version is not None and old_version != new_version:
+			try:
+				os.unlink(old_path)
+			except OSError as e:
+				logging.warning('Unable to unlink old snapshot: ' + str(e))
+			if delta_path is not None:
+				try:
+					os.unlink(delta_path)
+				except OSError as e:
+					logging.warning('Unable to unlink old delta: ' + str(e))
+		try:
+			os.unlink(sha512_path)
+		except OSError as e:
+			logging.warning('Unable to unlink sha512sum.txt: ' + str(e))
+
+		mount_cmd = ['mount', current_path, self.repo.location]
+		can_mount = True
+		if os.path.ismount(self.repo.location):
+			# need to umount old snapshot
+			ret = portage.process.spawn(['umount', '-l', self.repo.location])
+			if ret != os.EX_OK:
+				logging.warning('Unable to unmount old SquashFS after update')
+				can_mount = False
+		else:
+			try:
+				os.makedirs(self.repo.location)
+			except OSError as e:
+				if e.errno != errno.EEXIST:
+					raise
+
+		if can_mount:
+			ret = portage.process.spawn(mount_cmd)
+			if ret != os.EX_OK:
+				logging.warning('Unable to (re-)mount SquashFS after update')
+
+		return (0, True)
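
The decision made in sync() (only verify, apply a delta, or fetch a full snapshot) can be summarised as a pure function, which might make it easier to unit-test in isolation. This is a hypothetical refactoring sketch, not part of the patch; plan_sync and its return values are made up for illustration.

```python
def plan_sync(repo_name, old_version, new_version, available):
    """Decide how to reach *new_version* given the mirror's file names.

    old_version is None when there is no ${repo_name}-current.sqfs
    symlink. Returns (action, filename), action being 'verify',
    'delta' or 'full'. Hypothetical helper for illustration only.
    """
    snapshot = '%s-%s.sqfs' % (repo_name, new_version)
    if old_version is None:
        # Initial sync: fetch the newest full snapshot.
        return ('full', snapshot)
    if old_version == new_version:
        # Already current: only re-verify the local snapshot checksum.
        return ('verify', snapshot)
    delta = '%s-%s-%s.sqdelta' % (repo_name, old_version, new_version)
    if delta in available:
        return ('delta', delta)
    # Outdated beyond the available deltas: fall back to a full fetch.
    return ('full', snapshot)
```

Constructing the expected delta name, as sync() does, rather than parsing hosted file names avoids ambiguity: versions are arbitrary strings and may themselves contain hyphens.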
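
The comment above the `ln -s -f` call contrasts it with Python's unlink-then-symlink. For comparison only, an atomic replacement is also possible in Python by creating the new link under a temporary name and renaming it over the old one, since rename(2) replaces the destination atomically on POSIX. This sketch assumes Python 3.3+ (os.replace) and does not address the userpriv/ownership concern, the other reason given for using ln; update_current_symlink is a hypothetical name.

```python
import os
import tempfile


def update_current_symlink(target, link_path):
    """Atomically point *link_path* at *target* (e.g. a relative name).

    Hypothetical helper for illustration; needs Python 3.3+.
    """
    link_dir = os.path.dirname(link_path) or '.'
    # Reserve an unused name in the same directory (so the rename stays
    # on one filesystem), then free it for the symlink. If another
    # process grabs the name meanwhile, os.symlink raises instead of
    # clobbering anything.
    fd, tmp_path = tempfile.mkstemp(dir=link_dir, prefix='.tmp-link-')
    os.close(fd)
    os.unlink(tmp_path)
    os.symlink(target, tmp_path)
    try:
        # Readers see either the old or the new link, never a missing one.
        os.replace(tmp_path, link_path)
    except OSError:
        os.unlink(tmp_path)
        raise
```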
-- 
2.3.5



Thread overview: 5+ messages
2015-04-05 10:08 [gentoo-portage-dev] [PATCH] Contribute squashdelta syncing module Michał Górny
2015-04-16 17:38 ` Brian Dolbec
2015-04-18 17:45   ` Michał Górny
2015-04-18 18:29   ` [gentoo-portage-dev] [PATCH v2] " Michał Górny
2015-04-18 18:46   ` [gentoo-portage-dev] [PATCH v3] " Michał Górny
