public inbox for gentoo-portage-dev@lists.gentoo.org
 help / color / mirror / Atom feed
From: "Michał Górny" <mgorny@gentoo.org>
To: Brian Dolbec <dolsen@gentoo.org>
Cc: gentoo-portage-dev@lists.gentoo.org
Subject: Re: [gentoo-portage-dev] [PATCH] Contribute squashdelta syncing module
Date: Sat, 18 Apr 2015 19:45:34 +0200	[thread overview]
Message-ID: <20150418194534.01f43b72@pomiot.lan> (raw)
In-Reply-To: <20150416103822.69555cb3.dolsen@gentoo.org>

[-- Attachment #1: Type: text/plain, Size: 11782 bytes --]

Dnia 2015-04-16, o godz. 10:38:22
Brian Dolbec <dolsen@gentoo.org> napisał(a):

> On Sun,  5 Apr 2015 12:08:31 +0200
> Michał Górny <mgorny@gentoo.org> wrote:
> 
> > The squashdelta module provides syncing via SquashFS snapshots. For
> > the initial sync, a complete snapshot is fetched and placed in
> > /var/cache/portage/squashfs. On subsequent sync operations, deltas are
> > fetched from the mirror and used to reconstruct the newest snapshot.
> > 
> > The distfile fetching logic is reused to fetch the remote files
> > and verify their checksums. Additionally, the sha512sum.txt file
> > should be OpenPGP-verified after fetching but this is currently
> > unimplemented.
> > 
> > After fetching, Portage tries to (re-)mount the SquashFS in repository
> > location.
> > ---
> >  cnf/repos.conf                                     |   4 +
> >  pym/portage/sync/modules/squashdelta/README        | 124
> > +++++++++++++ pym/portage/sync/modules/squashdelta/__init__.py   |
> > 37 ++++ .../sync/modules/squashdelta/squashdelta.py        | 192
> > +++++++++++++++++++++ 4 files changed, 357 insertions(+)
> >  create mode 100644 pym/portage/sync/modules/squashdelta/README
> >  create mode 100644 pym/portage/sync/modules/squashdelta/__init__.py
> >  create mode 100644
> > pym/portage/sync/modules/squashdelta/squashdelta.py
> > 
> > diff --git a/cnf/repos.conf b/cnf/repos.conf
> > index 1ca98ca..062fc0d 100644
> > --- a/cnf/repos.conf
> > +++ b/cnf/repos.conf
> > @@ -6,3 +6,7 @@ location = /usr/portage
> >  sync-type = rsync
> >  sync-uri = rsync://rsync.gentoo.org/gentoo-portage
> >  auto-sync = yes
> > +
> > +# for daily squashfs snapshots
> > +#sync-type = squashdelta
> > +#sync-uri = mirror://gentoo/../snapshots/squashfs
> >
> 
> <snip>
> 
> > diff --git a/pym/portage/sync/modules/squashdelta/__init__.py
> > b/pym/portage/sync/modules/squashdelta/__init__.py new file mode
> > 100644 index 0000000..1a17dea
> > --- /dev/null
> > +++ b/pym/portage/sync/modules/squashdelta/__init__.py
> > @@ -0,0 +1,37 @@
> > +#	vim:fileencoding=utf-8:noet
> > +# (c) 2015 Michał Górny <mgorny@gentoo.org>
> > +# Distributed under the terms of the GNU General Public License v2
> > +
> > +from portage.sync.config_checks import CheckSyncConfig
> > +
> > +
> > +DEFAULT_CACHE_LOCATION = '/var/cache/portage/squashfs'
> > +
> > +
> > +class CheckSquashDeltaConfig(CheckSyncConfig):
> > +	def __init__(self, repo, logger):
> > +		CheckSyncConfig.__init__(self, repo, logger)
> > +		self.checks.append('check_cache_location')
> > +
> > +	def check_cache_location(self):
> > +		# TODO: make it configurable when Portage is fixed
> > to support
> > +		# arbitrary config variables
> > +		pass
> > +
> > +
> > +module_spec = {
> > +	'name': 'squashdelta',
> > +	'description': 'Syncing SquashFS images using SquashDeltas',
> > +	'provides': {
> > +		'squashdelta-module': {
> > +			'name': "squashdelta",
> > +			'class': "SquashDeltaSync",
> > +			'description': 'Syncing SquashFS images
> > using SquashDeltas',
> > +			'functions': ['sync', 'new', 'exists'],
> > +			'func_desc': {
> > +				'sync': 'Performs the sync of the
> > repository',
> > +			},
> > +			'validate_config': CheckSquashDeltaConfig,
> > +		}
> > +	}
> > +}
> > diff --git a/pym/portage/sync/modules/squashdelta/squashdelta.py
> > b/pym/portage/sync/modules/squashdelta/squashdelta.py new file mode
> > 100644 index 0000000..a0dfc46
> > --- /dev/null
> > +++ b/pym/portage/sync/modules/squashdelta/squashdelta.py
> > @@ -0,0 +1,192 @@
> > +#	vim:fileencoding=utf-8:noet
> > +# (c) 2015 Michał Górny <mgorny@gentoo.org>
> > +# Distributed under the terms of the GNU General Public License v2
> > +
> > +import errno
> > +import io
> > +import logging
> > +import os
> > +import os.path
> > +import re
> > +
> > +import portage
> > +from portage.package.ebuild.fetch import fetch
> > +from portage.sync.syncbase import SyncBase
> > +
> > +from . import DEFAULT_CACHE_LOCATION
> > +
> > +
> > +class SquashDeltaSync(SyncBase):
> 
> 
> OK, I see a small mistake.  You are subclassing SyncBase which does not
> stub out a new() and you do not define one here.  But you export a new()
> in the module-spec above.

Fixed. I removed them from [func_desc], and apparently forgot
to do so from [functions].

> > +	short_desc = "Repository syncing using SquashFS deltas"
> > +
> > +	@staticmethod
> > +	def name():
> > +		return "SquashDeltaSync"
> > +
> > +	def __init__(self):
> > +		super(SquashDeltaSync, self).__init__(
> > +				'squashmerge',
> > 'dev-util/squashmerge') +
> > +	def sync(self, **kwargs):
> > +		self._kwargs(kwargs)
> > +		my_settings = portage.config(clone = self.settings)
> > +		cache_location = DEFAULT_CACHE_LOCATION
> > +
> > +		# override fetching location
> > +		my_settings['DISTDIR'] = cache_location
> > +
> > +		# make sure we append paths correctly
> > +		base_uri = self.repo.sync_uri
> > +		if not base_uri.endswith('/'):
> > +			base_uri += '/'
> > +
> > +		def my_fetch(fn, **kwargs):
> > +			kwargs['try_mirrors'] = 0
> > +			return fetch([base_uri + fn], my_settings,
> > **kwargs) +
> > +		# fetch sha512sum.txt
> > +		sha512_path = os.path.join(cache_location,
> > 'sha512sum.txt')
> > +		try:
> > +			os.unlink(sha512_path)
> > +		except OSError:
> > +			pass
> > +		if not my_fetch('sha512sum.txt'):
> > +			return (1, False)
> > +
> > +		if 'webrsync-gpg' in my_settings.features:
> > +			# TODO: GPG signature verification
> > +			pass
> > +
> > +		# sha512sum.txt parsing
> > +		with io.open(sha512_path, 'r', encoding='utf8') as f:
> > +			data = f.readlines()
> > +
> > +		repo_re = re.compile(self.repo.name + '-(.*)$')
> > +		# current tag
> > +		current_re = re.compile('current:', re.IGNORECASE)
> > +		# checksum
> > +		checksum_re = re.compile('^([a-f0-9]{128})\s+(.*)$',
> > re.IGNORECASE) +
> > +		def iter_snapshots(lines):
> > +			for l in lines:
> > +				m = current_re.search(l)
> > +				if m:
> > +					for s in l[m.end():].split():
> > +						yield s
> > +
> > +		def iter_checksums(lines):
> > +			for l in lines:
> > +				m = checksum_re.match(l)
> > +				if m:
> > +					yield (m.group(2), {
> > +						'size': None,
> > +						'SHA512': m.group(1),
> > +					})
> > +
> > +		# look for current indicator
> > +		for s in iter_snapshots(data):
> > +			m = repo_re.match(s)
> > +			if m:
> > +				new_snapshot = m.group(0) + '.sqfs'
> > +				new_version = m.group(1)
> > +				break
> > +		else:
> > +			logging.error('Unable to find current
> > snapshot in sha512sum.txt')
> > +			return (1, False)
> > +		new_path = os.path.join(cache_location, new_snapshot)
> > +
> > +		# get digests
> > +		my_digests = dict(iter_checksums(data))
> > +
> > +		# try to find a local snapshot
> > +		old_version = None
> > +		current_path = os.path.join(cache_location,
> > +				self.repo.name + '-current.sqfs')
> > +		try:
> > +			old_snapshot = os.readlink(current_path)
> > +		except OSError:
> > +			pass
> > +		else:
> > +			m = repo_re.match(old_snapshot)
> > +			if m and old_snapshot.endswith('.sqfs'):
> > +				old_version = m.group(1)[:-5]
> > +				old_path =
> > os.path.join(cache_location, old_snapshot) +
> > +		if old_version is not None:
> > +			if old_version == new_version:
> > +				logging.info('Snapshot up-to-date,
> > verifying integrity.')
> > +			else:
> > +				# attempt to update
> > +				delta_path = None
> > +				expected_delta = '%s-%s-%s.sqdelta'
> > % (
> > +						self.repo.name,
> > old_version, new_version)
> > +				if expected_delta not in my_digests:
> > +					logging.warning('No delta
> > for %s->%s, fetching new snapshot.'
> > +							%
> > (old_version, new_version))
> > +				else:
> > +					delta_path =
> > os.path.join(cache_location, expected_delta) +
> > +					if not
> > my_fetch(expected_delta, digests = my_digests):
> > +						return (4, False)
> > +					if not self.has_bin:
> > +						return (5, False)
> > +
> > +					ret =
> > portage.process.spawn([self.bin_command,
> > +							old_path,
> > delta_path, new_path], **self.spawn_kwargs)
> > +					if ret != os.EX_OK:
> > +
> > logging.error('Merging the delta failed')
> > +						return (6, False)
> > +
> > +					# pass-through to
> > verification and cleanup +
> > +		# fetch full snapshot or verify the one we have
> > +		if not my_fetch(new_snapshot, digests = my_digests):
> > +			return (2, False)
> > +
> > +		# create/update -current symlink
> > +		# using external ln for two reasons:
> > +		# 1. clean --force (unlike python's unlink+symlink)
> > +		# 2. easy userpriv (otherwise we'd have to lchown())
> > +		ret = portage.process.spawn(['ln', '-s', '-f',
> > new_snapshot, current_path],
> > +				**self.spawn_kwargs)
> > +		if ret != os.EX_OK:
> > +			logging.error('Unable to set -current
> > symlink')
> > +			retrurn (3, False)
> > +
> > +		# remove old snapshot
> > +		if old_version is not None and old_version !=
> > new_version:
> > +			try:
> > +				os.unlink(old_path)
> > +			except OSError as e:
> > +				logging.warning('Unable to unlink
> > old snapshot: ' + str(e))
> > +			if delta_path is not None:
> > +				try:
> > +					os.unlink(delta_path)
> > +				except OSError as e:
> > +					logging.warning('Unable to
> > unlink old delta: ' + str(e))
> > +		try:
> > +			os.unlink(sha512_path)
> > +		except OSError as e:
> > +			logging.warning('Unable to unlink
> > sha512sum.txt: ' + str(e)) +
> > +		mount_cmd = ['mount', current_path,
> > self.repo.location]
> > +		can_mount = True
> > +		if os.path.ismount(self.repo.location):
> > +			# need to umount old snapshot
> > +			ret = portage.process.spawn(['umount', '-l',
> > self.repo.location])
> > +			if ret != os.EX_OK:
> > +				logging.warning('Unable to unmount
> > old SquashFS after update')
> > +				can_mount = False
> > +		else:
> > +			try:
> > +				os.makedirs(self.repo.location)
> > +			except OSError as e:
> > +				if e.errno != errno.EEXIST:
> > +					raise
> > +
> > +		if can_mount:
> > +			ret = portage.process.spawn(mount_cmd)
> > +			if ret != os.EX_OK:
> > +				logging.warning('Unable to
> > (re-)mount SquashFS after update') +
> > +		return (0, True)
> 
> Overall the code itself looks decent.  Aside from the small mistake
> mentioned inline, my only concern is the sheer size of the sync().  It
> is 162 lines and embeds 2 private functions. This code could easily be
> broken up into several smaller task functions.  It would make reading
> the main sync() logic easier as well as the smaller task sections.  I
> am not a fan of the long winded functions and scripts present in
> portage (this by no means is in the same category as many of those).
> But I certainly don't want to let more of that in if I can help it. And
> aim to reduce it while I'm the lead.

Will try.

> Ok, so the only data variable you wanted to add to the repos.conf was
> the cache location?

Yes, right now just that, I think. Maybe in the future we'd use
per-repo OpenPGP verification switch. Something like:

  openpgp-verify = [no|yes|required]

With 'yes' meaning 'verify if signed', and 'required' meaning 'refuse
if not signed'.

> I'll work on adding the gkeys integration in the gkeys branch I started
> for the gpg verification.  I see no point in porting the code from
> emerge-webrsync's bash to python only to be replaced by gkeys in the
> very near future.  Please stub out a function & call for it when you
> address the above issues.  I'll fill in the code for it.

Ok.

-- 
Best regards,
Michał Górny

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 949 bytes --]

  reply	other threads:[~2015-04-18 17:45 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-05 10:08 [gentoo-portage-dev] [PATCH] Contribute squashdelta syncing module Michał Górny
2015-04-16 17:38 ` Brian Dolbec
2015-04-18 17:45   ` Michał Górny [this message]
2015-04-18 18:29   ` [gentoo-portage-dev] [PATCH v2] " Michał Górny
2015-04-18 18:46   ` [gentoo-portage-dev] [PATCH v3] " Michał Górny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150418194534.01f43b72@pomiot.lan \
    --to=mgorny@gentoo.org \
    --cc=dolsen@gentoo.org \
    --cc=gentoo-portage-dev@lists.gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox