public inbox for gentoo-portage-dev@lists.gentoo.org
* [gentoo-portage-dev] [PATCH 0/4] Add sync-rcu support for rsync (bug 662070)
@ 2018-08-06  7:40 Zac Medico
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 1/4] Implement asyncio.iscoroutinefunction for compat_coroutine Zac Medico
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Zac Medico @ 2018-08-06  7:40 UTC (permalink / raw
  To: gentoo-portage-dev; +Cc: Zac Medico

Add a boolean sync-rcu repos.conf setting that behaves as follows:

sync-rcu = yes|no

    Enable read-copy-update (RCU) behavior for sync operations. The
    current latest immutable version of a repository will be referenced
    by a symlink found where the repository would normally be located
    (see the location setting). Repository consumers should resolve
    the canonical path of this symlink before attempting to access
    the repository, and all operations should be read-only, since
    the repository is considered immutable. Updates occur by atomic
    replacement of the symlink, which causes new consumers to use the
    new immutable version, while any earlier consumers continue to
    use the canonical path that was resolved earlier. This option
    requires sync-allow-hardlinks and sync-rcu-store-dir options to
    be enabled, and currently also requires that sync-type is set
    to rsync. This option is disabled by default, since the symlink
    usage would require special handling for scenarios involving bind
    mounts and chroots.

sync-rcu-store-dir

    Directory path reserved for sync-rcu storage. This directory must
    have a unique value for each repository (do not set it in the
    DEFAULT section).  This directory must not contain any other files
    or directories aside from those that are created automatically
    when sync-rcu is enabled.

sync-rcu-spare-snapshots = 1

    Number of spare snapshots for sync-rcu to retain with expired
    ttl. This protects the previous latest snapshot from being removed
    immediately after a new version becomes available, since it might
    still be used by running processes.

sync-rcu-ttl-days = 7

    Number of days for sync-rcu to retain previous immutable snapshots
    of a repository. After the ttl of a particular snapshot has
    expired, it will be removed automatically (the latest snapshot
    is exempt, and sync-rcu-spare-snapshots configures the number of
    previous snapshots that are exempt). If the ttl is set too low,
    then a snapshot could expire while it is in use by a running
    process.
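
The retention rules above can be sketched roughly as follows. This is
only an illustration, not the code in hardlink_rcu.py: the function
name select_expired_snapshots, the (name, mtime) tuples, and the use
of mtime as the snapshot age are all assumptions made for the example.

```python
import time

SECONDS_PER_DAY = 86400


def select_expired_snapshots(snapshots, ttl_days=7, spare_snapshots=1, now=None):
    """
    Return the names of snapshots that are eligible for removal.

    snapshots is a list of (name, mtime) tuples. The tuple layout and
    the use of mtime as the age are assumptions of this sketch.
    """
    now = time.time() if now is None else now
    ttl_seconds = ttl_days * SECONDS_PER_DAY
    # Newest first, so index 0 is the latest snapshot.
    ordered = sorted(snapshots, key=lambda item: item[1], reverse=True)
    # The latest snapshot and the configured number of spares are
    # exempt, even if their ttl has expired.
    candidates = ordered[1 + spare_snapshots:]
    return [name for name, mtime in candidates if now - mtime > ttl_seconds]
```

With the defaults, a store holding four snapshots keeps the two newest
unconditionally, and removes the other two only once they are more
than 7 days old.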

Zac Medico (4):
  Implement asyncio.iscoroutinefunction for compat_coroutine
  Add _sync_decorator module
  rsync: split out repo storage framework
  Add sync-rcu support for rsync (bug 662070)

 lib/portage/repository/config.py                   |  36 ++-
 lib/portage/repository/storage/__init__.py         |   0
 .../repository/storage/hardlink_quarantine.py      |  95 ++++++++
 lib/portage/repository/storage/hardlink_rcu.py     | 251 +++++++++++++++++++++
 lib/portage/repository/storage/inplace.py          |  49 ++++
 lib/portage/repository/storage/interface.py        |  87 +++++++
 lib/portage/sync/controller.py                     |   1 +
 lib/portage/sync/modules/rsync/rsync.py            |  85 ++-----
 lib/portage/sync/syncbase.py                       |  33 +++
 .../tests/util/futures/test_compat_coroutine.py    |  14 ++
 lib/portage/util/futures/_asyncio/__init__.py      |  14 ++
 lib/portage/util/futures/_sync_decorator.py        |  54 +++++
 lib/portage/util/futures/compat_coroutine.py       |  12 +
 man/portage.5                                      |  35 +++
 14 files changed, 700 insertions(+), 66 deletions(-)
 create mode 100644 lib/portage/repository/storage/__init__.py
 create mode 100644 lib/portage/repository/storage/hardlink_quarantine.py
 create mode 100644 lib/portage/repository/storage/hardlink_rcu.py
 create mode 100644 lib/portage/repository/storage/inplace.py
 create mode 100644 lib/portage/repository/storage/interface.py
 create mode 100644 lib/portage/util/futures/_sync_decorator.py

-- 
2.16.4




* [gentoo-portage-dev] [PATCH 1/4] Implement asyncio.iscoroutinefunction for compat_coroutine
  2018-08-06  7:40 [gentoo-portage-dev] [PATCH 0/4] Add sync-rcu support for rsync (bug 662070) Zac Medico
@ 2018-08-06  7:40 ` Zac Medico
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 2/4] Add _sync_decorator module Zac Medico
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Zac Medico @ 2018-08-06  7:40 UTC (permalink / raw
  To: gentoo-portage-dev; +Cc: Zac Medico

Sometimes it's useful to test if a function is a coroutine function,
so implement a version of asyncio.iscoroutinefunction that works
with asyncio.coroutine as well as compat_coroutine.coroutine (since
both kinds of coroutine functions behave identically for our
purposes).
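
As a rough standalone illustration of the marker technique (not the
portage code itself), the sketch below uses a hypothetical
toy_coroutine decorator that only tags the wrapper, and falls back to
the standard library's inspect.iscoroutinefunction for native
coroutine functions. Both names defined here are assumptions made for
the example.

```python
import functools
import inspect

# A unique marker object; identity comparison avoids false positives
# from unrelated attributes that happen to be truthy.
_is_coroutine = object()


def toy_coroutine(generator_func):
    """Hypothetical stand-in for a generator-based coroutine decorator."""
    @functools.wraps(generator_func)
    def wrapped(*args, **kwargs):
        # A real decorator would drive the generator with a future;
        # this sketch just returns the generator.
        return generator_func(*args, **kwargs)
    wrapped._is_coroutine = _is_coroutine
    return wrapped


def is_coroutine_function(func):
    """Return True for marked functions and for native coroutine functions."""
    if getattr(func, '_is_coroutine', None) is _is_coroutine:
        return True
    return inspect.iscoroutinefunction(func)
```

Comparing against a private sentinel with "is" means that only
functions produced by this particular decorator carry the mark.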
---
 lib/portage/util/futures/_asyncio/__init__.py | 14 ++++++++++++++
 lib/portage/util/futures/compat_coroutine.py  | 12 ++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/lib/portage/util/futures/_asyncio/__init__.py b/lib/portage/util/futures/_asyncio/__init__.py
index faab98e47..2a637624d 100644
--- a/lib/portage/util/futures/_asyncio/__init__.py
+++ b/lib/portage/util/futures/_asyncio/__init__.py
@@ -36,6 +36,7 @@ except ImportError:
 import portage
 portage.proxy.lazyimport.lazyimport(globals(),
 	'portage.util.futures.unix_events:_PortageEventLoopPolicy',
+	'portage.util.futures:compat_coroutine@_compat_coroutine',
 )
 from portage.util._eventloop.asyncio_event_loop import AsyncioEventLoop as _AsyncioEventLoop
 from portage.util._eventloop.global_event_loop import (
@@ -152,6 +153,19 @@ def create_subprocess_exec(*args, **kwargs):
 	return result
 
 
+def iscoroutinefunction(func):
+	"""
+	Return True if func is a decorated coroutine function,
+	supporting both asyncio.coroutine and compat_coroutine since
+	their behavior is identical for all practical purposes.
+	"""
+	if _compat_coroutine._iscoroutinefunction(func):
+		return True
+	elif _real_asyncio is not None and _real_asyncio.iscoroutinefunction(func):
+		return True
+	return False
+
+
 class Task(Future):
 	"""
 	Schedule the execution of a coroutine: wrap it in a future. A task
diff --git a/lib/portage/util/futures/compat_coroutine.py b/lib/portage/util/futures/compat_coroutine.py
index 59fdc31b6..be305c1b5 100644
--- a/lib/portage/util/futures/compat_coroutine.py
+++ b/lib/portage/util/futures/compat_coroutine.py
@@ -8,6 +8,17 @@ portage.proxy.lazyimport.lazyimport(globals(),
 	'portage.util.futures:asyncio',
 )
 
+# A marker for iscoroutinefunction.
+_is_coroutine = object()
+
+
+def _iscoroutinefunction(func):
+	"""
+	Return True if func is a decorated coroutine function
+	created with the coroutine decorator for this module.
+	"""
+	return getattr(func, '_is_coroutine', None) is _is_coroutine
+
 
 def coroutine(generator_func):
 	"""
@@ -34,6 +45,7 @@ def coroutine(generator_func):
 	@functools.wraps(generator_func)
 	def wrapped(*args, **kwargs):
 		return _generator_future(generator_func, *args, **kwargs)
+	wrapped._is_coroutine = _is_coroutine
 	return wrapped
 
 
-- 
2.16.4




* [gentoo-portage-dev] [PATCH 2/4] Add _sync_decorator module
  2018-08-06  7:40 [gentoo-portage-dev] [PATCH 0/4] Add sync-rcu support for rsync (bug 662070) Zac Medico
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 1/4] Implement asyncio.iscoroutinefunction for compat_coroutine Zac Medico
@ 2018-08-06  7:40 ` Zac Medico
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 3/4] rsync: split out repo storage framework Zac Medico
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Zac Medico @ 2018-08-06  7:40 UTC (permalink / raw
  To: gentoo-portage-dev; +Cc: Zac Medico

Add functions that decorate coroutine methods and functions for
synchronous usage, allowing coroutines to smoothly blend with
synchronous code. This eliminates clutter that might otherwise
discourage the proliferation of coroutine usage for I/O bound tasks.

In the next commit, _sync_decorator will be used for smooth
integration of new classes that have coroutine methods.
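
The idea can be sketched with only the standard library asyncio
module. The names sync_decorator, SyncMethods, Cubby and demo are
hypothetical and exist only for this example; the module added by
this patch uses portage's own event loop wrapper and ObjectProxy
instead.

```python
import asyncio
import functools
import inspect


def sync_decorator(func, loop):
    """Wrap an async function so that calling it blocks until it completes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return loop.run_until_complete(func(*args, **kwargs))
    return wrapper


class SyncMethods:
    """Proxy that exposes coroutine methods of obj as synchronous methods."""

    def __init__(self, obj, loop):
        self._obj = obj
        self._loop = loop

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        attr = getattr(self._obj, name)
        if inspect.iscoroutinefunction(attr):
            return sync_decorator(attr, self._loop)
        return attr


class Cubby:
    """Tiny example object with coroutine methods."""

    def __init__(self):
        self._value = None

    async def write(self, value):
        await asyncio.sleep(0)
        self._value = value

    async def read(self):
        await asyncio.sleep(0)
        return self._value


def demo():
    loop = asyncio.new_event_loop()
    try:
        sync_cubby = SyncMethods(Cubby(), loop)
        sync_cubby.write(42)
        return sync_cubby.read()
    finally:
        loop.close()
```

Note that a wrapper like this must not be called while the same loop
is already running, since run_until_complete cannot be nested.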

Bug: https://bugs.gentoo.org/662070
---
 .../tests/util/futures/test_compat_coroutine.py    | 14 ++++++
 lib/portage/util/futures/_sync_decorator.py        | 54 ++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 lib/portage/util/futures/_sync_decorator.py

diff --git a/lib/portage/tests/util/futures/test_compat_coroutine.py b/lib/portage/tests/util/futures/test_compat_coroutine.py
index cbc070869..2b5ae91cd 100644
--- a/lib/portage/tests/util/futures/test_compat_coroutine.py
+++ b/lib/portage/tests/util/futures/test_compat_coroutine.py
@@ -6,6 +6,7 @@ from portage.util.futures.compat_coroutine import (
 	coroutine,
 	coroutine_return,
 )
+from portage.util.futures._sync_decorator import _sync_decorator, _sync_methods
 from portage.tests import TestCase
 
 
@@ -157,3 +158,16 @@ class CompatCoroutineTestCase(TestCase):
 		loop.run_until_complete(asyncio.wait([writer, reader]))
 
 		self.assertEqual(reader.result(), values)
+
+		# Test decoration of coroutine methods and functions for
+		# synchronous usage, allowing coroutines to smoothly
+		# blend with synchronous code.
+		sync_cubby = _sync_methods(cubby, loop=loop)
+		sync_reader = _sync_decorator(reader_coroutine, loop=loop)
+		writer = asyncio.ensure_future(writer_coroutine(cubby, values, None), loop=loop)
+		self.assertEqual(sync_reader(cubby, None), values)
+		self.assertTrue(writer.done())
+
+		for i in range(3):
+			sync_cubby.write(i)
+			self.assertEqual(sync_cubby.read(), i)
diff --git a/lib/portage/util/futures/_sync_decorator.py b/lib/portage/util/futures/_sync_decorator.py
new file mode 100644
index 000000000..02a0963a7
--- /dev/null
+++ b/lib/portage/util/futures/_sync_decorator.py
@@ -0,0 +1,54 @@
+# Copyright 2018 Gentoo Foundation
+# Distributed under the terms of the GNU General Public License v2
+
+import functools
+
+import portage
+portage.proxy.lazyimport.lazyimport(globals(),
+	'portage.util.futures:asyncio',
+)
+
+
+def _sync_decorator(func, loop=None):
+	"""
+	Decorate an asynchronous function (either a coroutine function or a
+	function that returns a Future) with a wrapper that runs the function
+	synchronously.
+	"""
+	loop = asyncio._wrap_loop(loop)
+	@functools.wraps(func)
+	def wrapper(*args, **kwargs):
+		return loop.run_until_complete(func(*args, **kwargs))
+	return wrapper
+
+
+def _sync_methods(obj, loop=None):
+	"""
+	For use with synchronous code that needs to interact with an object
+	that has coroutine methods, this function generates a proxy which
+	conveniently converts coroutine methods into synchronous methods.
+	This allows coroutines to smoothly blend with synchronous
+	code, eliminating clutter that might otherwise discourage the
+	proliferation of coroutine usage for I/O bound tasks.
+	"""
+	loop = asyncio._wrap_loop(loop)
+	return _ObjectAttrWrapper(obj,
+		lambda attr: _sync_decorator(attr, loop=loop)
+		if asyncio.iscoroutinefunction(attr) else attr)
+
+
+class _ObjectAttrWrapper(portage.proxy.objectproxy.ObjectProxy):
+
+	__slots__ = ('_obj', '_attr_wrapper')
+
+	def __init__(self, obj, attr_wrapper):
+		object.__setattr__(self, '_obj', obj)
+		object.__setattr__(self, '_attr_wrapper', attr_wrapper)
+
+	def __getattribute__(self, attr):
+		obj = object.__getattribute__(self, '_obj')
+		attr_wrapper = object.__getattribute__(self, '_attr_wrapper')
+		return attr_wrapper(getattr(obj, attr))
+
+	def _get_target(self):
+		return object.__getattribute__(self, '_obj')
-- 
2.16.4




* [gentoo-portage-dev] [PATCH 3/4] rsync: split out repo storage framework
  2018-08-06  7:40 [gentoo-portage-dev] [PATCH 0/4] Add sync-rcu support for rsync (bug 662070) Zac Medico
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 1/4] Implement asyncio.iscoroutinefunction for compat_coroutine Zac Medico
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 2/4] Add _sync_decorator module Zac Medico
@ 2018-08-06  7:40 ` Zac Medico
  2018-08-10  0:10   ` Brian Dolbec
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 4/4] Add sync-rcu support for rsync (bug 662070) Zac Medico
  2018-08-10  0:11 ` [gentoo-portage-dev] [PATCH 0/4] " Brian Dolbec
  4 siblings, 1 reply; 7+ messages in thread
From: Zac Medico @ 2018-08-06  7:40 UTC (permalink / raw
  To: gentoo-portage-dev; +Cc: Zac Medico

Since there are many ways to manage repository storage, split out a repo
storage framework. The HardlinkQuarantineRepoStorage class implements
the existing default behavior, and the InplaceRepoStorage class
implements the legacy behavior (when sync-allow-hardlinks is disabled in
repos.conf).

Each class implements RepoStorageInterface, which uses coroutine methods
since coroutines are well-suited to the I/O bound tasks that these
methods perform. The _sync_decorator is used to convert coroutine
methods to synchronous methods, for smooth integration into the
surrounding synchronous code.
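
For readers unfamiliar with the init/commit/abort life cycle, here is
a simplified, synchronous sketch of a quarantine-style storage class.
It is not the code in this patch: ToyQuarantineStorage and demo are
hypothetical names, plain copies stand in for rsync --link-dest
hardlinks, and the methods are ordinary functions rather than
coroutines.

```python
import os
import shutil
import tempfile


class ToyQuarantineStorage:
    """Stage updates in a subdirectory, then move them into place."""

    _SUBDIR = '.tmp-update'  # hypothetical quarantine directory name

    def __init__(self, location):
        self._location = location
        self._update = None

    def init_update(self):
        update = os.path.join(self._location, self._SUBDIR)
        shutil.rmtree(update, ignore_errors=True)
        # Seed the staging directory from the current repository.
        shutil.copytree(self._location, update,
            ignore=shutil.ignore_patterns(self._SUBDIR))
        self._update = update
        return update

    def commit_update(self):
        if self._update is None:
            raise RuntimeError('current update does not exist')
        update, self._update = self._update, None
        # Remove the old content, except for the staging directory.
        for name in os.listdir(self._location):
            if name == self._SUBDIR:
                continue
            path = os.path.join(self._location, name)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        # Move the staged content into the normal location.
        for name in os.listdir(update):
            shutil.move(os.path.join(update, name),
                os.path.join(self._location, name))
        os.rmdir(update)

    def abort_update(self):
        if self._update is not None:
            update, self._update = self._update, None
            shutil.rmtree(update, ignore_errors=True)


def demo():
    with tempfile.TemporaryDirectory() as repo:
        with open(os.path.join(repo, 'a.txt'), 'w') as f:
            f.write('old')
        storage = ToyQuarantineStorage(repo)
        staged = storage.init_update()
        with open(os.path.join(staged, 'a.txt'), 'w') as f:
            f.write('new')
        # The normal location is untouched until commit_update.
        with open(os.path.join(repo, 'a.txt')) as f:
            before = f.read()
        storage.commit_update()
        with open(os.path.join(repo, 'a.txt')) as f:
            after = f.read()
        return before, after, sorted(os.listdir(repo))
```

Here demo() stages a change, confirms that the normal location still
holds the old file, and only sees the new content after the commit.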

Bug: https://bugs.gentoo.org/662070
---
 lib/portage/repository/storage/__init__.py         |  0
 .../repository/storage/hardlink_quarantine.py      | 95 ++++++++++++++++++++++
 lib/portage/repository/storage/inplace.py          | 49 +++++++++++
 lib/portage/repository/storage/interface.py        | 87 ++++++++++++++++++++
 lib/portage/sync/controller.py                     |  1 +
 lib/portage/sync/modules/rsync/rsync.py            | 85 +++++--------------
 lib/portage/sync/syncbase.py                       | 31 +++++++
 7 files changed, 284 insertions(+), 64 deletions(-)
 create mode 100644 lib/portage/repository/storage/__init__.py
 create mode 100644 lib/portage/repository/storage/hardlink_quarantine.py
 create mode 100644 lib/portage/repository/storage/inplace.py
 create mode 100644 lib/portage/repository/storage/interface.py

diff --git a/lib/portage/repository/storage/__init__.py b/lib/portage/repository/storage/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/lib/portage/repository/storage/hardlink_quarantine.py b/lib/portage/repository/storage/hardlink_quarantine.py
new file mode 100644
index 000000000..7e9cf4493
--- /dev/null
+++ b/lib/portage/repository/storage/hardlink_quarantine.py
@@ -0,0 +1,95 @@
+# Copyright 2018 Gentoo Foundation
+# Distributed under the terms of the GNU General Public License v2
+
+from portage import os
+from portage.repository.storage.interface import (
+	RepoStorageException,
+	RepoStorageInterface,
+)
+from portage.util.futures import asyncio
+from portage.util.futures.compat_coroutine import (
+	coroutine,
+	coroutine_return,
+)
+
+from _emerge.SpawnProcess import SpawnProcess
+
+
+class HardlinkQuarantineRepoStorage(RepoStorageInterface):
+	"""
+	This is the default storage module, since it's quite compatible with
+	most configurations.
+
+	It's desirable to be able to create shared hardlinks between the
+	download directory and the normal repository, and this is facilitated
+	by making the download directory be a subdirectory of the normal
+	repository location (ensuring that no mountpoints are crossed).
+	Shared hardlinks are created by using the rsync --link-dest option.
+
+	Since the download is initially unverified, it is safest to save
+	it in a quarantine directory. The quarantine directory is also
+	useful for making the repository update more atomic, so that it is
+	less likely that the normal repository location will be observed in
+	a partially synced state.
+	"""
+	def __init__(self, repo, spawn_kwargs):
+		self._user_location = repo.location
+		self._update_location = None
+		self._spawn_kwargs = spawn_kwargs
+		self._current_update = None
+
+	@coroutine
+	def _check_call(self, cmd):
+		"""
+		Run cmd and raise RepoStorageException on failure.
+
+		@param cmd: command to execute
+		@type cmd: list
+		"""
+		p = SpawnProcess(args=cmd, scheduler=asyncio._wrap_loop(), **self._spawn_kwargs)
+		p.start()
+		if (yield p.async_wait()) != os.EX_OK:
+			raise RepoStorageException('command exited with status {}: {}'.\
+				format(p.returncode, ' '.join(cmd)))
+
+	@coroutine
+	def init_update(self):
+		update_location = os.path.join(self._user_location, '.tmp-unverified-download-quarantine')
+		yield self._check_call(['rm', '-rf', update_location])
+
+		# Use rsync --link-dest to hardlink files into self._update_location,
+		# since cp -l is not portable.
+		yield self._check_call(['rsync', '-a', '--link-dest', self._user_location,
+			'--exclude', '/{}'.format(os.path.basename(update_location)),
+			self._user_location + '/', update_location + '/'])
+
+		self._update_location = update_location
+
+		coroutine_return(self._update_location)
+
+	@property
+	def current_update(self):
+		if self._update_location is None:
+			raise RepoStorageException('current update does not exist')
+		return self._update_location
+
+	@coroutine
+	def commit_update(self):
+		update_location = self.current_update
+		self._update_location = None
+		yield self._check_call(['rsync', '-a', '--delete',
+			'--exclude', '/{}'.format(os.path.basename(update_location)),
+			update_location + '/', self._user_location + '/'])
+
+		yield self._check_call(['rm', '-rf', update_location])
+
+	@coroutine
+	def abort_update(self):
+		if self._update_location is not None:
+			update_location = self._update_location
+			self._update_location = None
+			yield self._check_call(['rm', '-rf', update_location])
+
+	@coroutine
+	def garbage_collection(self):
+		yield self.abort_update()
diff --git a/lib/portage/repository/storage/inplace.py b/lib/portage/repository/storage/inplace.py
new file mode 100644
index 000000000..f1117ad03
--- /dev/null
+++ b/lib/portage/repository/storage/inplace.py
@@ -0,0 +1,49 @@
+# Copyright 2018 Gentoo Foundation
+# Distributed under the terms of the GNU General Public License v2
+
+from portage.repository.storage.interface import (
+	RepoStorageException,
+	RepoStorageInterface,
+)
+from portage.util.futures.compat_coroutine import coroutine, coroutine_return
+
+
+class InplaceRepoStorage(RepoStorageInterface):
+	"""
+	Legacy repo storage behavior, where updates are applied in-place.
+	This module is not recommended, since the repository is left in an
+	unspecified (possibly malicious) state if the update fails.
+	"""
+	def __init__(self, repo, spawn_kwargs):
+		self._user_location = repo.location
+		self._update_location = None
+
+	@coroutine
+	def init_update(self):
+		self._update_location = self._user_location
+		coroutine_return(self._update_location)
+		yield None
+
+	@property
+	def current_update(self):
+		if self._update_location is None:
+			raise RepoStorageException('current update does not exist')
+		return self._update_location
+
+	@coroutine
+	def commit_update(self):
+		self.current_update
+		self._update_location = None
+		coroutine_return()
+		yield None
+
+	@coroutine
+	def abort_update(self):
+		self._update_location = None
+		coroutine_return()
+		yield None
+
+	@coroutine
+	def garbage_collection(self):
+		coroutine_return()
+		yield None
diff --git a/lib/portage/repository/storage/interface.py b/lib/portage/repository/storage/interface.py
new file mode 100644
index 000000000..f83c42b84
--- /dev/null
+++ b/lib/portage/repository/storage/interface.py
@@ -0,0 +1,87 @@
+# Copyright 2018 Gentoo Foundation
+# Distributed under the terms of the GNU General Public License v2
+
+from portage.exception import PortageException
+from portage.util.futures.compat_coroutine import coroutine
+
+
+class RepoStorageException(PortageException):
+	"""
+	Base class for exceptions raised by RepoStorageInterface.
+	"""
+
+
+class RepoStorageInterface(object):
+	"""
+	Abstract repository storage interface.
+
+	Implementations can assume that the repo.location directory already
+	exists with appropriate permissions (SyncManager handles this).
+
+	TODO: Add a method to check for a previous uncommitted update, which
+	typically indicates a verification failure:
+	    https://bugs.gentoo.org/662386
+	"""
+	def __init__(self, repo, spawn_kwargs):
+		"""
+		@param repo: repository configuration
+		@type repo: portage.repository.config.RepoConfig
+		@param spawn_kwargs: keyword arguments supported by the
+			portage.process.spawn function
+		@type spawn_kwargs: dict
+		"""
+		raise NotImplementedError
+
+	@coroutine
+	def init_update(self):
+		"""
+		Create an update directory as a destination to sync updates to.
+		The directory will be populated with files from the previous
+		immutable snapshot, if available. Note that this directory
+		may contain hardlinks that reference files in the previous
+		immutable snapshot, so these files should not be modified
+		(tools like rsync and git normally break hardlinks when
+		files need to be modified).
+
+		@rtype: str
+		@return: path of directory to update, populated with files from
+			the previous snapshot if available
+		"""
+		raise NotImplementedError
+
+	@property
+	def current_update(self):
+		"""
+		Get the current update directory which would have been returned
+		from the most recent call to the init_update method. This raises
+		RepoStorageException if the init_update method has not been
+		called.
+
+		@rtype: str
+		@return: path of directory to update
+		"""
+		raise NotImplementedError
+
+	@coroutine
+	def commit_update(self):
+		"""
+		Commit the current update directory, so that it becomes the
+		latest immutable snapshot.
+		"""
+		raise NotImplementedError
+
+	@coroutine
+	def abort_update(self):
+		"""
+		Delete the current update directory. If there was not an update
+		in progress, or it has already been committed, then this has
+		no effect.
+		"""
+		raise NotImplementedError
+
+	@coroutine
+	def garbage_collection(self):
+		"""
+		Remove expired snapshots.
+		"""
+		raise NotImplementedError
diff --git a/lib/portage/sync/controller.py b/lib/portage/sync/controller.py
index 3bccf6f74..bf5750f7f 100644
--- a/lib/portage/sync/controller.py
+++ b/lib/portage/sync/controller.py
@@ -327,6 +327,7 @@ class SyncManager(object):
 		# override the defaults when sync_umask is set
 		if repo.sync_umask is not None:
 			spawn_kwargs["umask"] = int(repo.sync_umask, 8)
+		spawn_kwargs.setdefault("umask", 0o022)
 		self.spawn_kwargs = spawn_kwargs
 
 		if self.usersync_uid is not None:
diff --git a/lib/portage/sync/modules/rsync/rsync.py b/lib/portage/sync/modules/rsync/rsync.py
index 56e38631e..17b1b9e7b 100644
--- a/lib/portage/sync/modules/rsync/rsync.py
+++ b/lib/portage/sync/modules/rsync/rsync.py
@@ -59,55 +59,6 @@ class RsyncSync(NewBase):
 	def __init__(self):
 		NewBase.__init__(self, "rsync", RSYNC_PACKAGE_ATOM)
 
-	def _select_download_dir(self):
-		'''
-		Select and return the download directory. It's desirable to be able
-		to create shared hardlinks between the download directory to the
-		normal repository, and this is facilitated by making the download
-		directory be a subdirectory of the normal repository location
-		(ensuring that no mountpoints are crossed). Shared hardlinks are
-		created by using the rsync --link-dest option.
-
-		Since the download is initially unverified, it is safest to save
-		it in a quarantine directory. The quarantine directory is also
-		useful for making the repository update more atomic, so that it
-		less likely that normal repository location will be observed in
-		a partially synced state.
-
-		This method returns a quarantine directory if sync-allow-hardlinks
-		is enabled in repos.conf, and otherwise it returne the normal
-		repository location.
-		'''
-		if self.repo.sync_allow_hardlinks:
-			return os.path.join(self.repo.location, '.tmp-unverified-download-quarantine')
-		else:
-			return self.repo.location
-
-	def _commit_download(self, download_dir):
-		'''
-		Commit changes from download_dir if it does not refer to the
-		normal repository location.
-		'''
-		exitcode = 0
-		if self.repo.location != download_dir:
-			rsynccommand = [self.bin_command] + self.rsync_opts + self.extra_rsync_opts
-			rsynccommand.append('--exclude=/%s' % os.path.basename(download_dir))
-			rsynccommand.append('%s/' % download_dir.rstrip('/'))
-			rsynccommand.append('%s/' % self.repo.location)
-			exitcode = portage.process.spawn(rsynccommand, **self.spawn_kwargs)
-
-		return exitcode
-
-	def _remove_download(self, download_dir):
-		"""
-		Remove download_dir if it does not refer to the normal repository
-		location.
-		"""
-		exitcode = 0
-		if self.repo.location != download_dir:
-			exitcode = subprocess.call(['rm', '-rf', download_dir])
-		return exitcode
-
 	def update(self):
 		'''Internal update function which performs the transfer'''
 		opts = self.options.get('emerge_config').opts
@@ -143,8 +94,8 @@ class RsyncSync(NewBase):
 			self.extra_rsync_opts.extend(portage.util.shlex_split(
 				self.repo.module_specific_options['sync-rsync-extra-opts']))
 
-		download_dir = self._select_download_dir()
 		exitcode = 0
+		verify_failure = False
 
 		# Process GLEP74 verification options.
 		# Default verification to 'no'; it's enabled for ::gentoo
@@ -240,10 +191,14 @@ class RsyncSync(NewBase):
 				self.proto = "file"
 				dosyncuri = syncuri[7:]
 				unchanged, is_synced, exitcode, updatecache_flg = self._do_rsync(
-					dosyncuri, timestamp, opts, download_dir)
+					dosyncuri, timestamp, opts)
 				self._process_exitcode(exitcode, dosyncuri, out, 1)
-				if exitcode == 0 and not unchanged:
-					self._commit_download(download_dir)
+				if exitcode == 0:
+					if unchanged:
+						self.repo_storage.abort_update()
+					else:
+						self.repo_storage.commit_update()
+						self.repo_storage.garbage_collection()
 				return (exitcode, updatecache_flg)
 
 			retries=0
@@ -375,7 +330,7 @@ class RsyncSync(NewBase):
 					dosyncuri = dosyncuri[6:].replace('/', ':/', 1)
 
 				unchanged, is_synced, exitcode, updatecache_flg = self._do_rsync(
-					dosyncuri, timestamp, opts, download_dir)
+					dosyncuri, timestamp, opts)
 				if not unchanged:
 					local_state_unchanged = False
 				if is_synced:
@@ -390,6 +345,7 @@ class RsyncSync(NewBase):
 					# exit loop
 					exitcode = EXCEEDED_MAX_RETRIES
 					break
+
 			self._process_exitcode(exitcode, dosyncuri, out, maxretries)
 
 			if local_state_unchanged:
@@ -397,6 +353,8 @@ class RsyncSync(NewBase):
 				# in this case, so refer gemato to the normal repository
 				# location.
 				download_dir = self.repo.location
+			else:
+				download_dir = self.download_dir
 
 			# if synced successfully, verify now
 			if exitcode == 0 and self.verify_metamanifest:
@@ -448,14 +406,18 @@ class RsyncSync(NewBase):
 								% (e,),
 								level=logging.ERROR, noiselevel=-1)
 						exitcode = 1
+						verify_failure = True
 
 			if exitcode == 0 and not local_state_unchanged:
-				exitcode = self._commit_download(download_dir)
+				self.repo_storage.commit_update()
+				self.repo_storage.garbage_collection()
 
 			return (exitcode, updatecache_flg)
 		finally:
-			if exitcode == 0:
-				self._remove_download(download_dir)
+			# Don't delete the update if verification failed, in case
+			# the cause needs to be investigated.
+			if not verify_failure:
+				self.repo_storage.abort_update()
 			if openpgp_env is not None:
 				openpgp_env.close()
 
@@ -594,7 +556,7 @@ class RsyncSync(NewBase):
 		return rsync_opts
 
 
-	def _do_rsync(self, syncuri, timestamp, opts, download_dir):
+	def _do_rsync(self, syncuri, timestamp, opts):
 		updatecache_flg = False
 		is_synced = False
 		if timestamp != 0 and "--quiet" not in opts:
@@ -720,11 +682,6 @@ class RsyncSync(NewBase):
 				# actual sync
 				command = rsynccommand[:]
 
-				if self.repo.location != download_dir:
-					# Use shared hardlinks for files that are identical
-					# in the previous snapshot of the repository.
-					command.append('--link-dest=%s' % self.repo.location)
-
 				submodule_paths = self._get_submodule_paths()
 				if submodule_paths:
 					# The only way to select multiple directories to
@@ -738,7 +695,7 @@ class RsyncSync(NewBase):
 				else:
 					command.append(syncuri + "/")
 
-				command.append(download_dir)
+				command.append(self.download_dir)
 
 				exitcode = None
 				try:
diff --git a/lib/portage/sync/syncbase.py b/lib/portage/sync/syncbase.py
index ce69a4fc0..1d2a00b7c 100644
--- a/lib/portage/sync/syncbase.py
+++ b/lib/portage/sync/syncbase.py
@@ -15,6 +15,7 @@ import portage
 from portage.util import writemsg_level
 from portage.util._eventloop.global_event_loop import global_event_loop
 from portage.util.backoff import RandomExponentialBackoff
+from portage.util.futures._sync_decorator import _sync_methods
 from portage.util.futures.retry import retry
 from portage.util.futures.executor.fork import ForkExecutor
 from . import _SUBMODULE_PATH_MAP
@@ -40,6 +41,8 @@ class SyncBase(object):
 		self.repo = None
 		self.xterm_titles = None
 		self.spawn_kwargs = None
+		self.repo_storage = None
+		self._download_dir = None
 		self.bin_command = None
 		self._bin_command = bin_command
 		self.bin_pkg = bin_pkg
@@ -72,7 +75,35 @@ class SyncBase(object):
 		self.repo = self.options.get('repo', None)
 		self.xterm_titles = self.options.get('xterm_titles', False)
 		self.spawn_kwargs = self.options.get('spawn_kwargs', None)
+		storage_cls = portage.load_mod(self._select_storage_module())
+		self.repo_storage = _sync_methods(storage_cls(self.repo, self.spawn_kwargs))
 
+	def _select_storage_module(self):
+		'''
+		Select an appropriate implementation of RepoStorageInterface, based
+		on repos.conf settings.
+
+		@rtype: str
+		@return: name of the selected repo storage constructor
+		'''
+		if self.repo.sync_allow_hardlinks:
+			mod_name = 'portage.repository.storage.hardlink_quarantine.HardlinkQuarantineRepoStorage'
+		else:
+			mod_name = 'portage.repository.storage.inplace.InplaceRepoStorage'
+		return mod_name
+
+	@property
+	def download_dir(self):
+		"""
+		Get the path of the download directory, where the repository
+		update is staged. The directory is initialized lazily, since
+		the repository might already be at the latest revision, and
+		there may be some cost associated with the directory
+		initialization.
+		"""
+		if self._download_dir is None:
+			self._download_dir = self.repo_storage.init_update()
+		return self._download_dir
 
 	def exists(self, **kwargs):
 		'''Tests whether the repo actually exists'''
-- 
2.16.4




* [gentoo-portage-dev] [PATCH 4/4] Add sync-rcu support for rsync (bug 662070)
  2018-08-06  7:40 [gentoo-portage-dev] [PATCH 0/4] Add sync-rcu support for rsync (bug 662070) Zac Medico
                   ` (2 preceding siblings ...)
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 3/4] rsync: split out repo storage framework Zac Medico
@ 2018-08-06  7:40 ` Zac Medico
  2018-08-10  0:11 ` [gentoo-portage-dev] [PATCH 0/4] " Brian Dolbec
  4 siblings, 0 replies; 7+ messages in thread
From: Zac Medico @ 2018-08-06  7:40 UTC (permalink / raw
  To: gentoo-portage-dev; +Cc: Zac Medico

Add a boolean sync-rcu repos.conf setting that behaves as follows:

    Enable read-copy-update (RCU) behavior for sync operations. The
    current latest immutable version of a repository will be referenced
    by a symlink found where the repository would normally be located
    (see the location setting). Repository consumers should resolve
    the canonical path of this symlink before attempting to access
    the repository, and all operations should be read-only, since
    the repository is considered immutable. Updates occur by atomic
    replacement of the symlink, which causes new consumers to use the
    new immutable version, while any earlier consumers continue to
    use the canonical path that was resolved earlier. This option
    requires sync-allow-hardlinks and sync-rcu-store-dir options to
    be enabled, and currently also requires that sync-type is set
    to rsync. This option is disabled by default, since the symlink
    usage would require special handling for scenarios involving bind
    mounts and chroots.

Bug: https://bugs.gentoo.org/662070
---
 lib/portage/repository/config.py               |  36 +++-
 lib/portage/repository/storage/hardlink_rcu.py | 251 +++++++++++++++++++++++++
 lib/portage/sync/syncbase.py                   |   4 +-
 man/portage.5                                  |  35 ++++
 4 files changed, 323 insertions(+), 3 deletions(-)
 create mode 100644 lib/portage/repository/storage/hardlink_rcu.py
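
Note (illustrative, not part of the patch): the read-copy-update
behavior described in the commit message can be sketched roughly as
below. The function names publish_snapshot and open_repo are
hypothetical and exist only for this example; the layout (a "latest"
symlink next to a "snapshots" directory) follows the one used in
hardlink_rcu.py. It assumes a POSIX filesystem where rename() replaces
a symlink atomically.

```python
import os


def publish_snapshot(store_dir, snapshot_id):
    """Atomically point store_dir/latest at snapshots/<snapshot_id>."""
    latest = os.path.join(store_dir, "latest")
    new_link = latest + ".new"
    try:
        # Clear a stale temporary link left by an interrupted run.
        os.unlink(new_link)
    except OSError:
        pass
    # The target is relative, so it is resolved against store_dir.
    os.symlink(os.path.join("snapshots", str(snapshot_id)), new_link)
    # rename() swaps the symlink in a single step, so a reader sees
    # either the old target or the new one, never a missing link.
    os.rename(new_link, latest)


def open_repo(location):
    """Resolve the canonical snapshot path once, before any reads."""
    return os.path.realpath(location)
```

A consumer that called open_repo() before a sync keeps reading the
snapshot it resolved, while a consumer that starts afterwards resolves
the new one.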

diff --git a/lib/portage/repository/config.py b/lib/portage/repository/config.py
index f790f9392..8cdc2a696 100644
--- a/lib/portage/repository/config.py
+++ b/lib/portage/repository/config.py
@@ -84,7 +84,7 @@ class RepoConfig(object):
 		'profile_formats', 'sign_commit', 'sign_manifest', 'strict_misc_digests',
 		'sync_depth', 'sync_hooks_only_on_change',
 		'sync_type', 'sync_umask', 'sync_uri', 'sync_user', 'thin_manifest',
-		'update_changelog', '_eapis_banned', '_eapis_deprecated',
+		'update_changelog', 'user_location', '_eapis_banned', '_eapis_deprecated',
 		'_masters_orig', 'module_specific_options', 'manifest_required_hashes',
 		'sync_allow_hardlinks',
 		'sync_openpgp_key_path',
@@ -93,6 +93,10 @@ class RepoConfig(object):
 		'sync_openpgp_key_refresh_retry_delay_exp_base',
 		'sync_openpgp_key_refresh_retry_delay_mult',
 		'sync_openpgp_key_refresh_retry_overall_timeout',
+		'sync_rcu',
+		'sync_rcu_store_dir',
+		'sync_rcu_spare_snapshots',
+		'sync_rcu_ttl_days',
 		)
 
 	def __init__(self, name, repo_opts, local_config=True):
@@ -198,6 +202,22 @@ class RepoConfig(object):
 			'sync_openpgp_key_refresh_retry_overall_timeout'):
 			setattr(self, k, repo_opts.get(k.replace('_', '-'), None))
 
+		self.sync_rcu = repo_opts.get(
+			'sync-rcu', 'false').lower() in ('true', 'yes')
+
+		self.sync_rcu_store_dir = repo_opts.get('sync-rcu-store-dir')
+
+		for k in ('sync-rcu-spare-snapshots', 'sync-rcu-ttl-days'):
+			v = repo_opts.get(k, '').strip() or None
+			if v:
+				try:
+					v = int(v)
+				except (OverflowError, ValueError):
+					writemsg(_("!!! Invalid %s setting for repo"
+						" %s: %s\n") % (k, name, v), noiselevel=-1)
+					v = None
+			setattr(self, k.replace('-', '_'), v)
+
 		self.module_specific_options = {}
 
 		# Not implemented.
@@ -206,9 +226,14 @@ class RepoConfig(object):
 			format = format.strip()
 		self.format = format
 
+		self.user_location = None
 		location = repo_opts.get('location')
 		if location is not None and location.strip():
 			if os.path.isdir(location) or portage._sync_mode:
+				# The user_location is required for sync-rcu support,
+				# since it manages a symlink which resides at that
+				# location (and realpath is irreversible).
+				self.user_location = location
 				location = os.path.realpath(location)
 		else:
 			location = None
@@ -542,6 +567,10 @@ class RepoConfigLoader(object):
 							'sync_openpgp_key_refresh_retry_delay_exp_base',
 							'sync_openpgp_key_refresh_retry_delay_mult',
 							'sync_openpgp_key_refresh_retry_overall_timeout',
+							'sync_rcu',
+							'sync_rcu_store_dir',
+							'sync_rcu_spare_snapshots',
+							'sync_rcu_ttl_days',
 							'sync_type', 'sync_umask', 'sync_uri', 'sync_user',
 							'module_specific_options'):
 							v = getattr(repos_conf_opts, k, None)
@@ -962,7 +991,7 @@ class RepoConfigLoader(object):
 		return repo_name in self.prepos
 
 	def config_string(self):
-		bool_keys = ("strict_misc_digests", "sync_allow_hardlinks")
+		bool_keys = ("strict_misc_digests", "sync_allow_hardlinks", "sync_rcu")
 		str_or_int_keys = ("auto_sync", "clone_depth", "format", "location",
 			"main_repo", "priority", "sync_depth", "sync_openpgp_key_path",
 			"sync_openpgp_key_refresh_retry_count",
@@ -970,6 +999,9 @@ class RepoConfigLoader(object):
 			"sync_openpgp_key_refresh_retry_delay_exp_base",
 			"sync_openpgp_key_refresh_retry_delay_mult",
 			"sync_openpgp_key_refresh_retry_overall_timeout",
+			"sync_rcu_store_dir",
+			"sync_rcu_spare_snapshots",
+			"sync_rcu_ttl_days",
 			"sync_type", "sync_umask", "sync_uri", 'sync_user')
 		str_tuple_keys = ("aliases", "eclass_overrides", "force")
 		repo_config_tuple_keys = ("masters",)
diff --git a/lib/portage/repository/storage/hardlink_rcu.py b/lib/portage/repository/storage/hardlink_rcu.py
new file mode 100644
index 000000000..80cdbb0d7
--- /dev/null
+++ b/lib/portage/repository/storage/hardlink_rcu.py
@@ -0,0 +1,251 @@
+# Copyright 2018 Gentoo Foundation
+# Distributed under the terms of the GNU General Public License v2
+
+import datetime
+
+import portage
+from portage import os
+from portage.repository.storage.interface import (
+	RepoStorageException,
+	RepoStorageInterface,
+)
+from portage.util.futures import asyncio
+from portage.util.futures.compat_coroutine import (
+	coroutine,
+	coroutine_return,
+)
+
+from _emerge.SpawnProcess import SpawnProcess
+
+
+class HardlinkRcuRepoStorage(RepoStorageInterface):
+	"""
+	Enable read-copy-update (RCU) behavior for sync operations. The
+	current latest immutable version of a repository will be
+	referenced by a symlink found where the repository would normally
+	be located.  Repository consumers should resolve the canonical
+	path of this symlink before attempting to access the repository,
+	and all operations should be read-only, since the repository
+	is considered immutable. Updates occur by atomic replacement
+	of the symlink, which causes new consumers to use the new
+	immutable version, while any earlier consumers continue to use
+	the canonical path that was resolved earlier.
+
+	Performance is better than HardlinkQuarantineRepoStorage,
+	since commit involves atomic replacement of a symlink. Since
+	the symlink usage would require special handling for scenarios
+	involving bind mounts and chroots, this module is not enabled
+	by default.
+
+	repos.conf parameters:
+
+		sync-rcu-store-dir
+
+			Directory path reserved for sync-rcu storage. This
+			directory must have a unique value for each repository
+			(do not set it in the DEFAULT section).  This directory
+			must not contain any other files or directories aside
+			from those that are created automatically when sync-rcu
+			is enabled.
+
+		sync-rcu-spare-snapshots = 1
+
+			Number of spare snapshots for sync-rcu to retain with
+			expired ttl. This protects the previous latest snapshot
+			from being removed immediately after a new version
+			becomes available, since it might still be used by
+			running processes.
+
+		sync-rcu-ttl-days = 7
+
+			Number of days for sync-rcu to retain previous immutable
+			snapshots of a repository. After the ttl of a particular
+			snapshot has expired, it will be removed automatically (the
+			latest snapshot is exempt, and sync-rcu-spare-snapshots
+			configures the number of previous snapshots that are
+			exempt). If the ttl is set too low, then a snapshot could
+			expire while it is in use by a running process.
+
+	"""
+	def __init__(self, repo, spawn_kwargs):
+		# Note that repo.location cannot substitute for repo.user_location here,
+		# since we manage a symlink that resides at repo.user_location, and
+		# repo.location is the irreversible result of realpath(repo.user_location).
+		self._user_location = repo.user_location
+		self._spawn_kwargs = spawn_kwargs
+
+		if not repo.sync_allow_hardlinks:
+			raise RepoStorageException("repos.conf sync-rcu setting"
+				" for repo '%s' requires that sync-allow-hardlinks be enabled" % repo.name)
+
+		# Raise an exception if repo.sync_rcu_store_dir is unset, since the
+		# user needs to be aware of this location for bind mount and chroot
+		# scenarios
+		if not repo.sync_rcu_store_dir:
+			raise RepoStorageException("repos.conf sync-rcu setting"
+				" for repo '%s' requires that sync-rcu-store-dir be set" % repo.name)
+
+		self._storage_location = repo.sync_rcu_store_dir
+		if repo.sync_rcu_spare_snapshots is None or repo.sync_rcu_spare_snapshots < 0:
+			self._spare_snapshots = 1
+		else:
+			self._spare_snapshots = repo.sync_rcu_spare_snapshots
+		if self._spare_snapshots < 0:
+			self._spare_snapshots = 0
+		if repo.sync_rcu_ttl_days is None or repo.sync_rcu_ttl_days < 0:
+			self._ttl_days = 7
+		else:
+			self._ttl_days = repo.sync_rcu_ttl_days
+		self._update_location = None
+		self._latest_symlink = os.path.join(self._storage_location, 'latest')
+		self._latest_canonical = os.path.realpath(self._latest_symlink)
+		if not os.path.exists(self._latest_canonical) or os.path.islink(self._latest_canonical):
+			# It doesn't exist, or it's a broken symlink.
+			self._latest_canonical = None
+		self._snapshots_dir = os.path.join(self._storage_location, 'snapshots')
+
+	@coroutine
+	def _check_call(self, cmd, privileged=False):
+		"""
+		Run cmd and raise RepoStorageException on failure.
+
+		@param cmd: command to execute
+		@type cmd: list
+		@param privileged: run with maximum privileges
+		@type privileged: bool
+		"""
+		if privileged:
+			kwargs = dict(fd_pipes=self._spawn_kwargs.get('fd_pipes'))
+		else:
+			kwargs = self._spawn_kwargs
+		p = SpawnProcess(args=cmd, scheduler=asyncio._wrap_loop(), **kwargs)
+		p.start()
+		if (yield p.async_wait()) != os.EX_OK:
+			raise RepoStorageException('command exited with status {}: {}'.\
+				format(p.returncode, ' '.join(cmd)))
+
+	@coroutine
+	def init_update(self):
+		update_location = os.path.join(self._storage_location, 'update')
+		yield self._check_call(['rm', '-rf', update_location])
+
+		# This assumes normal umask permissions if it doesn't exist yet.
+		portage.util.ensure_dirs(self._storage_location)
+
+		if self._latest_canonical is not None:
+			portage.util.ensure_dirs(update_location)
+			portage.util.apply_stat_permissions(update_location,
+				os.stat(self._user_location))
+			# Use rsync --link-dest to hardlink files into update_location,
+			# since cp -l is not portable.
+			yield self._check_call(['rsync', '-a', '--link-dest', self._latest_canonical,
+				self._latest_canonical + '/', update_location + '/'])
+
+		elif not os.path.islink(self._user_location):
+			yield self._migrate(update_location)
+			update_location = (yield self.init_update())
+
+		self._update_location = update_location
+
+		coroutine_return(self._update_location)
+
+	@coroutine
+	def _migrate(self, update_location):
+		"""
+		When repo.user_location is a normal directory, migrate it to
+		storage so that it can be replaced with a symlink. After migration,
+		commit the content as the latest snapshot.
+		"""
+		try:
+			os.rename(self._user_location, update_location)
+		except OSError:
+			portage.util.ensure_dirs(update_location)
+			portage.util.apply_stat_permissions(update_location,
+				os.stat(self._user_location))
+			# It's probably on a different device, so copy it.
+			yield self._check_call(['rsync', '-a',
+				self._user_location + '/', update_location + '/'])
+
+			# Remove the old copy so that symlink can be created. Run with
+			# maximum privileges, since removal requires write access to
+			# the parent directory.
+			yield self._check_call(['rm', '-rf', self._user_location], privileged=True)
+
+		self._update_location = update_location
+
+		# Make this copy the latest snapshot
+		yield self.commit_update()
+
+	@property
+	def current_update(self):
+		if self._update_location is None:
+			raise RepoStorageException('current update does not exist')
+		return self._update_location
+
+	@coroutine
+	def commit_update(self):
+		update_location = self.current_update
+		self._update_location = None
+		try:
+			snapshots = [int(name) for name in os.listdir(self._snapshots_dir)]
+		except OSError:
+			snapshots = []
+			portage.util.ensure_dirs(self._snapshots_dir)
+			portage.util.apply_stat_permissions(self._snapshots_dir,
+				os.stat(self._storage_location))
+		if snapshots:
+			new_id = max(snapshots) + 1
+		else:
+			new_id = 1
+		os.rename(update_location, os.path.join(self._snapshots_dir, str(new_id)))
+		new_symlink = self._latest_symlink + '.new'
+		try:
+			os.unlink(new_symlink)
+		except OSError:
+			pass
+		os.symlink('snapshots/{}'.format(new_id), new_symlink)
+		os.rename(new_symlink, self._latest_symlink)
+
+		try:
+			user_location_correct = os.path.samefile(self._user_location, self._latest_symlink)
+		except OSError:
+			user_location_correct = False
+
+		if not user_location_correct:
+			new_symlink = self._user_location + '.new'
+			try:
+				os.unlink(new_symlink)
+			except OSError:
+				pass
+			os.symlink(self._latest_symlink, new_symlink)
+			os.rename(new_symlink, self._user_location)
+
+		coroutine_return()
+		yield None
+
+	@coroutine
+	def abort_update(self):
+		if self._update_location is not None:
+			update_location = self._update_location
+			self._update_location = None
+			yield self._check_call(['rm', '-rf', update_location])
+
+	@coroutine
+	def garbage_collection(self):
+		snap_ttl = datetime.timedelta(days=self._ttl_days)
+		snapshots = sorted(int(name) for name in os.listdir(self._snapshots_dir))
+		# always preserve the latest snapshot
+		protect_count = self._spare_snapshots + 1
+		while snapshots and protect_count:
+			protect_count -= 1
+			snapshots.pop()
+		for snap_id in snapshots:
+			snap_path = os.path.join(self._snapshots_dir, str(snap_id))
+			try:
+				st = os.stat(snap_path)
+			except OSError:
+				continue
+			snap_timestamp = datetime.datetime.utcfromtimestamp(st.st_mtime)
+			if (datetime.datetime.utcnow() - snap_timestamp) < snap_ttl:
+				continue
+			yield self._check_call(['rm', '-rf', snap_path])
diff --git a/lib/portage/sync/syncbase.py b/lib/portage/sync/syncbase.py
index 1d2a00b7c..5d9455f93 100644
--- a/lib/portage/sync/syncbase.py
+++ b/lib/portage/sync/syncbase.py
@@ -86,7 +86,9 @@ class SyncBase(object):
 		@rtype: str
 		@return: name of the selected repo storage constructor
 		'''
-		if self.repo.sync_allow_hardlinks:
+		if self.repo.sync_rcu:
+			mod_name = 'portage.repository.storage.hardlink_rcu.HardlinkRcuRepoStorage'
+		elif self.repo.sync_allow_hardlinks:
 			mod_name = 'portage.repository.storage.hardlink_quarantine.HardlinkQuarantineRepoStorage'
 		else:
 			mod_name = 'portage.repository.storage.inplace.InplaceRepoStorage'
diff --git a/man/portage.5 b/man/portage.5
index cd9d5036d..20f9aae7a 100644
--- a/man/portage.5
+++ b/man/portage.5
@@ -1025,6 +1025,41 @@ If set to true, then sync of a given repository will not trigger postsync
 hooks unless hooks would have executed for a master repository or the
 repository has changed since the previous sync operation.
 .TP
+.B sync\-rcu = yes|no
+Enable read\-copy\-update (RCU) behavior for sync operations. The current
+latest immutable version of a repository will be referenced by a symlink
+found where the repository would normally be located (see the \fBlocation\fR
+setting). Repository consumers should resolve the canonical path of this
+symlink before attempting to access the repository, and all operations should
+be read\-only, since the repository is considered immutable. Updates occur
+by atomic replacement of the symlink, which causes new consumers to use the
+new immutable version, while any earlier consumers continue to use the
+canonical path that was resolved earlier. This option requires
+sync\-allow\-hardlinks and sync\-rcu\-store\-dir options to be enabled, and
+currently also requires that sync\-type is set to rsync. This option is
+disabled by default, since the symlink usage would require special handling
+for scenarios involving bind mounts and chroots.
+.TP
+.B sync\-rcu\-store\-dir
+Directory path reserved for sync\-rcu storage. This directory must have a
+unique value for each repository (do not set it in the DEFAULT section).
+This directory must not contain any other files or directories aside from
+those that are created automatically when sync\-rcu is enabled.
+.TP
+.B sync\-rcu\-spare\-snapshots = 1
+Number of spare snapshots for sync\-rcu to retain with expired ttl. This
+protects the previous latest snapshot from being removed immediately after
+a new version becomes available, since it might still be used by running
+processes.
+.TP
+.B sync\-rcu\-ttl\-days = 7
+Number of days for sync\-rcu to retain previous immutable snapshots of
+a repository. After the ttl of a particular snapshot has expired, it
+will be removed automatically (the latest snapshot is exempt, and
+sync\-rcu\-spare\-snapshots configures the number of previous snapshots
+that are exempt). If the ttl is set too low, then a snapshot could
+expire while it is in use by a running process.
+.TP
 .B sync\-type
 Specifies type of synchronization performed by `emerge \-\-sync`.
 .br
-- 
2.16.4
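
Note (illustrative, not part of the patch): a repos.conf section using
the new settings might look like the EXAMPLE_REPOS_CONF string below.
The store directory path is a made-up example, and parse_rcu_options
and load_example are hypothetical helpers that only mirror the parsing
rules added to RepoConfig.__init__ in this patch (boolean accepted as
"true" or "yes", integers falling back to None when invalid).

```python
import configparser

# Example section; the sync-rcu-store-dir path is an assumption.
EXAMPLE_REPOS_CONF = """
[gentoo]
location = /usr/portage
sync-type = rsync
sync-uri = rsync://rsync.gentoo.org/gentoo-portage
sync-allow-hardlinks = yes
sync-rcu = yes
sync-rcu-store-dir = /var/cache/portage-rcu/gentoo
sync-rcu-spare-snapshots = 1
sync-rcu-ttl-days = 7
"""


def parse_rcu_options(repo_opts):
    """Parse the sync-rcu settings from a dict of raw option strings."""
    parsed = {}
    parsed["sync_rcu"] = (
        repo_opts.get("sync-rcu", "false").lower() in ("true", "yes"))
    parsed["sync_rcu_store_dir"] = repo_opts.get("sync-rcu-store-dir")
    for key in ("sync-rcu-spare-snapshots", "sync-rcu-ttl-days"):
        value = repo_opts.get(key, "").strip() or None
        if value is not None:
            try:
                value = int(value)
            except (OverflowError, ValueError):
                # Invalid numbers are dropped; the storage class then
                # applies its own default.
                value = None
        parsed[key.replace("-", "_")] = value
    return parsed


def load_example():
    """Read the example section and return the parsed sync-rcu options."""
    parser = configparser.ConfigParser()
    parser.read_string(EXAMPLE_REPOS_CONF)
    return parse_rcu_options(dict(parser["gentoo"]))
```

Note that sync-rcu-store-dir is set inside the repository section, not
in DEFAULT, since each repository needs its own directory.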

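Note (illustrative, not part of the patch): the interaction between
sync-rcu-spare-snapshots and sync-rcu-ttl-days can be sketched as a
pure function. The name expired_snapshots and its arguments are
hypothetical; the selection rule follows garbage_collection() in
hardlink_rcu.py, where the newest snapshot plus the configured number
of spares are always kept and older ones are removed only after their
ttl has expired.

```python
import datetime


def expired_snapshots(snapshot_mtimes, spare_snapshots, ttl_days, now):
    """
    Return the snapshot ids that are eligible for removal, oldest first.

    snapshot_mtimes maps an integer snapshot id to its modification
    time (a datetime); now is the current time as a datetime.
    """
    ttl = datetime.timedelta(days=ttl_days)
    ids = sorted(snapshot_mtimes)
    # Always protect the latest snapshot, plus the spares.
    protect_count = max(spare_snapshots, 0) + 1
    candidates = ids[:-protect_count]
    # Keep anything that is still younger than the ttl.
    return [i for i in candidates if now - snapshot_mtimes[i] >= ttl]
```

With five snapshots, one spare and a ttl of 7 days, only the three
oldest snapshots are considered, and each is removed only if it is at
least 7 days old.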



* Re: [gentoo-portage-dev] [PATCH 3/4] rsync: split out repo storage framework
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 3/4] rsync: split out repo storage framework Zac Medico
@ 2018-08-10  0:10   ` Brian Dolbec
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Dolbec @ 2018-08-10  0:10 UTC (permalink / raw
  To: gentoo-portage-dev

On Mon,  6 Aug 2018 00:40:32 -0700
Zac Medico <zmedico@gentoo.org> wrote:

> Since there aremany ways to manage repository storage, split out a
> repo storage framework. The HardlinkQuarantineRepoStorage class
> implements the existing default behavior, and the InplaceRepoStorage
> class implements the legacy behavior (when sync-allow-hardlinks is
> disabled in repos.conf).
> 
> Each class implements RepoStorageInterface, which uses coroutine
> methods since coroutines are well-suited to the I/O bound tasks that
> these methods perform. The _sync_decorator is used to convert
> coroutine methods to synchronous methods, for smooth integration into
> the surrounding synchronous code.
> 
> Bug: https://bugs.gentoo.org/662070
>

missing space in first line of commit message
s/aremany/are many

 ---



* Re: [gentoo-portage-dev] [PATCH 0/4] Add sync-rcu support for rsync (bug 662070)
  2018-08-06  7:40 [gentoo-portage-dev] [PATCH 0/4] Add sync-rcu support for rsync (bug 662070) Zac Medico
                   ` (3 preceding siblings ...)
  2018-08-06  7:40 ` [gentoo-portage-dev] [PATCH 4/4] Add sync-rcu support for rsync (bug 662070) Zac Medico
@ 2018-08-10  0:11 ` Brian Dolbec
  4 siblings, 0 replies; 7+ messages in thread
From: Brian Dolbec @ 2018-08-10  0:11 UTC (permalink / raw
  To: gentoo-portage-dev

On Mon,  6 Aug 2018 00:40:29 -0700
Zac Medico <zmedico@gentoo.org> wrote:

> Add a boolean sync-rcu repos.conf setting that behaves as follows:
> 
> sync-rcu = yes|no
> 
>     Enable read-copy-update (RCU) behavior for sync operations. The
>     current latest immutable version of a repository will be referenced
>     by a symlink found where the repository would normally be located
>     (see the location setting). Repository consumers should resolve
>     the canonical path of this symlink before attempting to access
>     the repository, and all operations should be read-only, since
>     the repository is considered immutable. Updates occur by atomic
>     replacement of the symlink, which causes new consumers to use the
>     new immutable version, while any earlier consumers continue to
>     use the canonical path that was resolved earlier. This option
>     requires sync-allow-hardlinks and sync-rcu-store-dir options to
>     be enabled, and currently also requires that sync-type is set
>     to rsync. This option is disabled by default, since the symlink
>     usage would require special handling for scenarios involving bind
>     mounts and chroots.
> 
> sync-rcu-store-dir
> 
>     Directory path reserved for sync-rcu storage. This directory must
>     have a unique value for each repository (do not set it in the
>     DEFAULT section).  This directory must not contain any other files
>     or directories aside from those that are created automatically
>     when sync-rcu is enabled.
> 
> sync-rcu-spare-snapshots = 1
> 
>     Number of spare snapshots for sync-rcu to retain with expired
>     ttl. This protects the previous latest snapshot from being removed
>     immediately after a new version becomes available, since it might
>     still be used by running processes.
> 
> sync-rcu-ttl-days = 7
> 
>     Number of days for sync-rcu to retain previous immutable snapshots
>     of a repository. After the ttl of a particular snapshot has
>     expired, it will be removed automatically (the latest snapshot
>     is exempt, and sync-rcu-spare-snapshots configures the number of
>     previous snapshots that are exempt). If the ttl is set too low,
>     then a snapshot could expire while it is in use by a running
>     process.
> 
> Zac Medico (4):
>   Implement asyncio.iscoroutinefunction for compat_coroutine
>   Add _sync_decorator module
>   rsync: split out repo storage framework
>   Add sync-rcu support for rsync (bug 662070)
> 
>  lib/portage/repository/config.py                   |  36 ++-
>  lib/portage/repository/storage/__init__.py         |   0
>  .../repository/storage/hardlink_quarantine.py      |  95 ++++++++
>  lib/portage/repository/storage/hardlink_rcu.py     | 251 +++++++++++++++++++++
>  lib/portage/repository/storage/inplace.py          |  49 ++++
>  lib/portage/repository/storage/interface.py        |  87 +++++++
>  lib/portage/sync/controller.py                     |   1 +
>  lib/portage/sync/modules/rsync/rsync.py            |  85 ++-----
>  lib/portage/sync/syncbase.py                       |  33 +++
>  .../tests/util/futures/test_compat_coroutine.py    |  14 ++
>  lib/portage/util/futures/_asyncio/__init__.py      |  14 ++
>  lib/portage/util/futures/_sync_decorator.py        |  54 +++++
>  lib/portage/util/futures/compat_coroutine.py       |  12 +
>  man/portage.5                                      |  35 +++
>  14 files changed, 700 insertions(+), 66 deletions(-)
>  create mode 100644 lib/portage/repository/storage/__init__.py
>  create mode 100644 lib/portage/repository/storage/hardlink_quarantine.py
>  create mode 100644 lib/portage/repository/storage/hardlink_rcu.py
>  create mode 100644 lib/portage/repository/storage/inplace.py
>  create mode 100644 lib/portage/repository/storage/interface.py
>  create mode 100644 lib/portage/util/futures/_sync_decorator.py
> 

series looks good, just the one typo


