public inbox for gentoo-portage-dev@lists.gentoo.org
 help / color / mirror / Atom feed
From: "W. Trevor King" <wking@tremily.us>
To: gentoo-portage-dev@lists.gentoo.org
Cc: Rafael Goncalves Martins <rafaelmartins@gentoo.org>,
	"W. Trevor King" <wking@tremily.us>
Subject: [gentoo-portage-dev] [PATCH v3 3/4] pym/portage/package/ebuild/fetch.py: Factor out _get_uris
Date: Sun, 19 Jan 2014 19:26:09 -0800	[thread overview]
Message-ID: <2b4c89a8be60c042d9afd2fb4151b9736afe36ac.1390187967.git.wking@tremily.us> (raw)
In-Reply-To: <cover.1390187967.git.wking@tremily.us>
In-Reply-To: <cover.1390187967.git.wking@tremily.us>

The current fetch() function is quite long, which makes it hard to
know what I can change without adverse side effects.  By pulling this
logic out of the main function, we get clearer logic in fetch() and
more explicit input for the config extraction.

This block was especially complicated, so I also created the helper
functions _get_file_uri_tuples and _expand_mirror.  I'd prefer if
_expand_mirror iterated through URIs instead of (group, URI) tuples,
but we need a distinct marker for third-party URIs to build
third_party_mirror_uris which is used to build primaryuri_dict which
is used way down in fetch():

  if checksum_failure_count == \
      checksum_failure_primaryuri:
      # Switch to "primaryuri" mode in order
      # to increase the probablility of
      # of success.
      primaryuris = \
          primaryuri_dict.get(myfile)
          if primaryuris:
              uri_list.extend(
                  reversed(primaryuris))

I don't know if this is useful enough to motivate the uglier
_expandmirror return values, but I'll kick that can down the road for
now.

There was some discussion on the list [1] about the defaults in:

  def _get_uris(uris, settings, custom_mirrors=(), locations=()):

but I prefer this to Alec's floated:

  def _get_uris(uris, settings, custom_mirrors=None, locations=None):
      if not custom_mirrors:
          custom_mirrors = ()
      if not locations:
          locations = ()

Which accomplishes the same thing with less clarity ;).  My default
values are tuples (and not mutable lists) to make it extra-obvious
that we're relying on anything crazy like mutating our default values
;).

There was also discussion about whether the ugly settings object
should be passed to _get_uris, or whether the appropriate settings
should be extracted first and then passed through [2].  I've passed
settings through, because I'd prefer to have the uglyness handled in
helper functions.  The fetch() function is what most folks will care
about, so we want to keep that as clean as possible.  If the helper
functions have to suffer a bit for a cleaner fetch(), so be it.

[1]: http://thread.gmane.org/gmane.linux.gentoo.portage.devel/4002/focus=4041
[2]: http://thread.gmane.org/gmane.linux.gentoo.portage.devel/4002/focus=4042
---
 pym/portage/package/ebuild/fetch.py    | 208 ++++++++++++++++++++-------------
 pym/portage/tests/ebuild/test_fetch.py | 121 +++++++++++++++++++
 2 files changed, 248 insertions(+), 81 deletions(-)

diff --git a/pym/portage/package/ebuild/fetch.py b/pym/portage/package/ebuild/fetch.py
index 4ecefc9..0093a6e 100644
--- a/pym/portage/package/ebuild/fetch.py
+++ b/pym/portage/package/ebuild/fetch.py
@@ -15,9 +15,9 @@ import sys
 import tempfile
 
 try:
-	from urllib.parse import urlparse
+	from urllib.parse import urlparse, urlunparse
 except ImportError:
-	from urlparse import urlparse
+	from urlparse import urlparse, urlunparse
 
 import portage
 portage.proxy.lazyimport.lazyimport(globals(),
@@ -301,6 +301,128 @@ def _get_fetch_resume_size(settings, default='350K'):
 	return v
 
 
+def _get_file_uri_tuples(uris):
+	"""Return a list of (filename, URI) tuples."""
+	file_uri_tuples = []
+	# Check for 'items' attribute since OrderedDict is not a dict.
+	if hasattr(uris, 'items'):
+		for filename, uri_set in uris.items():
+			for uri in uri_set:
+				file_uri_tuples.append((filename, uri))
+			if not uri_set:
+				file_uri_tuples.append((filename, None))
+	else:
+		for uri in uris:
+			if urlparse(uri).scheme:
+				file_uri_tuples.append(
+					(os.path.basename(uri), uri))
+			else:
+				file_uri_tuples.append(
+					(os.path.basename(uri), None))
+	return file_uri_tuples
+
+
+def _expand_mirror(uri, custom_mirrors=(), third_party_mirrors=()):
+	"""
+	Replace the 'mirror://' scheme and netloc in the URI.
+
+	Returns an iterable listing expanded (group, URI) tuples,
+	where the group is either 'custom' or 'third-party'.
+	"""
+	parsed = urlparse(uri)
+	mirror = parsed.netloc
+	path = parsed.path
+	if path:
+		# Try user-defined mirrors first
+		if mirror in custom_mirrors:
+			for cmirr in custom_mirrors[mirror]:
+				m_uri = urlparse(cmirr)
+				yield ('custom', urlunparse((
+					m_uri.scheme, m_uri.netloc, path) +
+					parsed[3:]))
+
+		# now try the official mirrors
+		if mirror in third_party_mirrors:
+			uris = []
+			for locmirr in third_party_mirrors[mirror]:
+				m_uri = urlparse(locmirr)
+				uris.append(urlunparse((
+					m_uri.scheme, m_uri.netloc, path) +
+					parsed[3:]))
+			random.shuffle(uris)
+			for uri in uris:
+				yield ('third-party', uri)
+
+		if (not custom_mirrors.get(mirror, []) and
+				not third_party_mirrors.get(mirror, [])):
+			writemsg(
+				_("No known mirror by the name: %s\n")
+				% mirror)
+	else:
+		writemsg(_("Invalid mirror definition in SRC_URI:\n"),
+			 noiselevel=-1)
+		writemsg("  %s\n" % uri, noiselevel=-1)
+
+
+def _get_uris(uris, settings, custom_mirrors=(), locations=()):
+	restrict = settings.get("PORTAGE_RESTRICT", "").split()
+	restrict_fetch = "fetch" in restrict
+	restrict_mirror = "mirror" in restrict or "nomirror" in restrict
+	force_mirror = (
+		"force-mirror" in settings.features and
+		not restrict_mirror)
+
+	third_party_mirrors = settings.thirdpartymirrors()
+	third_party_mirror_uris = {}
+	filedict = OrderedDict()
+	primaryuri_dict = {}
+	for filename, uri in _get_file_uri_tuples(uris=uris):
+		if filename not in filedict:
+			filedict[filename] = [
+				os.path.join(location, 'distfiles', filename)
+				for location in locations]
+		if uri is None:
+			continue
+		if uri.startswith('mirror://'):
+			expanded_uris = _expand_mirror(
+				uri=uri, custom_mirrors=custom_mirrors,
+				third_party_mirrors=third_party_mirrors)
+			filedict[filename].extend(
+				uri for _, uri in expanded_uris)
+			third_party_mirror_uris.setdefault(filename, []).extend(
+				uri for group, uri in expanded_uris
+				if group == 'third-party')
+		else:
+			if restrict_fetch or force_mirror:
+				# Only fetch from specific mirrors is allowed.
+				continue
+			primaryuris = primaryuri_dict.get(filename)
+			if primaryuris is None:
+				primaryuris = []
+				primaryuri_dict[filename] = primaryuris
+			primaryuris.append(uri)
+
+	# Order primaryuri_dict values to match that in SRC_URI.
+	for uris in primaryuri_dict.values():
+		uris.reverse()
+
+	# Prefer third_party_mirrors over normal mirrors in cases when
+	# the file does not yet exist on the normal mirrors.
+	for filename, uris in third_party_mirror_uris.items():
+		primaryuri_dict.setdefault(filename, []).extend(uris)
+
+	# Now merge primaryuri values into filedict (includes mirrors
+	# explicitly referenced in SRC_URI).
+	if "primaryuri" in restrict:
+		for filename, uris in filedict.items():
+			filedict[filename] = primaryuri_dict.get(filename, []) + uris
+	else:
+		for filename in filedict:
+			filedict[filename] += primaryuri_dict.get(filename, [])
+
+	return filedict, primaryuri_dict
+
+
 def fetch(myuris, mysettings, listonly=0, fetchonly=0,
 	locks_in_subdir=".locks", use_locks=1, try_mirrors=1, digests=None,
 	allow_missing_digests=True):
@@ -332,7 +454,6 @@ def fetch(myuris, mysettings, listonly=0, fetchonly=0,
 	# couple of checksum failures, to increase the probablility
 	# of success before checksum_failure_max_tries is reached.
 	checksum_failure_primaryuri = 2
-	thirdpartymirrors = mysettings.thirdpartymirrors()
 
 	# In the background parallel-fetch process, it's safe to skip checksum
 	# verification of pre-existing files in $DISTDIR that have the correct
@@ -405,7 +526,6 @@ def fetch(myuris, mysettings, listonly=0, fetchonly=0,
 			del mymirrors[x]
 
 	restrict_fetch = "fetch" in restrict
-	force_mirror = "force-mirror" in features and not restrict_mirror
 	custom_local_mirrors = custommirrors.get("local", [])
 	if restrict_fetch:
 		# With fetch restriction, a normal uri may only be fetched from
@@ -416,83 +536,9 @@ def fetch(myuris, mysettings, listonly=0, fetchonly=0,
 	else:
 		locations = mymirrors
 
-	file_uri_tuples = []
-	# Check for 'items' attribute since OrderedDict is not a dict.
-	if hasattr(myuris, 'items'):
-		for myfile, uri_set in myuris.items():
-			for myuri in uri_set:
-				file_uri_tuples.append((myfile, myuri))
-			if not uri_set:
-				file_uri_tuples.append((myfile, None))
-	else:
-		for myuri in myuris:
-			if urlparse(myuri).scheme:
-				file_uri_tuples.append((os.path.basename(myuri), myuri))
-			else:
-				file_uri_tuples.append((os.path.basename(myuri), None))
-
-	filedict = OrderedDict()
-	primaryuri_dict = {}
-	thirdpartymirror_uris = {}
-	for myfile, myuri in file_uri_tuples:
-		if myfile not in filedict:
-			filedict[myfile]=[]
-			for y in range(0,len(locations)):
-				filedict[myfile].append(locations[y]+"/distfiles/"+myfile)
-		if myuri is None:
-			continue
-		if myuri[:9]=="mirror://":
-			eidx = myuri.find("/", 9)
-			if eidx != -1:
-				mirrorname = myuri[9:eidx]
-				path = myuri[eidx+1:]
-
-				# Try user-defined mirrors first
-				if mirrorname in custommirrors:
-					for cmirr in custommirrors[mirrorname]:
-						filedict[myfile].append(
-							cmirr.rstrip("/") + "/" + path)
-
-				# now try the official mirrors
-				if mirrorname in thirdpartymirrors:
-					uris = [locmirr.rstrip("/") + "/" + path \
-						for locmirr in thirdpartymirrors[mirrorname]]
-					random.shuffle(uris)
-					filedict[myfile].extend(uris)
-					thirdpartymirror_uris.setdefault(myfile, []).extend(uris)
-
-				if not filedict[myfile]:
-					writemsg(_("No known mirror by the name: %s\n") % (mirrorname))
-			else:
-				writemsg(_("Invalid mirror definition in SRC_URI:\n"), noiselevel=-1)
-				writemsg("  %s\n" % (myuri), noiselevel=-1)
-		else:
-			if restrict_fetch or force_mirror:
-				# Only fetch from specific mirrors is allowed.
-				continue
-			primaryuris = primaryuri_dict.get(myfile)
-			if primaryuris is None:
-				primaryuris = []
-				primaryuri_dict[myfile] = primaryuris
-			primaryuris.append(myuri)
-
-	# Order primaryuri_dict values to match that in SRC_URI.
-	for uris in primaryuri_dict.values():
-		uris.reverse()
-
-	# Prefer thirdpartymirrors over normal mirrors in cases when
-	# the file does not yet exist on the normal mirrors.
-	for myfile, uris in thirdpartymirror_uris.items():
-		primaryuri_dict.setdefault(myfile, []).extend(uris)
-
-	# Now merge primaryuri values into filedict (includes mirrors
-	# explicitly referenced in SRC_URI).
-	if "primaryuri" in restrict:
-		for myfile, uris in filedict.items():
-			filedict[myfile] = primaryuri_dict.get(myfile, []) + uris
-	else:
-		for myfile in filedict:
-			filedict[myfile] += primaryuri_dict.get(myfile, [])
+	filedict, primaryuri_dict = _get_uris(
+		uris=myuris, settings=mysettings,
+		custom_mirrors=custommirrors, locations=locations)
 
 	can_fetch=True
 
diff --git a/pym/portage/tests/ebuild/test_fetch.py b/pym/portage/tests/ebuild/test_fetch.py
index 2b06190..9e1635a 100644
--- a/pym/portage/tests/ebuild/test_fetch.py
+++ b/pym/portage/tests/ebuild/test_fetch.py
@@ -1,13 +1,27 @@
 # Copyright 1998-2013 Gentoo Foundation
 # Distributed under the terms of the GNU General Public License v2
 
+from portage import OrderedDict
 from portage.package.ebuild.fetch import (
 	_get_checksum_failure_max_tries,
 	_get_fetch_resume_size,
+	_get_file_uri_tuples,
+	_expand_mirror,
+	_get_uris,
 	)
 from portage.tests import TestCase
 
 
+class _Settings(dict):
+	"""Mock settings instance for testing."""
+	@property
+	def features(self):
+		return self['features']
+
+	def thirdpartymirrors(self):
+		return self['third-party mirrors']
+
+
 class FetchTestCase(TestCase):
 	"""
 	Test fetch and it's helper functions.
@@ -20,6 +34,26 @@ class FetchTestCase(TestCase):
 	them directly (outside of these tests).
 	"""
 
+	_third_party_mirrors = {
+		'gentoo': [
+			'http://distfiles.gentoo.org/',
+			'ftp://ftp.gentoo.org/',
+			],
+		}
+
+	_custom_mirrors = {
+		'gentoo': [
+			'http://192.168.0.1/',
+			'ftp://192.168.0.2/',
+			],
+		}
+
+	_settings = _Settings({
+			'PORTAGE_RESTRICT': '',
+			'features': [],
+			'third-party mirrors': _third_party_mirrors,
+		})
+
 	def test_get_checksum_failure_max_tries(self):
 		self.assertEqual(
 			_get_checksum_failure_max_tries(settings={}),
@@ -80,3 +114,90 @@ class FetchTestCase(TestCase):
 		self.assertEqual(
 			_get_fetch_resume_size(settings={}, default='3K'),
 			3072)
+
+	def test_get_file_uri_tuples(self):
+		self.assertEqual(
+			_get_file_uri_tuples(uris=[
+				'http://host/path',
+				'/filesystem/other-path',
+				]),
+			[
+				('path', 'http://host/path'),
+				('other-path', None),
+			])
+		uris = OrderedDict()
+		uris['path'] = {'http://host/path', 'http://other-host/path'}
+		uris['other-path'] = {}
+		self.assertEqual(
+			_get_file_uri_tuples(uris=uris),
+			[
+				('path', 'http://host/path'),
+				('path', 'http://other-host/path'),
+				('other-path', None),
+			])
+
+	def test_expand_mirror(self):
+		uris = list(_expand_mirror(
+			uri='mirror://gentoo/some/file',
+			custom_mirrors=self._custom_mirrors,
+			third_party_mirrors=self._third_party_mirrors,
+			))
+		self.assertEqual(
+			uris[:2],
+			[
+				('custom', 'http://192.168.0.1/some/file'),
+				('custom', 'ftp://192.168.0.2/some/file'),
+			])
+		self.assertEqual(
+			sorted(uris[2:]),  # de-randomize
+			[
+				('third-party',
+					'ftp://ftp.gentoo.org/some/file'),
+				('third-party',
+					'http://distfiles.gentoo.org/some/file'),
+			])
+
+
+	def test_get_uris(self):
+		files, primary_uris = _get_uris(
+			uris=[
+				'mirror://gentoo/some/file',
+				'http://example.net/my/other-file',
+				],
+			settings=self._settings,
+			custom_mirrors=self._custom_mirrors,
+			locations=[
+				'http://location.net/',
+				'ftp://location.com/path',
+				],
+			)
+		self.assertEqual(list(files.keys()), ['file', 'other-file'])
+		self.assertEqual(
+			files['file'][:4],
+			[
+				'http://location.net/distfiles/file',
+				'ftp://location.com/path/distfiles/file',
+				'http://192.168.0.1/some/file',
+				'ftp://192.168.0.2/some/file',
+			])
+		self.assertEqual(
+			sorted(files['file'][4:]),  # de-randomize
+			[
+				'ftp://ftp.gentoo.org/some/file',
+				'http://distfiles.gentoo.org/some/file',
+			])
+		self.assertEqual(
+			files['other-file'],
+			[
+				'http://location.net/distfiles/other-file',
+				'ftp://location.com/path/distfiles/other-file',
+				'http://example.net/my/other-file',
+			])
+		self.assertEqual(
+			primary_uris,
+			{
+				'file': [],
+				'other-file': [
+					'http://example.net/my/other-file',
+					],
+			})
-- 
1.8.5.2.8.g0f6c0d1



  parent reply	other threads:[~2014-01-20  3:28 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-19  3:07 [gentoo-portage-dev] [PATCH 0/3] Initial fetch() refactoring W. Trevor King
2014-01-19  3:07 ` [gentoo-portage-dev] [PATCH 1/3] pym/portage/package/ebuild/fetch.py: Factor out _get_checksum_failure_max_tries W. Trevor King
2014-01-20  1:26   ` Tom Wijsman
2014-01-20  1:56     ` W. Trevor King
2014-01-19  3:07 ` [gentoo-portage-dev] [PATCH 2/3] pym/portage/package/ebuild/fetch.py: Factor out _get_fetch_resume_size W. Trevor King
2014-01-20  1:41   ` Tom Wijsman
2014-01-20  2:01     ` W. Trevor King
2014-01-20  2:26       ` Tom Wijsman
2014-01-19  3:07 ` [gentoo-portage-dev] [PATCH 3/3] pym/portage/package/ebuild/fetch.py: Factor out _get_uris W. Trevor King
2014-01-19 21:36   ` Sebastian Luther
2014-01-19 21:43     ` W. Trevor King
2014-01-19 22:36   ` Alec Warner
2014-01-19 23:06     ` W. Trevor King
2014-01-19 23:31       ` W. Trevor King
2014-01-19 20:05 ` [gentoo-portage-dev] [PATCH 0/3] Initial fetch() refactoring Sebastian Luther
2014-01-19 20:58   ` W. Trevor King
2014-01-19 22:14     ` [gentoo-portage-dev] [PATCH v2 " W. Trevor King
2014-01-19 22:14       ` [gentoo-portage-dev] [PATCH v2 1/3] pym/portage/package/ebuild/fetch.py: Factor out _get_checksum_failure_max_tries W. Trevor King
2014-01-19 22:44         ` Alec Warner
2014-01-19 22:45           ` Alec Warner
2014-01-19 22:51             ` W. Trevor King
2014-01-19 22:52               ` Alec Warner
2014-01-19 22:14       ` [gentoo-portage-dev] [PATCH v2 2/3] pym/portage/package/ebuild/fetch.py: Factor out _get_fetch_resume_size W. Trevor King
2014-01-19 22:14       ` [gentoo-portage-dev] [PATCH v2 3/3] pym/portage/package/ebuild/fetch.py: Factor out _get_uris W. Trevor King
2014-01-20  3:26       ` [gentoo-portage-dev] [PATCH v3 0/4] Initial fetch() refactoring W. Trevor King
2014-01-20  3:26         ` [gentoo-portage-dev] [PATCH v3 1/4] pym/portage/package/ebuild/fetch.py: Factor out _get_checksum_failure_max_tries W. Trevor King
2014-01-20  3:26         ` [gentoo-portage-dev] [PATCH v3 2/4] pym/portage/package/ebuild/fetch.py: Factor out _get_fetch_resume_size W. Trevor King
2014-01-20  3:26         ` W. Trevor King [this message]
2014-01-21  2:57           ` [gentoo-portage-dev] Re: [PATCH v3 3/4] pym/portage/package/ebuild/fetch.py: Factor out _get_uris W. Trevor King
2014-01-20  3:26         ` [gentoo-portage-dev] [PATCH v3 4/4] pym/portage/package/ebuild/fetch.py: Flatten conditionals in _get_fetch_resume_size W. Trevor King
2014-01-22  5:35         ` [gentoo-portage-dev] [PATCH v3 0/4] Initial fetch() refactoring Mike Frysinger
2014-01-22 16:10           ` W. Trevor King
2014-01-22 19:00             ` Mike Frysinger
2014-01-27  4:00         ` [gentoo-portage-dev] " W. Trevor King
2014-01-19 21:22 ` [gentoo-portage-dev] [PATCH 0/3] " Sebastian Luther
2014-01-19 22:45   ` Alexander Berntsen
2014-01-19 22:46     ` Alec Warner
2014-01-19 22:50       ` Alexander Berntsen
2014-01-19 22:54         ` Alec Warner
2014-01-19 23:51           ` Alexander Berntsen
2014-01-19 23:53             ` Alec Warner
2014-01-19 23:11       ` W. Trevor King
2014-01-22  5:34       ` Mike Frysinger
2014-01-19 21:39 ` Sebastian Luther
2014-01-19 22:46   ` W. Trevor King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b4c89a8be60c042d9afd2fb4151b9736afe36ac.1390187967.git.wking@tremily.us \
    --to=wking@tremily.us \
    --cc=gentoo-portage-dev@lists.gentoo.org \
    --cc=rafaelmartins@gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox