From: "Zac Medico" <zmedico@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Subject: [gentoo-commits] proj/portage:master commit in: pym/portage/cache/
Date: Wed, 13 Jul 2016 11:32:07 +0000 (UTC)
Message-ID: <1468409374.9abbda7d054761ae6c333d3e6d420632b9658b6d.zmedico@gentoo>

commit:     9abbda7d054761ae6c333d3e6d420632b9658b6d
Author:     Zac Medico <zmedico <AT> gentoo <DOT> org>
AuthorDate: Sun Jul 10 06:11:41 2016 +0000
Commit:     Zac Medico <zmedico <AT> gentoo <DOT> org>
CommitDate: Wed Jul 13 11:29:34 2016 +0000
URL:        https://gitweb.gentoo.org/proj/portage.git/commit/?id=9abbda7d

portage.cache: write md5 instead of mtime (bug 568934)

Change cache modules to write md5 in cache entries, instead of mtime.
Since portage-2.2.27, the relevant cache modules have had the ability
to read cache entries containing either md5 or mtime, therefore this
change is backward-compatible with portage-2.2.27 and later.
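
The compatibility claim can be illustrated with a minimal sketch of how a
reader might choose between the two validation keys. The helper name
pick_chf_type and the sample entries are hypothetical and are not part of
portage. The real __getitem__ also calls reconstruct_eclasses for each
candidate type, which this sketch leaves out.

```python
def pick_chf_type(entry, chf_types=('md5', 'mtime')):
    """Return the first chf_type whose '_<type>_' key is present in entry.

    Hypothetical helper: it mirrors only the key check from this commit,
    not the full lookup logic in portage.cache.template.
    """
    for chf_type in chf_types:
        if '_%s_' % chf_type in entry:
            return chf_type
    # No recognized validation key at all: the commit treats this case
    # as cache corruption; a plain ValueError stands in for that here.
    raise ValueError('entry does not contain a recognized chf_type')


# An entry written after this change carries an md5 key ...
new_entry = {'_md5_': 'd41d8cd98f00b204e9800998ecf8427e'}
# ... while an entry written by an older version carries an mtime key.
old_entry = {'_mtime_': 1468409374}

print(pick_chf_type(new_entry))
print(pick_chf_type(old_entry))
```

Because the tuple is ordered, an entry that happens to contain both keys is
read using the first listed type, which is md5 after this change.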

Also fix the reconstruct_eclasses function to raise CacheCorruption
when the specified chf_type is md5 and the cache entry contains mtime
data, and optimize __getitem__ to skip reconstruct_eclasses calls when
the entry appears to have a different chf_type.
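
The reconstruct_eclasses fix can be sketched as follows. This is a
simplified stand-in, not the portage code: it assumes Python 3 (int in
place of long), handles only name/value pairs (the paths=False layout),
and defines a local CacheCorruption class. The names parse_eclasses and
DESERIALIZERS are hypothetical.

```python
class CacheCorruption(Exception):
    """Local stand-in for portage.cache.cache_errors.CacheCorruption."""


def md5_deserializer(value):
    # An md5 hex digest is always 32 characters long, while an mtime is a
    # much shorter decimal string, so a length check tells them apart.
    if len(value) != 32:
        raise ValueError('expected 32 hex digits')
    return value


# Assumption: Python 3, so int replaces the long used in the patch.
DESERIALIZERS = {'md5': md5_deserializer, 'mtime': int}


def parse_eclasses(eclass_string, chf_type):
    """Parse 'name<TAB>value<TAB>name<TAB>value...' into a dict."""
    fields = eclass_string.strip().split('\t')
    if fields == ['']:
        return {}
    if len(fields) % 2 != 0:
        raise CacheCorruption('invalid field count %d' % len(fields))
    convert = DESERIALIZERS[chf_type]
    try:
        return {fields[i]: convert(fields[i + 1])
                for i in range(0, len(fields), 2)}
    except ValueError:
        raise CacheCorruption('not valid for chf_type %s' % chf_type)


mtime_data = 'eutils\t1468409374'
print(parse_eclasses(mtime_data, 'mtime'))
try:
    parse_eclasses(mtime_data, 'md5')
except CacheCorruption as e:
    print('rejected:', e)
```

With an identity converter for md5, the second call would have returned
the mtime string unchanged, which is the invalid result the commit message
describes.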

X-Gentoo-Bug: 568934
X-Gentoo-Bug-url: https://bugs.gentoo.org/show_bug.cgi?id=568934
Acked-by: Alexander Berntsen <bernalex <AT> gentoo.org>

 pym/portage/cache/anydbm.py    |  4 ++--
 pym/portage/cache/flat_hash.py |  4 ++--
 pym/portage/cache/sqlite.py    |  4 ++--
 pym/portage/cache/template.py  | 36 ++++++++++++++++++++++++++++++++----
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/pym/portage/cache/anydbm.py b/pym/portage/cache/anydbm.py
index 80d24e5..88d85b0 100644
--- a/pym/portage/cache/anydbm.py
+++ b/pym/portage/cache/anydbm.py
@@ -36,8 +36,8 @@ from portage.cache import cache_errors
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = True
 	cleanse_keys = True

diff --git a/pym/portage/cache/flat_hash.py b/pym/portage/cache/flat_hash.py
index cca0f10..3a899c0 100644
--- a/pym/portage/cache/flat_hash.py
+++ b/pym/portage/cache/flat_hash.py
@@ -163,5 +163,5 @@ class md5_database(database):
 
 
 class mtime_md5_database(database):
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')

diff --git a/pym/portage/cache/sqlite.py b/pym/portage/cache/sqlite.py
index 32e4076..69150f6 100644
--- a/pym/portage/cache/sqlite.py
+++ b/pym/portage/cache/sqlite.py
@@ -18,8 +18,8 @@ if sys.hexversion >= 0x3000000:
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = False
 	synchronous = False

diff --git a/pym/portage/cache/template.py b/pym/portage/cache/template.py
index a7c6de0..8662d85 100644
--- a/pym/portage/cache/template.py
+++ b/pym/portage/cache/template.py
@@ -54,6 +54,15 @@ class database(object):
 
 		if self.serialize_eclasses and "_eclasses_" in d:
 			for chf_type in chf_types:
+				if '_%s_' % chf_type not in d:
+					# Skip the reconstruct_eclasses call, since it's
+					# a waste of time if it contains a different chf_type
+					# than the current one. In the past, it was possible
+					# for reconstruct_eclasses called with chf_type='md5'
+					# to "successfully" return invalid data here, because
+					# it was unable to distinguish between md5 data and
+					# mtime data.
+					continue
 				try:
 					d["_eclasses_"] = reconstruct_eclasses(cpv, d["_eclasses_"],
 						chf_type, paths=self.store_eclass_paths)
@@ -62,6 +71,9 @@ class database(object):
 						raise
 				else:
 					break
+			else:
+				raise cache_errors.CacheCorruption(cpv,
+					'entry does not contain a recognized chf_type')
 
 		elif "_eclasses_" not in d:
 			d["_eclasses_"] = {}
@@ -310,6 +322,23 @@ def serialize_eclasses(eclass_dict, chf_type='mtime', paths=True):
 		for k, v in sorted(eclass_dict.items(), key=_keysorter))
 
 
+def _md5_deserializer(md5):
+	"""
+	Without this validation, it's possible for reconstruct_eclasses to
+	mistakenly interpret mtime data as md5 data, and return an invalid
+	data structure containing strings where ints are expected.
+	"""
+	if len(md5) != 32:
+		raise ValueError('expected 32 hex digits')
+	return md5
+
+
+_chf_deserializers = {
+	'md5': _md5_deserializer,
+	'mtime': long,
+}
+
+
 def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 	"""returns a dict when handed a string generated by serialize_eclasses"""
 	eclasses = eclass_string.rstrip().lstrip().split("\t")
@@ -317,9 +346,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		# occasionally this occurs in the fs backends.  they suck.
 		return {}
 
-	converter = _unicode
-	if chf_type == 'mtime':
-		converter = long
+	converter = _chf_deserializers.get(chf_type, lambda x: x)
 
 	if paths:
 		if len(eclasses) % 3 != 0:
@@ -340,6 +367,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		raise cache_errors.CacheCorruption(cpv,
 			"_eclasses_ was of invalid len %i" % len(eclasses))
 	except ValueError:
-		raise cache_errors.CacheCorruption(cpv, "_eclasses_ mtime conversion to long failed")
+		raise cache_errors.CacheCorruption(cpv,
+			"_eclasses_ not valid for chf_type {}".format(chf_type))
 	del eclasses
 	return d

