public inbox for gentoo-portage-dev@lists.gentoo.org
* [gentoo-portage-dev] [PATCH] portage.cache: write md5 instead of mtime (bug 568934)
@ 2016-07-10  6:51 Zac Medico
  2016-07-10 19:44 ` [gentoo-portage-dev] [PATCH v2] " Zac Medico
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Zac Medico @ 2016-07-10  6:51 UTC
  To: gentoo-portage-dev; +Cc: Zac Medico

Change cache modules to write md5 in cache entries, instead of mtime.
Since portage-2.2.27, the relevant cache modules have had the ability
to read cache entries containing either md5 or mtime, therefore this
change is backward-compatible with portage-2.2.27 and later.

Also, fix the reconstruct_eclasses function to raise CacheCorruption
when the specified chf_type is md5 and the cache entry contains mtime
data. This is needed so that the cache module chf_types attributes can
list md5 before mtime, without having mtime data be incorrectly
interpreted as md5 data.

X-Gentoo-Bug: 568934
X-Gentoo-Bug-url: https://bugs.gentoo.org/show_bug.cgi?id=568934
---
 pym/portage/cache/anydbm.py    |  4 ++--
 pym/portage/cache/flat_hash.py |  4 ++--
 pym/portage/cache/sqlite.py    |  4 ++--
 pym/portage/cache/template.py  | 19 +++++++++++++++----
 4 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/pym/portage/cache/anydbm.py b/pym/portage/cache/anydbm.py
index 80d24e5..88d85b0 100644
--- a/pym/portage/cache/anydbm.py
+++ b/pym/portage/cache/anydbm.py
@@ -36,8 +36,8 @@ from portage.cache import cache_errors
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = True
 	cleanse_keys = True
diff --git a/pym/portage/cache/flat_hash.py b/pym/portage/cache/flat_hash.py
index cca0f10..3a899c0 100644
--- a/pym/portage/cache/flat_hash.py
+++ b/pym/portage/cache/flat_hash.py
@@ -163,5 +163,5 @@ class md5_database(database):
 
 
 class mtime_md5_database(database):
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
diff --git a/pym/portage/cache/sqlite.py b/pym/portage/cache/sqlite.py
index 32e4076..69150f6 100644
--- a/pym/portage/cache/sqlite.py
+++ b/pym/portage/cache/sqlite.py
@@ -18,8 +18,8 @@ if sys.hexversion >= 0x3000000:
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = False
 	synchronous = False
diff --git a/pym/portage/cache/template.py b/pym/portage/cache/template.py
index a7c6de0..021c706 100644
--- a/pym/portage/cache/template.py
+++ b/pym/portage/cache/template.py
@@ -310,6 +310,18 @@ def serialize_eclasses(eclass_dict, chf_type='mtime', paths=True):
 		for k, v in sorted(eclass_dict.items(), key=_keysorter))
 
 
+def _md5_deserializer(md5):
+	if len(md5) != 32:
+		raise ValueError('expected 32 hex digits')
+	return md5
+
+
+_chf_deserializers = {
+	'md5': _md5_deserializer,
+	'mtime': long,
+}
+
+
 def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 	"""returns a dict when handed a string generated by serialize_eclasses"""
 	eclasses = eclass_string.rstrip().lstrip().split("\t")
@@ -317,9 +329,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		# occasionally this occurs in the fs backends.  they suck.
 		return {}
 
-	converter = _unicode
-	if chf_type == 'mtime':
-		converter = long
+	converter = _chf_deserializers.get(chf_type, lambda x: x)
 
 	if paths:
 		if len(eclasses) % 3 != 0:
@@ -340,6 +350,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		raise cache_errors.CacheCorruption(cpv,
 			"_eclasses_ was of invalid len %i" % len(eclasses))
 	except ValueError:
-		raise cache_errors.CacheCorruption(cpv, "_eclasses_ mtime conversion to long failed")
+		raise cache_errors.CacheCorruption(cpv,
+			"_eclasses_ not valid for chf_type {}".format(chf_type))
 	del eclasses
 	return d
-- 
2.7.4




* [gentoo-portage-dev] [PATCH v2] portage.cache: write md5 instead of mtime (bug 568934)
  2016-07-10  6:51 [gentoo-portage-dev] [PATCH] portage.cache: write md5 instead of mtime (bug 568934) Zac Medico
@ 2016-07-10 19:44 ` Zac Medico
  2016-07-10 20:18 ` [gentoo-portage-dev] [PATCH v3] " Zac Medico
  2016-07-12 17:18 ` [gentoo-portage-dev] [PATCH v4] " Zac Medico
  2 siblings, 0 replies; 8+ messages in thread
From: Zac Medico @ 2016-07-10 19:44 UTC
  To: gentoo-portage-dev; +Cc: Zac Medico

Change cache modules to write md5 in cache entries, instead of mtime.
Since portage-2.2.27, the relevant cache modules have had the ability
to read cache entries containing either md5 or mtime, therefore this
change is backward-compatible with portage-2.2.27 and later.

Also fix the reconstruct_eclasses function to raise CacheCorruption
when the specified chf_type is md5 and the cache entry contains mtime
data, and optimize __getitem__ to skip reconstruct_eclasses calls when
the entry appears to have a different chf_type.

X-Gentoo-Bug: 568934
X-Gentoo-Bug-url: https://bugs.gentoo.org/show_bug.cgi?id=568934
---
[PATCH v2] adds a __getitem__ optimization to skip reconstruct_eclasses
calls when the entry appears to have a different chf_type

 pym/portage/cache/anydbm.py    |  4 ++--
 pym/portage/cache/flat_hash.py |  4 ++--
 pym/portage/cache/sqlite.py    |  4 ++--
 pym/portage/cache/template.py  | 23 +++++++++++++++++++----
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/pym/portage/cache/anydbm.py b/pym/portage/cache/anydbm.py
index 80d24e5..88d85b0 100644
--- a/pym/portage/cache/anydbm.py
+++ b/pym/portage/cache/anydbm.py
@@ -36,8 +36,8 @@ from portage.cache import cache_errors
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = True
 	cleanse_keys = True
diff --git a/pym/portage/cache/flat_hash.py b/pym/portage/cache/flat_hash.py
index cca0f10..3a899c0 100644
--- a/pym/portage/cache/flat_hash.py
+++ b/pym/portage/cache/flat_hash.py
@@ -163,5 +163,5 @@ class md5_database(database):
 
 
 class mtime_md5_database(database):
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
diff --git a/pym/portage/cache/sqlite.py b/pym/portage/cache/sqlite.py
index 32e4076..69150f6 100644
--- a/pym/portage/cache/sqlite.py
+++ b/pym/portage/cache/sqlite.py
@@ -18,8 +18,8 @@ if sys.hexversion >= 0x3000000:
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = False
 	synchronous = False
diff --git a/pym/portage/cache/template.py b/pym/portage/cache/template.py
index a7c6de0..24d8f8f 100644
--- a/pym/portage/cache/template.py
+++ b/pym/portage/cache/template.py
@@ -54,6 +54,10 @@ class database(object):
 
 		if self.serialize_eclasses and "_eclasses_" in d:
 			for chf_type in chf_types:
+				if '_%s_' % chf_type not in d:
+					# Skip the reconstruct_eclasses call, since this
+					# entry appears to have a different chf_type.
+					continue
 				try:
 					d["_eclasses_"] = reconstruct_eclasses(cpv, d["_eclasses_"],
 						chf_type, paths=self.store_eclass_paths)
@@ -310,6 +314,18 @@ def serialize_eclasses(eclass_dict, chf_type='mtime', paths=True):
 		for k, v in sorted(eclass_dict.items(), key=_keysorter))
 
 
+def _md5_deserializer(md5):
+	if len(md5) != 32:
+		raise ValueError('expected 32 hex digits')
+	return md5
+
+
+_chf_deserializers = {
+	'md5': _md5_deserializer,
+	'mtime': long,
+}
+
+
 def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 	"""returns a dict when handed a string generated by serialize_eclasses"""
 	eclasses = eclass_string.rstrip().lstrip().split("\t")
@@ -317,9 +333,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		# occasionally this occurs in the fs backends.  they suck.
 		return {}
 
-	converter = _unicode
-	if chf_type == 'mtime':
-		converter = long
+	converter = _chf_deserializers.get(chf_type, lambda x: x)
 
 	if paths:
 		if len(eclasses) % 3 != 0:
@@ -340,6 +354,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		raise cache_errors.CacheCorruption(cpv,
 			"_eclasses_ was of invalid len %i" % len(eclasses))
 	except ValueError:
-		raise cache_errors.CacheCorruption(cpv, "_eclasses_ mtime conversion to long failed")
+		raise cache_errors.CacheCorruption(cpv,
+			"_eclasses_ not valid for chf_type {}".format(chf_type))
 	del eclasses
 	return d
-- 
2.7.4




* [gentoo-portage-dev] [PATCH v3] portage.cache: write md5 instead of mtime (bug 568934)
  2016-07-10  6:51 [gentoo-portage-dev] [PATCH] portage.cache: write md5 instead of mtime (bug 568934) Zac Medico
  2016-07-10 19:44 ` [gentoo-portage-dev] [PATCH v2] " Zac Medico
@ 2016-07-10 20:18 ` Zac Medico
  2016-07-12 13:59   ` Alexander Berntsen
  2016-07-12 17:18 ` [gentoo-portage-dev] [PATCH v4] " Zac Medico
  2 siblings, 1 reply; 8+ messages in thread
From: Zac Medico @ 2016-07-10 20:18 UTC
  To: gentoo-portage-dev; +Cc: Zac Medico

Change cache modules to write md5 in cache entries, instead of mtime.
Since portage-2.2.27, the relevant cache modules have had the ability
to read cache entries containing either md5 or mtime, therefore this
change is backward-compatible with portage-2.2.27 and later.

Also fix the reconstruct_eclasses function to raise CacheCorruption
when the specified chf_type is md5 and the cache entry contains mtime
data, and optimize __getitem__ to skip reconstruct_eclasses calls when
the entry appears to have a different chf_type.

X-Gentoo-Bug: 568934
X-Gentoo-Bug-url: https://bugs.gentoo.org/show_bug.cgi?id=568934
---
[PATCH v3] fixes the __getitem__ optimization to ensure that
CacheCorruption is raised if a cache entry does not contain a
recognized chf_type

 pym/portage/cache/anydbm.py    |  4 ++--
 pym/portage/cache/flat_hash.py |  4 ++--
 pym/portage/cache/sqlite.py    |  4 ++--
 pym/portage/cache/template.py  | 26 ++++++++++++++++++++++----
 4 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/pym/portage/cache/anydbm.py b/pym/portage/cache/anydbm.py
index 80d24e5..88d85b0 100644
--- a/pym/portage/cache/anydbm.py
+++ b/pym/portage/cache/anydbm.py
@@ -36,8 +36,8 @@ from portage.cache import cache_errors
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = True
 	cleanse_keys = True
diff --git a/pym/portage/cache/flat_hash.py b/pym/portage/cache/flat_hash.py
index cca0f10..3a899c0 100644
--- a/pym/portage/cache/flat_hash.py
+++ b/pym/portage/cache/flat_hash.py
@@ -163,5 +163,5 @@ class md5_database(database):
 
 
 class mtime_md5_database(database):
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
diff --git a/pym/portage/cache/sqlite.py b/pym/portage/cache/sqlite.py
index 32e4076..69150f6 100644
--- a/pym/portage/cache/sqlite.py
+++ b/pym/portage/cache/sqlite.py
@@ -18,8 +18,8 @@ if sys.hexversion >= 0x3000000:
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = False
 	synchronous = False
diff --git a/pym/portage/cache/template.py b/pym/portage/cache/template.py
index a7c6de0..d292eed 100644
--- a/pym/portage/cache/template.py
+++ b/pym/portage/cache/template.py
@@ -54,6 +54,10 @@ class database(object):
 
 		if self.serialize_eclasses and "_eclasses_" in d:
 			for chf_type in chf_types:
+				if '_%s_' % chf_type not in d:
+					# Skip the reconstruct_eclasses call, since this
+					# entry appears to have a different chf_type.
+					continue
 				try:
 					d["_eclasses_"] = reconstruct_eclasses(cpv, d["_eclasses_"],
 						chf_type, paths=self.store_eclass_paths)
@@ -62,6 +66,9 @@ class database(object):
 						raise
 				else:
 					break
+			else:
+				raise cache_errors.CacheCorruption(cpv,
+					'entry does not contain a recognized chf_type')
 
 		elif "_eclasses_" not in d:
 			d["_eclasses_"] = {}
@@ -310,6 +317,18 @@ def serialize_eclasses(eclass_dict, chf_type='mtime', paths=True):
 		for k, v in sorted(eclass_dict.items(), key=_keysorter))
 
 
+def _md5_deserializer(md5):
+	if len(md5) != 32:
+		raise ValueError('expected 32 hex digits')
+	return md5
+
+
+_chf_deserializers = {
+	'md5': _md5_deserializer,
+	'mtime': long,
+}
+
+
 def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 	"""returns a dict when handed a string generated by serialize_eclasses"""
 	eclasses = eclass_string.rstrip().lstrip().split("\t")
@@ -317,9 +336,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		# occasionally this occurs in the fs backends.  they suck.
 		return {}
 
-	converter = _unicode
-	if chf_type == 'mtime':
-		converter = long
+	converter = _chf_deserializers.get(chf_type, lambda x: x)
 
 	if paths:
 		if len(eclasses) % 3 != 0:
@@ -340,6 +357,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		raise cache_errors.CacheCorruption(cpv,
 			"_eclasses_ was of invalid len %i" % len(eclasses))
 	except ValueError:
-		raise cache_errors.CacheCorruption(cpv, "_eclasses_ mtime conversion to long failed")
+		raise cache_errors.CacheCorruption(cpv,
+			"_eclasses_ not valid for chf_type {}".format(chf_type))
 	del eclasses
 	return d
-- 
2.7.4




* Re: [gentoo-portage-dev] [PATCH v3] portage.cache: write md5 instead of mtime (bug 568934)
  2016-07-10 20:18 ` [gentoo-portage-dev] [PATCH v3] " Zac Medico
@ 2016-07-12 13:59   ` Alexander Berntsen
  2016-07-12 16:30     ` Zac Medico
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Berntsen @ 2016-07-12 13:59 UTC
  To: gentoo-portage-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

The _chf_deserializers and _md5_deserializer stuff looks rather
overengineered. I don't know what the reconstruct_eclasses skipping
entails (the comment makes it seem like "skip this because apparently
it's different lol who knows -_o_-"). The rest of the patch lgtm.

- -- 
Alexander
bernalex@gentoo.org
https://secure.plaimi.net/~alexander
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCgAGBQJXhPfCAAoJENQqWdRUGk8BGD8QAN6DgO0LDe6L8+yFzljTS79k
ctrEvV+cm6Ti8crBuXzjgEi2hmSWwEbpFi/OjAA+8JuDVigqSOF1qh32UyhgAK2m
ugm9Vs6/ooQ6NqJu1xd5NF342ul06DNvsU9kKQsmoO8f03EmHRKlAxCFIs5UBl1P
0cd5ULg/dFANzpe2zKFDVk0YGgFmrN8X2nziosttb0MrgfMkAP712ZcEAtHMusWj
iKz0ByJcogTvuWJLKSoMQbU1EGm+/NjRB7mV3BN7LBoVarPmCt//s6jG7GkFTiNI
T6sDsn/rFOdFyiGmxXaZ+3ztv3z7WFHvHGzyyCqofJceYxjmaT1vk0itWYDACi7O
QJmsZ+EnL72z3i+J3AwONtqixBQkJ/Jpt7Ye/O2drRA8eHZ2wJODH2jnFONKvf75
v2JfnWy1X63SikNorsn9/WE4j00rky/0fA+0WR2anMW01B8cgZU/LhaoNzIsV696
3XNmwNjZDmhhngUfj/vEVgtpopOiG2m96Myq2opw1wXv8pI6OmQevaOCuLpOMmpm
yaRCcYNWRJ5QY0FQJJoIIqwMdiuXov+uQhRIQ6Im0THEOYKmwCwETFAPfXMETYn7
qTgj51RiK1NHl5mibojjcJJHTWHLg++XSEfuUJnlBL8GdApIUtC1dE4+BmYcVlxt
i3ojLlHlG8Gc/+TkaR8f
=lxGo
-----END PGP SIGNATURE-----



* Re: [gentoo-portage-dev] [PATCH v3] portage.cache: write md5 instead of mtime (bug 568934)
  2016-07-12 13:59   ` Alexander Berntsen
@ 2016-07-12 16:30     ` Zac Medico
  0 siblings, 0 replies; 8+ messages in thread
From: Zac Medico @ 2016-07-12 16:30 UTC
  To: gentoo-portage-dev

On 07/12/2016 06:59 AM, Alexander Berntsen wrote:
> The _chf_deserializers and _md5_deserializer stuff looks rather
> overengineered. 

That stuff is not strictly required after the addition of the
intelligent reconstruct_eclasses skipping in __getitem__. However, it's
still good to have because it protects against a subtle misbehavior of
reconstruct_eclasses, where it's called with chf_type='md5' and produces
an invalid data structure containing mtime strings (rather than mtime ints).
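
A minimal sketch of that misbehavior, written as standalone Python rather
than taken from portage itself. The sample field values and the helper
name accepts() are hypothetical; the length check mirrors
_md5_deserializer from the patch, and str stands in for the old
pass-through converter:

```python
def md5_deserializer(value):
    # Same length check as the patch: an md5 hex digest has 32 characters.
    if len(value) != 32:
        raise ValueError('expected 32 hex digits')
    return value


# Hypothetical sample fields, for illustration only.
mtime_field = '1468133460'
md5_field = 'd41d8cd98f00b204e9800998ecf8427e'

# A pass-through converter accepts any string, so mtime data read with
# chf_type='md5' comes back as a string where an int is expected.
old_converter = str
leaked = old_converter(mtime_field)


def accepts(converter, value):
    """Return True if converter(value) succeeds, False on ValueError."""
    try:
        converter(value)
    except ValueError:
        return False
    return True
```

With the length check in place, mtime data is rejected for
chf_type='md5', and md5 data is still rejected by the integer
conversion used for chf_type='mtime'.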

> I don't know what the reconstruct_eclasses skipping
> entails (the comment makes it seem like "skip this because apparently
> it's different lol who knows -_o_-").

The _eclasses_ data contains either md5 or mtime. It's a waste of time to
try to call reconstruct_eclasses with chf_type='md5' when _eclasses_
contains mtime data (and it would also produce an invalid data structure
in the absence of the _chf_deserializers and _md5_deserializer stuff).
So, it's nice to take the presence of _md5_ or _mtime_ in the cache
entry as a hint about whether _eclasses_ contains md5 or mtime data.
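
Roughly, the hint lookup can be sketched as below. This is a simplified
standalone illustration, not the __getitem__ code from the patch: the
function name hinted_chf_type and the local CacheCorruption class are
stand-ins, and the real loop also calls reconstruct_eclasses for the
selected chf_type.

```python
class CacheCorruption(Exception):
    """Stand-in for portage.cache.cache_errors.CacheCorruption."""


def hinted_chf_type(entry, chf_types=('md5', 'mtime')):
    """Pick the chf_type whose '_<type>_' key is present in the entry."""
    for chf_type in chf_types:
        if '_%s_' % chf_type in entry:
            # The entry advertises this chf_type, so _eclasses_ is
            # assumed to use the same one.
            break
    else:
        # No hint key matched any supported chf_type.
        raise CacheCorruption(
            'entry does not contain a recognized chf_type')
    return chf_type
```

An entry that carries _mtime_ selects 'mtime' even though 'md5' is
listed first, and an entry with neither key is reported as corrupt
instead of being parsed with a guessed chf_type.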

> The rest of the patch lgtm.

I'll add the insights that I have discussed above as comments in the patch.
-- 
Thanks,
Zac



* [gentoo-portage-dev] [PATCH v4] portage.cache: write md5 instead of mtime (bug 568934)
  2016-07-10  6:51 [gentoo-portage-dev] [PATCH] portage.cache: write md5 instead of mtime (bug 568934) Zac Medico
  2016-07-10 19:44 ` [gentoo-portage-dev] [PATCH v2] " Zac Medico
  2016-07-10 20:18 ` [gentoo-portage-dev] [PATCH v3] " Zac Medico
@ 2016-07-12 17:18 ` Zac Medico
  2016-07-13 11:16   ` Alexander Berntsen
  2 siblings, 1 reply; 8+ messages in thread
From: Zac Medico @ 2016-07-12 17:18 UTC
  To: gentoo-portage-dev; +Cc: Zac Medico

Change cache modules to write md5 in cache entries, instead of mtime.
Since portage-2.2.27, the relevant cache modules have had the ability
to read cache entries containing either md5 or mtime, therefore this
change is backward-compatible with portage-2.2.27 and later.

Also fix the reconstruct_eclasses function to raise CacheCorruption
when the specified chf_type is md5 and the cache entry contains mtime
data, and optimize __getitem__ to skip reconstruct_eclasses calls when
the entry appears to have a different chf_type.

X-Gentoo-Bug: 568934
X-Gentoo-Bug-url: https://bugs.gentoo.org/show_bug.cgi?id=568934
---
[PATCH v4] adds some comments to clarify the purposes of the __getitem__
optimization and _md5_deserializer stuff

 pym/portage/cache/anydbm.py    |  4 ++--
 pym/portage/cache/flat_hash.py |  4 ++--
 pym/portage/cache/sqlite.py    |  4 ++--
 pym/portage/cache/template.py  | 36 ++++++++++++++++++++++++++++++++----
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/pym/portage/cache/anydbm.py b/pym/portage/cache/anydbm.py
index 80d24e5..88d85b0 100644
--- a/pym/portage/cache/anydbm.py
+++ b/pym/portage/cache/anydbm.py
@@ -36,8 +36,8 @@ from portage.cache import cache_errors
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = True
 	cleanse_keys = True
diff --git a/pym/portage/cache/flat_hash.py b/pym/portage/cache/flat_hash.py
index cca0f10..3a899c0 100644
--- a/pym/portage/cache/flat_hash.py
+++ b/pym/portage/cache/flat_hash.py
@@ -163,5 +163,5 @@ class md5_database(database):
 
 
 class mtime_md5_database(database):
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
diff --git a/pym/portage/cache/sqlite.py b/pym/portage/cache/sqlite.py
index 32e4076..69150f6 100644
--- a/pym/portage/cache/sqlite.py
+++ b/pym/portage/cache/sqlite.py
@@ -18,8 +18,8 @@ if sys.hexversion >= 0x3000000:
 
 class database(fs_template.FsBased):
 
-	validation_chf = 'mtime'
-	chf_types = ('mtime', 'md5')
+	validation_chf = 'md5'
+	chf_types = ('md5', 'mtime')
 
 	autocommits = False
 	synchronous = False
diff --git a/pym/portage/cache/template.py b/pym/portage/cache/template.py
index a7c6de0..8662d85 100644
--- a/pym/portage/cache/template.py
+++ b/pym/portage/cache/template.py
@@ -54,6 +54,15 @@ class database(object):
 
 		if self.serialize_eclasses and "_eclasses_" in d:
 			for chf_type in chf_types:
+				if '_%s_' % chf_type not in d:
+					# Skip the reconstruct_eclasses call, since it's
+					# a waste of time if it contains a different chf_type
+					# than the current one. In the past, it was possible
+					# for reconstruct_eclasses called with chf_type='md5'
+					# to "successfully" return invalid data here, because
+					# it was unable to distinguish between md5 data and
+					# mtime data.
+					continue
 				try:
 					d["_eclasses_"] = reconstruct_eclasses(cpv, d["_eclasses_"],
 						chf_type, paths=self.store_eclass_paths)
@@ -62,6 +71,9 @@ class database(object):
 						raise
 				else:
 					break
+			else:
+				raise cache_errors.CacheCorruption(cpv,
+					'entry does not contain a recognized chf_type')
 
 		elif "_eclasses_" not in d:
 			d["_eclasses_"] = {}
@@ -310,6 +322,23 @@ def serialize_eclasses(eclass_dict, chf_type='mtime', paths=True):
 		for k, v in sorted(eclass_dict.items(), key=_keysorter))
 
 
+def _md5_deserializer(md5):
+	"""
+	Without this validation, it's possible for reconstruct_eclasses to
+	mistakenly interpret mtime data as md5 data, and return an invalid
+	data structure containing strings where ints are expected.
+	"""
+	if len(md5) != 32:
+		raise ValueError('expected 32 hex digits')
+	return md5
+
+
+_chf_deserializers = {
+	'md5': _md5_deserializer,
+	'mtime': long,
+}
+
+
 def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 	"""returns a dict when handed a string generated by serialize_eclasses"""
 	eclasses = eclass_string.rstrip().lstrip().split("\t")
@@ -317,9 +346,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		# occasionally this occurs in the fs backends.  they suck.
 		return {}
 
-	converter = _unicode
-	if chf_type == 'mtime':
-		converter = long
+	converter = _chf_deserializers.get(chf_type, lambda x: x)
 
 	if paths:
 		if len(eclasses) % 3 != 0:
@@ -340,6 +367,7 @@ def reconstruct_eclasses(cpv, eclass_string, chf_type='mtime', paths=True):
 		raise cache_errors.CacheCorruption(cpv,
 			"_eclasses_ was of invalid len %i" % len(eclasses))
 	except ValueError:
-		raise cache_errors.CacheCorruption(cpv, "_eclasses_ mtime conversion to long failed")
+		raise cache_errors.CacheCorruption(cpv,
+			"_eclasses_ not valid for chf_type {}".format(chf_type))
 	del eclasses
 	return d
-- 
2.7.4




* Re: [gentoo-portage-dev] [PATCH v4] portage.cache: write md5 instead of mtime (bug 568934)
  2016-07-12 17:18 ` [gentoo-portage-dev] [PATCH v4] " Zac Medico
@ 2016-07-13 11:16   ` Alexander Berntsen
  2016-07-13 11:33     ` Zac Medico
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Berntsen @ 2016-07-13 11:16 UTC
  To: gentoo-portage-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

LGTM.

- -- 
Alexander
bernalex@gentoo.org
https://secure.plaimi.net/~alexander
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCgAGBQJXhiMFAAoJENQqWdRUGk8BcD8P/0Z39KZ8dOGOyjY1WmdSIG3p
n4WSsMb7wG/aoYUgFMPAUY7gKQ2zMB2ei3Yc0Y/fPs3z4pOJYJmKLSHlBpU0tamp
4xSlu61jgO9R/b2XEN1PuqTyzRvfUWhlYSZBMgF82maK3zYAgEwBkpnwb04gzxuO
imZ2e8r8IcUmxo0NSGLAkhB6YILo8iZMjGId2NRbukNisKdecKNX2nJAyJcIXnse
5yHFIf4LJbW9R72crtBvVovZZYTgptSW3E67zI1ZH3SRmulg6atypCFp0x/9N1m5
FrNDCl2IMGrOjeZ9mKbF1OcjENBTa2h984OMUCrKmDhGBYp1wfyREFTH02oreKHj
Ei13AzjBid9B+ySruz+nf/cjBCWdTBlimgca3Q3O9RVx8spVCVGqCRwomxi4kEsU
zk2HpQmMuLPLxR54516w9i1JR/fQmcZRqghIxVZ+SVNmKBk5vBUBviQLS5168Wov
LzG7eMCgm7FYsjpoL0PDnP6UCR0hyzGXiIX1Bxoc7tHKKEZBNT4hr8VtjkWW+jf2
5GAvSIfq1jjFGWhwrKh0vdqfLIUH91yvKc8ZPFTygDTJbhiqEpdwDqCryF7Y+teD
uX1Id0pJvhr8mX96DJnoBZhOxriHLxXtPTGyBN0OR2XXVR7o5L09zupOrv242mqq
9eF3TLZwBfZOdyn+13IO
=GvrT
-----END PGP SIGNATURE-----



* Re: [gentoo-portage-dev] [PATCH v4] portage.cache: write md5 instead of mtime (bug 568934)
  2016-07-13 11:16   ` Alexander Berntsen
@ 2016-07-13 11:33     ` Zac Medico
  0 siblings, 0 replies; 8+ messages in thread
From: Zac Medico @ 2016-07-13 11:33 UTC
  To: gentoo-portage-dev

On 07/13/2016 04:16 AM, Alexander Berntsen wrote:
> LGTM.
> 
> 

Pushed:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=9abbda7d054761ae6c333d3e6d420632b9658b6d
-- 
Thanks,
Zac


