public inbox for gentoo-portage-dev@lists.gentoo.org
From: Zac Medico <zmedico@gentoo.org>
To: gentoo-portage-dev@lists.gentoo.org
Cc: Zac Medico <zmedico@gentoo.org>
Subject: [gentoo-portage-dev] [PATCH 5/5] Add emerge --search-index option.
Date: Sat,  1 Nov 2014 15:46:23 -0700
Message-ID: <1414881983-19877-6-git-send-email-zmedico@gentoo.org>
In-Reply-To: <1414881983-19877-1-git-send-email-zmedico@gentoo.org>

The new emerge --search-index option, which is enabled by default,
causes pkg_desc_index to be used for search optimization. The search
index needs to be regenerated by egencache after changes are made to
a repository (see the --update-pkg-desc-index action).

Users who modify ebuilds in a repository without running egencache
afterwards can use emerge --search-index=n to get non-indexed search.
Alternatively, they can remove the stale index file to disable the
search index for that particular repository.
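
How the option value is resolved can be sketched as follows. This is a
hypothetical helper, not emerge's actual option parser. It assumes that
EMERGE_DEFAULT_OPTS tokens are parsed before the command-line tokens, so
a later --search-index value wins. It also mirrors the test used in
action_search(), where anything other than an explicit "n" enables the
index.

```python
import argparse
import shlex


def search_index_enabled(default_opts, cmdline):
    """Return True if indexed search should be used (hypothetical helper).

    Assumes EMERGE_DEFAULT_OPTS is split like a shell string and placed
    before the command-line arguments, so the last --search-index wins.
    """
    # allow_abbrev=False keeps "--search" from being read as "--search-index".
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--search-index", choices=("y", "n"))
    # Unrelated options and package atoms end up in the "unknown" list.
    opts, _unknown = parser.parse_known_args(
        shlex.split(default_opts) + list(cmdline))
    # Same test as action_search(): only an explicit "n" disables the index.
    return (opts.search_index or "y") != "n"


print(search_index_enabled("--search-index=n", ["--search", "vim"]))   # False
print(search_index_enabled("--search-index=n", ["--search-index=y"]))  # True
```

With this ordering, a default of n in make.conf can still be overridden
for a single run by passing --search-index=y on the command line.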

In order to conserve memory, indices are read as streams, and
MultiIterGroupBy is used to group results from IndexedPortdb and
IndexedVardb. Stream-oriented search also makes it possible to
display search results incrementally (fixing bug #412471).
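
The stream grouping described above can be sketched with the standard
library. This illustrates the idea only and is not Portage's
MultiIterGroupBy. It assumes every input stream is already sorted. The
inputs are merged lazily with heapq.merge, which holds only one pending
item per stream, and equal neighbours are collapsed with
itertools.groupby. Taking group[0] of each group gives a de-duplicated
category/package list like the one _cp_all() yields.

```python
import heapq
import itertools


def group_sorted_streams(*streams):
    """Yield lists of equal items taken from several sorted iterables.

    Sketch only (not Portage's MultiIterGroupBy): each stream must
    already be sorted, and only one pending item per stream is held.
    """
    merged = heapq.merge(*streams)
    for _key, group in itertools.groupby(merged):
        yield list(group)


# Hypothetical category/package streams from two databases.
portdb_cps = iter(["app-editors/vim", "dev-lang/python", "sys-apps/portage"])
vardb_cps = iter(["dev-lang/python", "sys-apps/portage"])
for group in group_sorted_streams(portdb_cps, vardb_cps):
    print(group[0])  # app-editors/vim, dev-lang/python, sys-apps/portage
```

Because nothing is collected into a set or sorted in memory, peak memory
stays roughly constant regardless of how many packages the repositories
contain.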

X-Gentoo-Bug: 525718
X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=525718
---
 man/emerge.1            |   8 ++++
 pym/_emerge/actions.py  |   3 +-
 pym/_emerge/depgraph.py |   2 +-
 pym/_emerge/main.py     |   5 +++
 pym/_emerge/search.py   | 112 ++++++++++++++++++++++++++++++++++--------------
 5 files changed, 95 insertions(+), 35 deletions(-)
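
The incremental display in search.output() follows a simple
producer/consumer pattern. _iter_search() yields (type, name, masked)
tuples, and output() writes each one as it arrives. The "Applications
found" total is printed only after the stream is exhausted. The following
is a minimal sketch of that pattern. iter_matches() and
print_incrementally() are hypothetical stand-ins, and the output format
is simplified.

```python
import sys


def iter_matches(names, key):
    """Hypothetical stand-in for _iter_search(): yield matches lazily."""
    for name in names:
        if key in name:
            yield ("pkg", name, False)


def print_incrementally(matches, write=sys.stdout.write, flush=sys.stdout.flush):
    """Write each match as soon as it is produced; report the total last."""
    count = 0  # the total is only known once the stream is exhausted
    for _mtype, match, masked in matches:
        count += 1
        write("*  %s%s\n" % (match, " [ Masked ]" if masked else ""))
        flush()  # make the line visible before the next match is computed
    write("[ Applications found : %d ]\n" % count)
    return count


print_incrementally(iter_matches(
    ["app-editors/vim", "dev-lang/python", "app-vim/foo"], "vim"))
```

This is also why the spinner is disabled in the patch. Results now
appear while the search is still running, so there is no silent wait
to cover.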

diff --git a/man/emerge.1 b/man/emerge.1
index bbe71ac..7bcdd9a 100644
--- a/man/emerge.1
+++ b/man/emerge.1
@@ -796,6 +796,14 @@ If ebuilds using EAPIs which \fIdo not\fR support \fBHDEPEND\fR are built in
 the same \fBemerge\fR run as those using EAPIs which \fIdo\fR support
 \fBHDEPEND\fR, this option affects only the former.
 .TP
+.BR "\-\-search\-index < y | n >"
+Enable or disable indexed search for search actions. This option is
+enabled by default. The search index needs to be regenerated by
+\fBegencache\fR(1) after changes are made to a repository (see the
+\fB\-\-update\-pkg\-desc\-index\fR action). This setting can be added
+to \fBEMERGE_DEFAULT_OPTS\fR (see \fBmake.conf\fR(5)) and later
+overridden via the command line.
+.TP
 .BR "\-\-select [ y | n ] (\-w short option)"
 Add specified packages to the world set (inverse of
 \fB\-\-oneshot\fR). This is useful if you want to
diff --git a/pym/_emerge/actions.py b/pym/_emerge/actions.py
index 48b0826..8a22ab5 100644
--- a/pym/_emerge/actions.py
+++ b/pym/_emerge/actions.py
@@ -2015,7 +2015,8 @@ def action_search(root_config, myopts, myfiles, spinner):
 		searchinstance = search(root_config,
 			spinner, "--searchdesc" in myopts,
 			"--quiet" not in myopts, "--usepkg" in myopts,
-			"--usepkgonly" in myopts)
+			"--usepkgonly" in myopts,
+			search_index = myopts.get("--search-index", "y") != "n")
 		for mysearch in myfiles:
 			try:
 				searchinstance.execute(mysearch)
diff --git a/pym/_emerge/depgraph.py b/pym/_emerge/depgraph.py
index 78b9236..2fbb7ce 100644
--- a/pym/_emerge/depgraph.py
+++ b/pym/_emerge/depgraph.py
@@ -8596,7 +8596,7 @@ def ambiguous_package_name(arg, atoms, root_config, spinner, myopts):
 
 	s = search(root_config, spinner, "--searchdesc" in myopts,
 		"--quiet" not in myopts, "--usepkg" in myopts,
-		"--usepkgonly" in myopts)
+		"--usepkgonly" in myopts, search_index = False)
 	null_cp = portage.dep_getkey(insert_category_into_atom(
 		arg, "null"))
 	cat, atom_pn = portage.catsplit(null_cp)
diff --git a/pym/_emerge/main.py b/pym/_emerge/main.py
index cf7966c..c08e12a 100644
--- a/pym/_emerge/main.py
+++ b/pym/_emerge/main.py
@@ -616,6 +616,11 @@ def parse_opts(tmpcmdline, silent=False):
 			"choices" :("True", "rdeps")
 		},
 
+		"--search-index": {
+			"help": "Enable or disable indexed search (enabled by default)",
+			"choices": y_or_n
+		},
+
 		"--select": {
 			"shortopt" : "-w",
 			"help"    : "add specified packages to the world set " + \
diff --git a/pym/_emerge/search.py b/pym/_emerge/search.py
index 4b0fd9f..acde3bd 100644
--- a/pym/_emerge/search.py
+++ b/pym/_emerge/search.py
@@ -7,9 +7,12 @@ import re
 import portage
 from portage import os
 from portage.dbapi.porttree import _parse_uri_map
+from portage.dbapi.IndexedPortdb import IndexedPortdb
+from portage.dbapi.IndexedVardb import IndexedVardb
 from portage.localization import localized_size
 from portage.output import  bold, bold as white, darkgreen, green, red
 from portage.util import writemsg_stdout
+from portage.util.iterators.MultiIterGroupBy import MultiIterGroupBy
 
 from _emerge.Package import Package
 
@@ -25,15 +28,17 @@ class search(object):
 	# public interface
 	#
 	def __init__(self, root_config, spinner, searchdesc,
-		verbose, usepkg, usepkgonly):
+		verbose, usepkg, usepkgonly, search_index = True):
 		"""Searches the available and installed packages for the supplied search key.
 		The list of available and installed packages is created at object instantiation.
 		This makes successive searches faster."""
 		self.settings = root_config.settings
-		self.vartree = root_config.trees["vartree"]
-		self.spinner = spinner
 		self.verbose = verbose
 		self.searchdesc = searchdesc
+		self.searchkey = None
+		# Disable the spinner since search results are displayed
+		# incrementally.
+		self.spinner = None
 		self.root_config = root_config
 		self.setconfig = root_config.setconfig
 		self.matches = {"pkg" : []}
@@ -45,6 +50,10 @@ class search(object):
 		bindb = root_config.trees["bintree"].dbapi
 		vardb = root_config.trees["vartree"].dbapi
 
+		if search_index:
+			portdb = IndexedPortdb(portdb)
+			vardb = IndexedVardb(vardb)
+
 		if not usepkgonly and portdb._have_root_eclass_dir:
 			self._dbs.append(portdb)
 
@@ -53,16 +62,23 @@ class search(object):
 
 		self._dbs.append(vardb)
 		self._portdb = portdb
+		self._vardb = vardb
 
 	def _spinner_update(self):
 		if self.spinner:
 			self.spinner.update()
 
 	def _cp_all(self):
-		cp_all = set()
+		iterators = []
 		for db in self._dbs:
-			cp_all.update(db.cp_all())
-		return list(sorted(cp_all))
+			i = db.cp_all()
+			try:
+				i = iter(i)
+			except TypeError:
+				pass
+			iterators.append(i)
+		for group in MultiIterGroupBy(iterators):
+			yield group[0]
 
 	def _aux_get(self, *args, **kwargs):
 		for db in self._dbs:
@@ -97,7 +113,7 @@ class search(object):
 		return {}
 
 	def _visible(self, db, cpv, metadata):
-		installed = db is self.vartree.dbapi
+		installed = db is self._vardb
 		built = installed or db is not self._portdb
 		pkg_type = "ebuild"
 		if installed:
@@ -171,8 +187,11 @@ class search(object):
 
 	def execute(self,searchkey):
 		"""Performs the search for the supplied search key"""
+		self.searchkey = searchkey
+
+	def _iter_search(self):
+
 		match_category = 0
-		self.searchkey=searchkey
 		self.packagematches = []
 		if self.searchdesc:
 			self.searchdesc=1
@@ -181,6 +200,7 @@ class search(object):
 			self.searchdesc=0
 			self.matches = {"pkg":[], "set":[]}
 		print("Searching...   ", end=' ')
+		print()
 
 		regexsearch = False
 		if self.searchkey.startswith('%'):
@@ -206,8 +226,24 @@ class search(object):
 			if self.searchre.search(match_string):
 				if not self._xmatch("match-visible", package):
 					masked=1
-				self.matches["pkg"].append([package,masked])
+				yield ("pkg", package, masked)
 			elif self.searchdesc: # DESCRIPTION searching
+				# Check for DESCRIPTION match first, so that we can skip
+				# the expensive visibility check if it doesn't match.
+				full_package = self._xmatch("match-all", package)
+				if not full_package:
+					continue
+				full_package = full_package[-1]
+				try:
+					full_desc = self._aux_get(
+						full_package, ["DESCRIPTION"])[0]
+				except KeyError:
+					portage.writemsg(
+						"emerge: search: aux_get() failed, skipping\n",
+						noiselevel=-1)
+					continue
+				if not self.searchre.search(full_desc):
+					continue
 				full_package = self._xmatch("bestmatch-visible", package)
 				if not full_package:
 					#no match found; we don't want to query description
@@ -217,14 +253,8 @@ class search(object):
 						continue
 					else:
 						masked=1
-				try:
-					full_desc = self._aux_get(
-						full_package, ["DESCRIPTION"])[0]
-				except KeyError:
-					print("emerge: search: aux_get() failed, skipping")
-					continue
-				if self.searchre.search(full_desc):
-					self.matches["desc"].append([full_package,masked])
+
+				yield ("desc", full_package, masked)
 
 		self.sdict = self.setconfig.getSets()
 		for setname in self.sdict:
@@ -235,16 +265,11 @@ class search(object):
 				match_string = setname.split("/")[-1]
 			
 			if self.searchre.search(match_string):
-				self.matches["set"].append([setname, False])
+				yield ("set", setname, False)
 			elif self.searchdesc:
 				if self.searchre.search(
 					self.sdict[setname].getMetadata("DESCRIPTION")):
-					self.matches["set"].append([setname, False])
-			
-		self.mlen=0
-		for mtype in self.matches:
-			self.matches[mtype].sort()
-			self.mlen += len(self.matches[mtype])
+					yield ("set", setname, False)
 
 	def addCP(self, cp):
 		if not self._xmatch("match-all", cp):
@@ -257,17 +282,32 @@ class search(object):
 
 	def output(self):
 		"""Outputs the results of the search."""
-		msg = []
+
+		class msg(object):
+			@staticmethod
+			def append(msg):
+				writemsg_stdout(msg, noiselevel=-1)
+
 		msg.append("\b\b  \n[ Results for search key : " + \
 			bold(self.searchkey) + " ]\n")
-		msg.append("[ Applications found : " + \
-			bold(str(self.mlen)) + " ]\n\n")
-		vardb = self.vartree.dbapi
+		vardb = self._vardb
 		metadata_keys = set(Package.metadata_keys)
 		metadata_keys.update(["DESCRIPTION", "HOMEPAGE", "LICENSE", "SRC_URI"])
 		metadata_keys = tuple(metadata_keys)
-		for mtype in self.matches:
-			for match,masked in self.matches[mtype]:
+		self.mlen = 0
+		if self.searchkey is None:
+			# Handle results added via addCP
+			addCP_matches = []
+			for mytype, matches in self.matches.items():
+				addCP_matches.extend((mytype, match, masked) for match, masked in matches)
+			iterator = iter(addCP_matches)
+
+		else:
+			# Do a normal search
+			iterator = self._iter_search()
+
+		for mtype, match, masked in iterator:
+				self.mlen += 1
 				full_package = None
 				if mtype == "pkg":
 					full_package = self._xmatch(
@@ -367,12 +407,19 @@ class search(object):
 							+ "   " + desc + "\n")
 						msg.append("      " + darkgreen("License:") + \
 							"       " + license + "\n\n")
-		writemsg_stdout(''.join(msg), noiselevel=-1)
+
+		msg.append("[ Applications found : " + \
+			bold(str(self.mlen)) + " ]\n\n")
+
 	#
 	# private interface
 	#
 	def getInstallationStatus(self,package):
-		installed_package = self.vartree.dep_bestmatch(package)
+		installed_package = self._vardb.match(package)
+		if installed_package:
+			installed_package = installed_package[-1]
+		else:
+			installed_package = ""
 		result = ""
 		version = self.getVersion(installed_package,search.VERSION_RELEASE)
 		if len(version) > 0:
@@ -391,4 +438,3 @@ class search(object):
 		else:
 			result = ""
 		return result
-
-- 
2.0.4


