Subject: Re: [gentoo-portage-dev] [PATCH] emerge: add --search-fuzzy and --search-fuzzy-cutoff options (bug 65566)
To: gentoo-portage-dev@lists.gentoo.org
References: <1459746182-13420-1-git-send-email-zmedico@gentoo.org>
From: Alexander Berntsen
Message-ID: <57022835.2060304@gentoo.org>
Date: Mon, 4 Apr 2016 10:39:17 +0200
In-Reply-To: <1459746182-13420-1-git-send-email-zmedico@gentoo.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

This is a great idea!

On 04/04/16 07:03, Zac Medico wrote:
> +.BR "\-\-search\-fuzzy [ y | n ]"
> +Enable or disable fuzzy search for search actions.

This is likely a good place to briefly explain what a "fuzzy search" is.

Also, I'm not sold on "search-fuzzy" as opposed to "fuzzy-search". Is
there a particular reason for it? Since we don't seem to have a
standardised "verbs mean this, nouns mean this" convention anyway, I
would use the latter phrase.

You also need to document your note on regexes. Lastly, you need to
document that a fuzzy search is slower than a regular search.

> +.TP
> +.BR "\-\-search\-fuzzy\-cutoff CUTOFF"
> +Set similarity ratio cutoff (a floating-point number between 0 and 1).
> +Results with similarity ratios lower than the cutoff are discarded.
> +This option has no effect unless the \fB\-\-search\-fuzzy\fR option
> +is enabled.

This explanation is a bit heavy to read. I also think that using 0 to 1
isn't very nice, and calling the number "floating point" instead of
decimal is neither useful nor nice. How about making it a percentage,
and describing it simply as a similarity percentage: "package names must
be at least N% similar to the search term to appear in search results".
The option could then be called --search-fuzzy-similarity, or (in
keeping with the previous suggestion) --fuzzy-search-similarity, or
something similar. ;)

Of course, if you agree with this, you'll have to reverse the code so
that it represents which results to show, rather than which ones not to
show.

You should also document here what happens if there's a mistake in the
input.

> +	"--search-fuzzy-cutoff": {
> +		"help": "Set similarity ratio cutoff (a floating-point number between 0 and 1)",
> +		"action": "store"
> +	},

See comments above regarding how to explain what this actually does.

> +	if myoptions.search_fuzzy_cutoff:
> +		try:
> +			fuzzy_cutoff = float(myoptions.search_fuzzy_cutoff)
> +		except ValueError:
> +			fuzzy_cutoff = 0.0

Is this a reasonable fallback? I guess so...
but you need to mention it in the manpage, as noted above.

> +
> +		if fuzzy_cutoff <= 0.0:
> +			fuzzy_cutoff = None
> +			if not silent:
> +				parser.error("Invalid --search-fuzzy-cutoff parameter: '%s'\n" % \
> +				(myoptions.search_fuzzy_cutoff,))
> +
> +		myoptions.search_fuzzy_cutoff = fuzzy_cutoff
> +

I also don't understand why the first case just becomes 0.0, but this
one is an error. Why don't both cases do the same thing: either error
out and revert to the 0.8 cutoff (or 80% similarity), or fall back to
0.0/100? And this needs to go in the manpage too.

> + self.fuzzy_cutoff = 0.8 if fuzzy_cutoff is None else fuzzy_cutoff

See above.

> + fuzzy = False

Here's an interesting discussion: maybe this should be True? After all,
it's True in any modern search engine. What do you think?

> + # Fuzzy search does not support regular expressions, therefore
> + # it is disabled for regular expression searches.

This belongs in the manpage too.

- -- 
Alexander
bernalex@gentoo.org
https://secure.plaimi.net/~alexander
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCgAGBQJXAig0AAoJENQqWdRUGk8BOOEQAIEYXkn86ibMiYhN5BBDlsL1
2a6zBOCzygTkpxiBg+8vPsWJcHmzyTO7M6H1x3bUCY/JEfWq0354WdvNMtDM5qZk
zpwIg0uPs/Q4Fo40hozHsc66f+jqZxgmy5rML2mO8cAFZANZdNtuvTkVQYF5zQXz
4CI06tVDwXmYAmg7wIBEpWJ8O+is2F1abzPJcr42tLz5ELYm1IRn4Em8WO5m5klm
mrYWWeesvNS1l2y8kbKCmtpQbSuzLYfFyVfFkSL/p6t16Tiu7edqGJ0HOrq5B5dx
+cwuT+vwbTtA8d/Qo/cifbyuxnNtO8JthhEvemAdCYkDC4DQHDStsKFjA+Za1Sos
r/eSQexXNOQ/oMgksm72aX9rIkfurtn73AhIthKEnzrzou3pVW+H5eHR25vF58EO
qHUJO9/Z8ZkHec3HopxFtYng16i26VlW2pDehdkWGVoZSXomaOyH7x7XQXZoE7B+
4e4vDOMbeIvxyA/j1+H35WBZCu6f9FstOrEptD5FIE6/QM4oAW+CBllUQf5iQVEB
4Rpodu2AvKWgqTTOMLcn9+HK8JgnbMlm6cYLT+YXP7j6OnJFB6yq5/L3dfS5rrEX
sxwrvVTTx2dCbX/RImQoMpEIQFaTfimZgKQDw3rmtv+JfP3OnpdOrN+QJJfHbCgb
4c9suzs/UTBLbtiFQhdO
=XsDv
-----END PGP SIGNATURE-----
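
For illustration only, here is a minimal sketch of the percentage-based option discussed in the review; it is not the code from the patch. It assumes similarity is measured with difflib.SequenceMatcher from the Python standard library, and the names parse_similarity and fuzzy_filter are hypothetical. Non-numeric and out-of-range input are treated the same way (both raise an error), and the filter is phrased in terms of which results to keep.

```python
import difflib


def parse_similarity(raw, default=80.0):
    """Return a similarity ratio in (0, 1] from a percentage string.

    Hypothetical helper: None selects the default percentage. Anything
    that is not a number in the range (0, 100] raises ValueError, so
    non-numeric and out-of-range input are handled identically.
    """
    if raw is None:
        return default / 100.0
    try:
        percent = float(raw)
    except ValueError:
        raise ValueError("invalid similarity percentage: %r" % (raw,))
    if not 0.0 < percent <= 100.0:
        raise ValueError("invalid similarity percentage: %r" % (raw,))
    return percent / 100.0


def fuzzy_filter(term, names, similarity):
    """Keep the names that are at least `similarity` similar to `term`."""
    return [
        name for name in names
        if difflib.SequenceMatcher(None, term, name).ratio() >= similarity
    ]
```

A caller could turn the ValueError into a parser error, which would give one documented behaviour for every kind of bad input.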