Subject: Re: [gentoo-portage-dev] [PATCH] emerge: add --search-fuzzy and --search-fuzzy-cutoff options (bug 65566)
To: gentoo-portage-dev@lists.gentoo.org
References: <1459746182-13420-1-git-send-email-zmedico@gentoo.org> <57022835.2060304@gentoo.org>
From: Zac Medico
Message-ID: <57074E05.4030202@gentoo.org>
Date: Thu, 7 Apr 2016 23:21:57 -0700
In-Reply-To: <57022835.2060304@gentoo.org>

On 04/04/2016 01:39 AM, Alexander Berntsen wrote:
> This is a great idea!

Yeah, we should have done this sooner.
The search index makes our search function so much nicer, so that gave
me some incentive to continue improving it.

> On 04/04/16 07:03, Zac Medico wrote:
>> +.BR "\-\-search\-fuzzy [ y | n ]"
>> +Enable or disable fuzzy search for search actions.
> This is likely a good place to briefly explain what a "fuzzy search"
> is.

Okay, will do.

> Also, I'm not sold on "search-fuzzy" as opposed to "fuzzy-search". Is
> there a particular reasoning for it? Since we don't seem to have a
> standardised "verbs mean this, nouns mean this" anyway, I would use
> the latter phrase.

Okay, that will work for me.

> You also need to document your note on regexes.

Will do.

> Lastly, you also need to document that a fuzzy search is slower than a
> regular search.

Will do.

>> +.TP
>> +.BR "\-\-search\-fuzzy\-cutoff CUTOFF"
>> +Set similarity ratio cutoff (a floating-point number between 0 and 1).
>> +Results with similarity ratios lower than the cutoff are discarded.
>> +This option has no effect unless the \fB\-\-search\-fuzzy\fR option
>> +is enabled.
> This explanation is a bit heavy to read. And I think that using 0 to 1
> isn't very nice. And calling the number "floating point" instead of
> decimal isn't very useful nor nice. How about making it a percentage,
> and describing it simply as a similarity percentage -- "package names
> must be at least N% similar to the search term to appear in search
> results". The option could then be called --search-fuzzy-similarity,
> or (in keeping with the previous suggestion)
> --fuzzy-search-similarity, or -- wait for it -- something similar. ;)

Okay, that will work for me.

> Of course if you agree with this, you'll have to reverse the code to
> represent which results to show, rather than which ones to not show.

Reverse? You want it to measure dissimilarity? Not sure what you mean.

> You should also document here what happens if there's a mistake in the
> input.
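
To make the "at least N% similar" wording concrete, here is a rough sketch of percentage-based filtering. It assumes Python's difflib.SequenceMatcher as the similarity measure; the patch excerpts quoted in this message don't show which measure is actually used, and the function names and package names below are made up for illustration.

```python
import difflib


def similarity_percent(term, name):
    """Return how similar two strings are, as a percentage from 0 to 100."""
    # SequenceMatcher.ratio() returns a float in [0, 1]; scale it to a percentage.
    matcher = difflib.SequenceMatcher(None, term.lower(), name.lower())
    return 100.0 * matcher.ratio()


def fuzzy_filter(term, names, min_similarity=80.0):
    """Keep names at least min_similarity percent similar to term, best first."""
    scored = [(similarity_percent(term, name), name) for name in names]
    kept = [(score, name) for score, name in scored if score >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _score, name in kept]


if __name__ == "__main__":
    # Hypothetical package names, for illustration only.
    names = ["firefox", "firefox-bin", "thunderbird", "filezilla"]
    print(fuzzy_filter("firefx", names, min_similarity=80.0))
```

Expressed this way the filter states which results are shown (those at or above N%) rather than which are discarded, even though the comparison underneath is the same as a 0 to 1 cutoff.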
>
>> +    "--search-fuzzy-cutoff": {
>> +        "help": "Set similarity ratio cutoff (a floating-point number between 0 and 1)",
>> +        "action": "store"
>> +    },
> See comments above regarding how to explain what this actually does.

Yeah, the N% similar thing.

>> +    if myoptions.search_fuzzy_cutoff:
>> +        try:
>> +            fuzzy_cutoff = float(myoptions.search_fuzzy_cutoff)
>> +        except ValueError:
>> +            fuzzy_cutoff = 0.0
> Is this a reasonable fallback? I guess so... but you need to mention
> it in the manpage, as mentioned.

It's not supposed to be a fallback, but rather a failure path. It
triggers an error message and unsuccessful exit.

>> +
>> +        if fuzzy_cutoff <= 0.0:
>> +            fuzzy_cutoff = None
>> +            if not silent:
>> +                parser.error("Invalid --search-fuzzy-cutoff parameter: '%s'\n" % \
>> +                    (myoptions.search_fuzzy_cutoff,))
>> +
>> +        myoptions.search_fuzzy_cutoff = fuzzy_cutoff
>> +
> I also don't understand why the first one is just 0.0, but this one
> is an error. Why aren't both either errors and revert to 0.8 cut-off
> (or 80% similarity) or 0.0/100?

I just want it to fail if the input is invalid.

> And this needs to go in the manpage too.
>
>> +        self.fuzzy_cutoff = 0.8 if fuzzy_cutoff is None else fuzzy_cutoff
> See above.
>
>> +        fuzzy = False
> Here's an interesting discussion: maybe this should be True? After
> all, it's True in any modern search engine. What do you think?

Yeah, I agree.

>> +        # Fuzzy search does not support regular expressions, therefore
>> +        # it is disabled for regular expression searches.
> Manpage.

Will do.
-- 
Thanks,
Zac
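
For reference, the "fail if the input is invalid" behaviour could be sketched without the intermediate 0.0 value, which is what made the quoted hunk read like a fallback. This is only an illustration using the standard argparse module: the option name follows the --fuzzy-search-similarity suggestion from this thread, and similarity_percentage, build_parser, and the "search-demo" program name are hypothetical, not part of the patch.

```python
import argparse


def similarity_percentage(value):
    """argparse type callback: accept a number in (0, 100], reject anything else."""
    try:
        percent = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError("not a number: %r" % (value,))
    # The chained comparison is also False for NaN, so NaN is rejected here too.
    if not 0.0 < percent <= 100.0:
        raise argparse.ArgumentTypeError(
            "must be greater than 0 and at most 100: %r" % (value,))
    return percent


def build_parser():
    """Build a small demo parser (not emerge's real option handling)."""
    parser = argparse.ArgumentParser(prog="search-demo")
    parser.add_argument(
        "--fuzzy-search-similarity",
        type=similarity_percentage,
        default=80.0,  # matches the 0.8 default cutoff in the quoted patch
        help="package names must be at least N%% similar to the search term",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.fuzzy_search_similarity)
```

When the type callback raises ArgumentTypeError, argparse reports the message through parser.error() and exits with a non-zero status, so non-numeric and out-of-range input take the same failure path.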