public inbox for gentoo-portage-dev@lists.gentoo.org
From: Zac Medico <zmedico@gentoo.org>
To: gentoo-portage-dev@lists.gentoo.org
Subject: Re: [gentoo-portage-dev] [PATCH] emerge: add --search-fuzzy and --search-fuzzy-cutoff options (bug 65566)
Date: Thu, 7 Apr 2016 23:21:57 -0700
Message-ID: <57074E05.4030202@gentoo.org>
In-Reply-To: <57022835.2060304@gentoo.org>

On 04/04/2016 01:39 AM, Alexander Berntsen wrote:
> This is a great idea!

Yeah, we should have done this sooner. The search index makes our search
function so much nicer, so that gave me some incentive to continue
improving it.

> 
> 
> On 04/04/16 07:03, Zac Medico wrote:
>> +.BR "\-\-search\-fuzzy [ y | n ]"
>> +Enable or disable fuzzy search for search actions.
> This is likely a good place to briefly explain what a "fuzzy search"
> is.

Okay, will do.

> Also, I'm not sold on "search-fuzzy" as opposed to "fuzzy-search". Is
> there a particular reasoning for it? Since we don't seem to have a
> standardised "verbs mean this, nouns mean this" anyway, I would use
> the latter phrase.

Okay, that will work for me.

> You also need to document your note on regexes.

Will do.

> Lastly, you also need to document that a fuzzy search is slower than a
> regular search.

Will do.

>> +.TP
>> +.BR "\-\-search\-fuzzy\-cutoff CUTOFF"
>> +Set similarity ratio cutoff (a floating-point number between 0 and 1).
>> +Results with similarity ratios lower than the cutoff are discarded.
>> +This option has no effect unless the \fB\-\-search\-fuzzy\fR option
>> +is enabled.
> This explanation is a bit heavy to read. And I think that using 0 to 1
> isn't very nice. And calling the number "floating point" instead of
> decimal isn't very useful nor nice. How about making it a percentage,
> and describing it simply as a similarity percentage -- "package names
> must be at least N% similar to the search term to appear in search
> results". The option could then be called --search-fuzzy-similarity,
> or (in keeping with the previous suggestion)
> --fuzzy-search-similarity, or -- wait for it -- something similar. ;)

Okay, that will work for me.
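
For illustration (not part of the patch): a minimal sketch of what an
"at least N% similar" filter could look like. It assumes the similarity
ratio comes from Python's difflib.SequenceMatcher; the function name
fuzzy_filter and the similarity_percent parameter are hypothetical.

```python
import difflib


def fuzzy_filter(search_term, package_names, similarity_percent=80):
    """Return names at least similarity_percent similar to search_term.

    Hypothetical helper: the percentage is converted to the 0..1 ratio
    that difflib.SequenceMatcher reports.
    """
    cutoff = similarity_percent / 100.0
    matcher = difflib.SequenceMatcher()
    # SequenceMatcher caches data about seq2, so the fixed search term
    # goes there and seq1 varies per candidate.
    matcher.set_seq2(search_term.lower())
    matches = []
    for name in package_names:
        matcher.set_seq1(name.lower())
        ratio = matcher.ratio()
        if ratio >= cutoff:
            matches.append((ratio, name))
    # Most similar first.
    matches.sort(key=lambda item: item[0], reverse=True)
    return [name for ratio, name in matches]
```

Expressed this way, the option describes which results are kept, and
the internal comparison against a 0..1 ratio stays the same.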

> Of course if you agree with this, you'll have to reverse the code to
> represent which results to show, rather than which ones to not show.

Reverse? You want it to measure dissimilarity? Not sure what you mean.

> You should also document here what happens if there's a mistake in the
> input.
> 
>> +		"--search-fuzzy-cutoff": {
>> +			"help": "Set similarity ratio cutoff (a floating-point number between 0 and 1)",
>> +			"action": "store"
>> +		},
> See comments above regarding how to explain what this actually does.

Yeah, the N% similar thing.

>> +	if myoptions.search_fuzzy_cutoff:
>> +		try:
>> +			fuzzy_cutoff = float(myoptions.search_fuzzy_cutoff)
>> +		except ValueError:
>> +			fuzzy_cutoff = 0.0
> Is this a reasonable fallback? I guess so... but you need to mention
> it in the manpage, as mentioned.

It's not supposed to be a fallback, but rather a failure path. It
triggers an error message and unsuccessful exit.

>> +
>> +		if fuzzy_cutoff <= 0.0:
>> +			fuzzy_cutoff = None
>> +			if not silent:
>> +				parser.error("Invalid --search-fuzzy-cutoff parameter: '%s'\n" % \
>> +					(myoptions.search_fuzzy_cutoff,))
>> +
>> +		myoptions.search_fuzzy_cutoff = fuzzy_cutoff
>> +
> I also don't understand why the first one is just 0.0, but this one
> is an error. Why aren't both either errors and revert to 0.8 cut-off
> (or 80% similarity) or 0.0/100?

I just want it to fail if the input is invalid.
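
For illustration (not part of the patch): a minimal standalone sketch
of that failure path, assuming a plain argparse parser rather than
emerge's real option handling. The helper name parse_cutoff and the
"emerge-sketch" program name are hypothetical, and the upper-bound
check is an added assumption that the quoted code does not have.

```python
import argparse


def parse_cutoff(parser, raw_value):
    """Return raw_value as a float in (0, 1], or exit via parser.error()."""
    try:
        cutoff = float(raw_value)
    except ValueError:
        # Not a fallback: 0.0 is a sentinel that fails the check below.
        cutoff = 0.0

    # Assumption: values above 1.0 are rejected too (the quoted patch
    # only checks the lower bound). This form also rejects NaN.
    if not (0.0 < cutoff <= 1.0):
        # argparse prints usage plus this message and exits with status 2.
        parser.error(
            "Invalid --search-fuzzy-cutoff parameter: '%s'" % (raw_value,))
    return cutoff


parser = argparse.ArgumentParser(prog="emerge-sketch")
parser.add_argument("--search-fuzzy-cutoff", action="store")
```

With this shape, both unparsable and out-of-range input end in the
same error message and unsuccessful exit; no default is silently
substituted.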

> And this needs to go in the manpage too.
> 
>> +		self.fuzzy_cutoff = 0.8 if fuzzy_cutoff is None else fuzzy_cutoff
> See above.
> 
>> +		fuzzy = False
> Here's an interesting discussion: maybe this should be True? After
> all, it's True in any modern search engine. What do you think?

Yeah, I agree.

>> +			# Fuzzy search does not support regular expressions, therefore
>> +			# it is disabled for regular expression searches.
> Manpage.

Will do.
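
For illustration (not part of the patch): a small sketch of the rule in
that comment. It assumes the emerge convention that a search string
prefixed with "%" requests a regular expression search; the function
name should_use_fuzzy is hypothetical.

```python
def should_use_fuzzy(search_key, fuzzy_enabled=True):
    """Decide whether fuzzy matching applies to this search.

    Assumption: a leading "%" marks a regular expression search, and
    fuzzy matching is skipped for those regardless of the option.
    """
    is_regex = search_key.startswith("%")
    return fuzzy_enabled and not is_regex
```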
-- 
Thanks,
Zac



Thread overview: 5+ messages
2016-04-04  5:03 [gentoo-portage-dev] [PATCH] emerge: add --search-fuzzy and --search-fuzzy-cutoff options (bug 65566) Zac Medico
2016-04-04  8:39 ` Alexander Berntsen
2016-04-08  6:21   ` Zac Medico [this message]
2016-04-08 11:33     ` Alexander Berntsen
2016-07-25  2:58       ` Zac Medico
