Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Fzf?


Fuzzy search is not the same. For instance, it might by default match not only “FooBar” and “foo_bar” but also e.g. “FooQux(BarQuux)”, which in a large code base might mean hundreds of false positives.


Ideally there'd be some sort of ranking or scoring that would happen to sort by. FooQux(BarQuux) would seemingly rank much lower then FooBar when searching for FooBar or "Foo Bar" but might still be useful in results if ranked and displayed lower.


Indeed, that's a good solution – and I believe e.g. fzf does some sort of ranking by default. The devil is however in the details:

One minor inconvenience is that the scoring should ideally be different per filetype. For instance, Python would count "foo-bar" as two symbols ("foo minus bar") whereas Lisp would count it was one symbol, and that should ideally result in different scores when searching for "foobar" in both. Similarly, foo(bar) should ideally have a lower different score than "foo_bar" for symbol search even though the keywords are separated by the same number of characters.

I think this can be accomodated by keeping a per-language list of symbols and associated "penalties", which can be used to calculate "how far" keywords are from each other in the search results weighted by language semantics :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: