
I wrote it up. The full system reference is here: https://blakecrosley.com/guides/obsidian — vault architecture, hybrid retrieval (Model2Vec + FTS5 + RRF), MCP integration, incremental indexing, operational patterns. Covers everything from a 200-file vault to the 16,000-file setup I run.

The hybrid retriever piece has its own deep dive with the RRF math and an interactive fusion calculator: https://blakecrosley.com/blog/hybrid-retriever-obsidian
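For readers who want the gist without the calculator: RRF fuses ranked lists by scoring each document as the sum of 1/(k + rank) over every list it appears in. A minimal sketch (the doc ids and k=60 default are illustrative, not from the guide):

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch: fuse two ranked lists,
# e.g. one from vector search and one from FTS5/BM25.
def rrf_fuse(ranked_lists, k=60):
    """score(d) = sum over lists of 1 / (k + rank of d in that list)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["a", "b", "c"]
fts_hits = ["b", "d", "a"]
print(rrf_fuse([vector_hits, fts_hits]))  # -> ['b', 'a', 'd', 'c']
```

Because RRF only looks at ranks, not raw scores, it sidesteps the problem of calibrating cosine similarities against BM25 scores.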

See what your coding agent thinks of it and let me know if you have ways to improve it.



I implemented this as well, successfully. Re structured data: I transformed it from JSON into more "natural language". I also ended up using MiniLM-L6-v2. Will post a GitHub link once I've packaged it independently (it currently lives in the main app code; I want to extract it into a standalone micro-service).

You wrote:

>A search for “review configuration” matches every JSON file with a review key.

It's a good point; I'm not sure how to de-rank the keys or encode the "commonness" of those words.


IDF handles most of it. In BM25, inverse document frequency naturally down-weights terms that appear in every document, so JSON keys like "id", "status", "type" that show up in every chunk get low IDF scores automatically. The rare, meaningful keys still rank.
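A toy illustration of that effect, using the standard BM25 IDF formula with +0.5 smoothing (the documents here are made up; only the formula is standard):

```python
import math

def idf(term, docs):
    """BM25-style IDF with +0.5 smoothing, floored at 0 so ubiquitous
    terms contribute nothing rather than a negative score."""
    n = sum(1 for d in docs if term in d)
    return max(0.0, math.log((len(docs) - n + 0.5) / (n + 0.5)))

# Each "document" is the set of terms extracted from one JSON chunk.
docs = [
    {"id", "status", "review", "configuration"},
    {"id", "status", "type"},
    {"id", "status", "payment"},
]
print(idf("id", docs))      # in every doc -> 0.0 after the floor
print(idf("review", docs))  # rare -> positive weight, still ranks
```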

For the remaining noise, I chunk the flattened key-paths separately from the values. The key-path goes into a metadata field that BM25 indexes but with lower weight. The value goes into the main content field. So a search for "review configuration" matches on the value side, not because "configuration" appeared as a JSON key in 500 files.
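A hypothetical sketch of that split (the function name and sample document are mine, not from the guide): flatten nested JSON into dotted key-paths for the low-weight metadata field, and collect leaf values for the main content field.

```python
def split_fields(obj, prefix=""):
    """Flatten a JSON object into (key_paths, values) so the two can be
    indexed as separate BM25 fields with different weights."""
    keys, values = [], []
    for k, v in obj.items():
        path = f"{prefix}.{k}" if prefix else k
        if isinstance(v, dict):
            sub_keys, sub_values = split_fields(v, path)
            keys.extend(sub_keys)
            values.extend(sub_values)
        else:
            keys.append(path)
            values.append(str(v))
    return keys, values

doc = {"review": {"status": "approved", "notes": "configuration looks good"}}
key_field, content_field = split_fields(doc)
print(key_field)      # ['review.status', 'review.notes']
print(content_field)  # ['approved', 'configuration looks good']
```

With FTS5 specifically, the per-column weighting can then be done at query time via the bm25() auxiliary function, which accepts one weight argument per column.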

MiniLM-L6-v2 is solid. I went with Model2Vec (potion-base-8M) for the speed tradeoff: roughly 50-500x faster on CPU at about 89% of MiniLM's MTEB quality. For a microservice where you're embedding on every request, the latency difference matters more than the quality gap.
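The reason static models are that much faster: encoding is just a table lookup plus a mean over precomputed token vectors, with no transformer forward pass. A toy sketch of the idea (the two-dimensional token vectors here are invented for illustration; real Model2Vec vectors are distilled from a transformer):

```python
import numpy as np

# Made-up static token table; a real model would have ~30k entries of
# a few hundred dimensions each.
token_vectors = {
    "review": np.array([0.9, 0.1]),
    "configuration": np.array([0.2, 0.8]),
}

def embed(text):
    """Static embedding: look up each token, average the vectors."""
    vecs = [token_vectors[t] for t in text.lower().split() if t in token_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

print(embed("review configuration"))  # -> [0.55 0.45]
```

That constant-time-per-token shape is why the CPU latency gap versus running a 6-layer transformer is so large.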


Thank you!


