Building a set of experiments that explores LLMs' visual understanding of your photos to learn about you, especially given the recent learnings from DeepSeek-OCR. Part of the experiments delves into storing the memories with GraphRAG so they can be retrieved effectively without losing too much information.
After reading this post and the readme, I'm not convinced that this is solving a real, observed problem. You outline a long-term coaching/mentorship example, but why or how is your solution preferable to telling Claude to maintain a set of notes and observations about you, similar to https://github.com/heyitsnoah/claudesidian?
The jazz metaphors don't add useful context.
Fair feedback. Claudesidian is a productivity system where you organize knowledge and Claude assists. StoryKeeper is relational infrastructure that maintains emotional continuity across AI sessions and agent handoffs. Different layers of the stack, both valuable. I'll update the docs to make this clearer — appreciate the push for concreteness.
In case you need conversational data for the experiment you want to try, I developed an open-source CLI tool [1] that creates transcripts from voice chats on Discord. Feel free to try it out!
Agree with the first half of the article, but every example the author points out predates AI. What are examples of companies founded in the past three years that prove the author's point that the data model is the definitive edge?
Just had a chat with AI to see how we could address the issues mentioned in the article. You can create models that cater to multiple use cases by splitting the domain model into facts (tables) and perspectives (views). This gives you a lot of flexibility in addressing the different perspectives presented in the article from a shared domain model.
Yes, but by a negligible margin. My program is designed for multi-track audio, which means I run this in parallel on multiple 3-hour recordings and get results in 12 minutes.
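Roughly, the per-track parallelism looks like this (a minimal sketch; the function and file names are illustrative, not my actual code):

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def transcribe_track(path: Path) -> str:
    # Placeholder for the real per-track pipeline (VAD + Whisper + SRT output).
    return f"transcript for {path.name}"

def transcribe_session(track_dir: Path) -> list[str]:
    # One Discord speaker track per file; each track gets its own process,
    # so several 3-hour files finish in roughly the time of one.
    tracks = sorted(track_dir.glob("*.wav"))
    with ProcessPoolExecutor(max_workers=len(tracks) or 1) as pool:
        return list(pool.map(transcribe_track, tracks))
```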
You haven’t shared any architectural details. What model? What size? How can anyone be sure that what you’re building is truly offline?
What does that even mean? Why would OSS make it slower? Why would it be overkill?
This is not Product Hunt; you have to give at least some kind of explanation for your claims.
I wanted to build my own speech-to-text transcription program [1] for Discord, similar to how Zoom or Google Hangouts works. I built it so that I can record my group's D&D sessions and build applications and tools for virtual tabletops (VTTs).
It can process a set of 3-hour audio files in ~20 mins.
I personally love Senko since it runs in seconds, whereas pyannote took hours, but there is a ~10% WER (word error rate) that is tough to get around.
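For reference, the pyannote step I'm comparing against looks roughly like this (the pipeline name and token handling are assumptions, not exact code from my setup):

```python
from pyannote.audio import Pipeline

# Gated model: needs a Hugging Face token for a repo whose terms you've accepted.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",
)

diarization = pipeline("session.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```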
I'm working on the same project myself and was planning to write a blog post similar to the author's. However, I'll share some additional tips and tricks that really made a difference for me.
For preprocessing, I found it best to convert files to 16 kHz WAV. I also add low-pass and high-pass filters to remove non-speech sounds. To avoid hallucinations, I run Silero VAD on the entire audio file to find timestamps where there's a speaker. A side note on this: Silero requires careful tuning to prevent audio segments from being chopped up and clipped. I also use a post-processing step to merge adjacent VAD chunks, which gives Whisper cohesive segments to work with.
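A rough sketch of that VAD step, assuming the `silero-vad` pip package; the gap and threshold values are things you'd tune for your own recordings:

```python
from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

SAMPLE_RATE = 16_000  # resample everything to 16 kHz mono first

model = load_silero_vad()
wav = read_audio("session.wav", sampling_rate=SAMPLE_RATE)

# Timestamps (in seconds) where Silero thinks someone is speaking.
speech = get_speech_timestamps(
    wav,
    model,
    sampling_rate=SAMPLE_RATE,
    return_seconds=True,
)

def merge_chunks(chunks, max_gap=0.6):
    # Merge adjacent VAD chunks separated by short gaps so Whisper gets
    # cohesive segments instead of clipped fragments.
    merged = []
    for c in chunks:
        if merged and c["start"] - merged[-1]["end"] <= max_gap:
            merged[-1]["end"] = c["end"]
        else:
            merged.append(dict(c))
    return merged

segments = merge_chunks(speech)
```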
For the Whisper step, I run Whisper on small audio chunks that correspond to the VAD timestamps; otherwise it hallucinates during silences and regurgitates the passed-in prompt. If you're on a Mac, use the whisper-mlx models from Hugging Face to speed up transcription. In my benchmark, using a model built for the Apple Neural Engine made a 22x difference.
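The chunked transcription looks roughly like this (a sketch, not my exact code; the mlx-community repo name is just one of the converted models, swap in whichever size you use):

```python
import mlx_whisper
import soundfile as sf

SAMPLE_RATE = 16_000

def transcribe_segments(wav, segments, repo="mlx-community/whisper-small-mlx"):
    # wav: 1-D float audio at 16 kHz (call .numpy() first if it's a torch tensor);
    # segments: the merged VAD chunks, with "start"/"end" in seconds.
    results = []
    for i, seg in enumerate(segments):
        start = int(seg["start"] * SAMPLE_RATE)
        end = int(seg["end"] * SAMPLE_RATE)
        chunk_path = f"chunk_{i:04d}.wav"
        sf.write(chunk_path, wav[start:end], SAMPLE_RATE)
        out = mlx_whisper.transcribe(chunk_path, path_or_hf_repo=repo)
        results.append((seg["start"], out["text"].strip()))
    return results
```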
For post-processing, I've found that running the generated SRT files through ChatGPT to identify and remove hallucinated chunks yields better results.
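A minimal sketch of that cleanup pass with the OpenAI Python client; the model name and prompt wording here are placeholders, not my actual prompt:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def clean_srt(srt_text: str) -> str:
    system = (
        "You are cleaning a Whisper-generated SRT transcript. Remove subtitle "
        "blocks that look like hallucinations (repeated phrases, prompt echoes, "
        "text during obvious silence) and return valid SRT only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you prefer
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": srt_text},
        ],
    )
    return resp.choices[0].message.content
```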
If I understood correctly, VAD gives better results than ffmpeg's silencedetect + silenceremove, right?
I think the latest version of ffmpeg can run Whisper with VAD [1], but I still need to explore how with a simple PoC script.
I'd love to know more about the post-processing prompt; my guess is that it looks like an improved version of the `semantic correction` prompt [2], but I may be wrong ¯\_(ツ)_/¯.