Cool, well let me know when Opus 4.5 level performance is available locally, at ...

am17an · 2026-05-11T07:50:46 1778485846

Local models embody the hacker spirit; constant Claude glazing is spiritually incompatible with tinkering. Don't upload your spirit to the cloud.

stingraycharles · 2026-05-11T11:43:51 1778499831

That’s like saying cloud computing is spiritually incompatible with tinkering.

xyzal · 2026-05-11T12:36:25 1778502985

It is. You can't tinker with someone else's machine.

seizethecheese · 2026-05-11T15:27:56 1778513276

You can’t tinker with someone else’s machine but you can tinker through it.

captainbland · 2026-05-11T11:37:25 1778499445

They may well do, but in practice, if you want to embody the hacker spirit, the best thing is to hack rather than trying to get some clearly inadequate local LLM to do it.

thot_experiment · 2026-05-11T18:09:54 1778522994

Yo, MTP for Qwen is sick, thank you! Your work is invaluable.

AlienRobot · 2026-05-11T11:20:43 1778498443

But if I run the model locally I have to pay for it, whereas with Claude I can, oh wait I just hit my 5 hour free limit with 2 messages.

Aurornis · 2026-05-11T02:35:33 1778466933

I experiment a lot with local models, and I agree.

I have a lot of fun with the local models and seeing what they can do.

I appreciate the SOTA models even more after my local experiments. The local models are really impressive these days, but the gap to SOTA is huge for complex tasks.

ThunderSizzle · 2026-05-16T00:27:01 1778891221

What if you split it into less complex tasks? E.g. use the model to help decompose the task into parts, then help it iterate through them.

Gives you more control over the outcome and more steering anyway.
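A minimal sketch of that decompose-then-iterate loop, for illustration only. `ask_model` is a hypothetical callable standing in for whatever local model interface you use, and `fake_model` is a stub so the example runs without a model; neither comes from the thread.

```python
from typing import Callable, List


def solve_in_steps(task: str, ask_model: Callable[[str], str]) -> List[str]:
    """Ask the model for a plan, then feed it one step at a time."""
    plan = ask_model(f"PLAN: split this task into short steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results: List[str] = []
    for step in steps:
        # Each call sees only the overall task, prior results, and one step,
        # which keeps the prompt small and lets you inspect every result.
        done_so_far = "\n".join(results)
        prompt = f"STEP: {step}\nTask: {task}\nDone so far:\n{done_so_far}"
        results.append(ask_model(prompt))
    return results


def fake_model(prompt: str) -> str:
    """Stub model (assumption): returns a fixed plan, then echoes each step."""
    if prompt.startswith("PLAN:"):
        return "read the file\nsummarize it\n"
    first_line = prompt.splitlines()[0]
    return "finished " + first_line[len("STEP: "):]


if __name__ == "__main__":
    for result in solve_in_steps("summarize notes.txt", fake_model):
        print(result)
```

Because each step is a separate call, you can review or correct an intermediate result before the next step runs, which is where the extra steering comes from.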

janalsncm · 2026-05-11T05:27:59 1778477279

Reasoning over a large codebase is only one use case for large models. For the use cases in the article (summarizing, classifying, basic text rewrites) most phones can handle them just fine.

binyu · 2026-05-11T02:18:35 1778465915

DeepSeek V4, with its 1-million-token context window, is pretty powerful, although still not there. There's hope that Opus 4.5 level performance locally is not that far away.

Aurornis · 2026-05-11T02:36:53 1778467013

Running DeepSeek V4 without extreme quantization locally requires a lot of hardware.

The IQ2 quants that fit into 128GB machines are very degraded.

binyu · 2026-05-11T02:40:45 1778467245

That is true, it is a 1.6T-parameter model, so it requires a great deal of memory. I also heard there's a 2-bit quantization that works well on Apple Metal.
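For a rough sense of scale, here is a back-of-the-envelope sketch of weight memory at different precisions. It takes the 1.6T parameter count quoted above at face value and counts weights only; KV cache, activations, and the per-block overhead of real quantization formats are ignored, so actual requirements would be higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 1.6e12  # parameter count as quoted in the thread (not verified)

if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        gb = weight_memory_gb(N_PARAMS, bits)
        print(f"{bits:>2} bits/weight: about {gb:,.0f} GB")
```

The helper is just parameters times bits divided by eight, so you can plug in any other model size or bit width to compare.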

tuananh · 2026-05-11T02:34:53 1778466893

From what I read, DeepSeek V4 is very close to Opus 4.6 performance.

DeathArrow · 2026-05-11T05:40:28 1778478028

The full model is, not the quantized versions.

tuananh · 2026-05-11T06:30:37 1778481037

Yeah, that goes without saying. How can an open-weight, quantized version beat SOTA? :)

array_key_first · 2026-05-11T16:38:19 1778517499

Well it depends on the task. For agentic coding, more is more, but for tasks that normal consumers use them for there really is a ceiling. OCR, text to speech, that type of thing doesn't really improve when going to a SOTA model, so you'd just be wasting your money. I think local LLMs have more value than software engineers give them credit for.

tuananh · 2026-05-12T03:03:04 1778554984

Totally agree with that. A local LLM doesn't need to match SOTA performance in order to be useful.

agnishom · 2026-05-11T06:15:07 1778480107

The article is not about those use cases. There are plenty of use cases for which local models are already pretty good.

moffkalast · 2026-05-11T10:58:00 1778497080

It should be relatively quick: 1-2 years for local models to catch up to today's SOTA.

Of course then you'll be asking "uhh lemme know when Opus 6.8 level performance is available locally". People are never happy.

Gemma 4 and Qwen 3.6 are legit beast models that would steamroll every API offering from 2 years ago.

storus · 2026-05-11T02:26:05 1778466365

Depending on the task, there are already models matching Opus 4.5, just not in everything. But you can always swap in a local model for a particular task.

stingraycharles · 2026-05-11T11:43:14 1778499794

Opus is probably somewhere in the 5T-parameter range and needs terabytes of GPU memory.

The economics of running SOTA locally just do not make sense, because you’re not using it 24/7 at 80%+ utilization while the cloud-based providers can.
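The utilization argument can be made concrete with a small amortization sketch. Every number here is a made-up placeholder (hardware price, lifetime, utilization levels), and power, cooling, and maintenance are left out; only the shape of the calculation matters.

```python
def cost_per_busy_hour(hardware_cost: float, lifetime_years: float,
                       utilization: float) -> float:
    """Hardware cost spread over the hours the machine is actually working."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_hours = lifetime_years * 365 * 24
    return hardware_cost / (total_hours * utilization)


if __name__ == "__main__":
    HARDWARE_COST = 100_000.0  # hypothetical price of a rig, in dollars
    LIFETIME_YEARS = 3.0       # hypothetical useful lifetime

    home = cost_per_busy_hour(HARDWARE_COST, LIFETIME_YEARS, 0.05)   # light personal use
    cloud = cost_per_busy_hour(HARDWARE_COST, LIFETIME_YEARS, 0.80)  # shared, busy cluster
    print(f"home:  ${home:.2f} per busy hour")
    print(f"cloud: ${cloud:.2f} per busy hour")
    print(f"ratio: {home / cloud:.0f}x")
```

For identical hardware the ratio depends only on the two utilization figures, which is the point being made: idle hours are paid for either way.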

thefounder · 2026-05-11T02:19:15 1778465955

Next year there will be Opus 4.5 level performance available in open source models, so theoretically you may be able to run it locally, but in reality it will be too expensive (e.g. maybe 2 x Mac Studio with 512GB RAM each) for “normal” users.

bugglebeetle · 2026-05-11T02:30:30 1778466630

The frontier Chinese open source models are already at this level, GLM-5.1 and Kimi K2.6 specifically.

DeathArrow · 2026-05-11T05:41:39 1778478099

But you can't run them locally at full quality. And the quantized versions you can run locally are a far cry from Opus 4.6.

bugglebeetle · 2026-05-11T05:44:48 1778478288

Anthropic serves quantized versions of their models and you can run q8 locally.

nicce · 2026-05-11T07:43:10 1778485390

I don't even use Sonnet anymore. The current version feels worse than Claude 3.5 from a couple of years ago. Have they quantized it that much? I switched to GPT 5.5; let's see how long it stays good.

greenavocado · 2026-05-11T19:58:12 1778529492

The problem with GPT is that it doesn't have the same holistic worldview of the problem space as Claude.

nicce · 2026-05-11T20:04:44 1778529884

Hmm, what does that mean, and how can you even measure it? At least for all my recent problems, GPT 5.5 has performed better.