
electroglyph · 2026-04-28T05:31:48

i'm doing inference on a free mi300x instance from AMD right now. not sure if the software stack is just old or what, but here's what i've observed: it's stuck on an old version of vllm from before Transformers 5 support. it lacks MoE support for qwen3 models. oss-120b is faaaar slower than it should be.

int8 quantization seems almost supported, but not quite: speeds drop to a fraction of full precision speed and the server seems to hang intermittently. int4 quantization not supported. fp8 quantization not supported.

again, maybe AMD is just being lazy with what they've provided, but it's not a great look.

right now the fastest smart model i can run is full precision qwen3-32b. with 120 parallel requests (short context) i'm getting prompt processing (PP) @ 4500 tokens/sec and token generation (TG) @ 1300 tokens/sec
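for a rough per-user picture, here's the back-of-envelope split. this is a sketch that assumes the aggregate PP/TG numbers divide evenly across all 120 concurrent requests; real batching won't be perfectly uniform, so treat it as an approximation:

```python
def per_request_rates(pp_total: float, tg_total: float, concurrency: int) -> tuple[float, float]:
    """Split aggregate throughput evenly across concurrent requests (assumes uniform load)."""
    return pp_total / concurrency, tg_total / concurrency


# numbers from the comment above: 120 parallel requests, PP 4500 tok/s, TG 1300 tok/s
pp, tg = per_request_rates(4500, 1300, 120)
print(f"PP per request: {pp:.1f} tok/s, TG per request: {tg:.1f} tok/s")
```

so each individual stream is decoding at roughly 11 tokens/sec, even though the box as a whole is pushing 1300.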

electroglyph · 2026-04-26T02:31:32

but should you drive or walk to the car wash?

electroglyph · 2026-04-25T22:57:57

i dunno, Opus is losing its edge imo. i regularly use a mix of models, including Opus, glm 5.1, kimi 2.6, etc. and i find that all of them are pretty much equally good at "average" coding, but on difficult stuff they're nearly equally bad. i can't deny that Opus has an edge, but it's not a huge one.

electroglyph · 2026-04-25T22:27:38

> they also don't know what they don't know

they sort of do tho:

https://transformer-circuits.pub/2025/introspection/index.ht...

2ndorderthought · 2026-04-25T23:08:16

I won't quibble even though I likely should. Have to remember this is HN and companies need to shill their work otherwise ... Yes.

I will play along and assume this is sound. 10-40% +/- 10% is along the lines of "sort of" in a completely unreliable, unguaranteed and unproven way, sure.

electroglyph · 2026-04-25T10:15:42

how about a unicode art tool?

https://electroglyph.github.io/atheriz_draw/

electroglyph · 2026-04-20T09:00:25

https://sleepingrobots.com/dreams/stop-using-ollama/

electroglyph · 2026-04-13T10:10:25

flow matching is making some strides right now, too

electroglyph · 2026-04-11T05:20:22

i don't buy this. distilled how? you don't get access to logprobs, and the thinking traces are fake and compressed. it's an expensive way to get potentially substandard training data.
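to spell out why missing logprobs matters: classic logit distillation trains the student to match the teacher's whole next-token distribution (a KL divergence term). if all you have is sampled text, you only see one token per position, so the target collapses to one-hot and the KL term reduces to plain cross-entropy on that token. a toy sketch with made-up distributions over a hypothetical 3-token vocab (not any real model's numbers):

```python
import math


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) over a shared vocab; p is the teacher target, q the student prediction."""
    return sum(pi * math.log(pi / q[tok]) for tok, pi in p.items() if pi > 0)


# hypothetical student next-token distribution
student = {"a": 0.5, "b": 0.3, "c": 0.2}

# with teacher logprobs: soft targets carry the teacher's full ranking of alternatives
teacher_soft = {"a": 0.7, "b": 0.2, "c": 0.1}

# with only sampled API output: one observed token, so the target is one-hot
teacher_hard = {"a": 1.0, "b": 0.0, "c": 0.0}

soft_loss = kl_divergence(teacher_soft, student)
hard_loss = kl_divergence(teacher_hard, student)  # equals -log q("a"), i.e. ordinary cross-entropy
print(soft_loss, hard_loss)
```

the one-hot case throws away everything the teacher "thought" about "b" and "c", which is a big part of why distilling from text-only API output is a weaker signal.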

electroglyph · 2026-04-09T10:31:17

nah, a crypto grifter released one with cooked benchmarks

electroglyph · 2026-04-08T08:41:02

better than Opus? not even close. after struggling thru server overload for the past couple hours i finally put 5.1 thru its paces and it's....okay. it failed some simple stuff that Sonnet/Opus/Gemini didn't. failed it badly and repeatedly, actually. this was in typescript, btw. not sure if i'll keep the subscription or not