It hallucinates a lot more than Sonnet or even MiniMax M2.5. Especially in tool ...

noelsusman · 2026-04-02T20:21:54 1775161314

My initial experiments are not encouraging. I have a basic planning prompt that includes instructions not to edit any files or implement anything. Qwen-3.6-Plus consistently ignores that and proceeds with implementation anyway. I expect that kind of behavior from small models I run locally, not from a hosted closed model claiming to compete with the frontier models.

justinclift · 2026-04-02T21:22:50 1775164970

> It hallucinates a lot more than Sonnet or even MiniMax M2.5.

Ugh, that's not good.

I evaluated Kimi K2 a while back for some text understanding -> summarisation tasks, and across the 100 tasks it hallucinated about 30% of the output. :( :( :(

dryarzeg · 2026-04-03T09:09:17 1775207357

> I evaluated Kimi K2 a while back

I guess that it was Kimi K2-Instruct, the first model (or its fine-tune) in the lineup of Kimi-K2 models. I remember trying it just out of curiosity, and... except for the almost total absence of sycophancy and "sugar syrup" in its outputs, it was not very good at the time. Right now though, if you're still interested in this model family, you could look at Kimi-K2.5, which is way better.

That said, it's still not perfect, and to be honest, looking at where things are going with LLMs right now, I prefer using my own brain (local private inference with power consumption of ~20-25W, capable of continuous learning and of performing real-world tasks) to using any "AI" model (including proprietary models such as Claude 4.6 Opus, Gemini 3.1 Pro and others).

: )