SWE-bench Verified is nice, but we need better SWE benchmarks. Making a fair benchmark is a lot of work, and it costs a lot of money to run continuously.
Most "live" benchmarks aren't run often enough against recent models to give you a good picture of which models win.
The idea of a live benchmark is great! There are thousands of GitHub issues that are resolved with a PR every day.
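A minimal sketch of where such tasks could come from, assuming the public GitHub search API and its `linked:pr` qualifier; the cutoff date and fields below are placeholders, not an actual benchmark pipeline:

```python
# Sketch: harvest recently closed issues that were resolved via a PR,
# as candidate tasks for a "live" SWE benchmark. Assumes the public
# GitHub search API; query qualifiers and the date are illustrative.
import requests

def recent_resolved_issues(since="2024-06-01", per_page=50):
    query = f"is:issue is:closed linked:pr closed:>{since}"
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query, "per_page": per_page, "sort": "updated"},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {"repo": item["repository_url"], "title": item["title"], "url": item["html_url"]}
        for item in resp.json()["items"]
    ]

if __name__ == "__main__":
    for issue in recent_resolved_issues()[:10]:
        print(issue["repo"], "-", issue["title"])
```

Each hit would presumably still need heavy filtering (tests, reproducibility, licensing) before it could become a benchmark task.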
I'm thinking about the same things and landed on Rust. I think we're at a very critical point in software development and would love to chat with you and share/learn ideas. Please let me know if you're interested.
I am planning to add similar concepts to Yek. Either tree-sitter or ast-grep. Your work here and Aider's work would be my guiding prior art. Thank you for sharing!
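Not Yek's actual design; just a rough sketch of what the tree-sitter side could look like, assuming recent py-tree-sitter bindings plus the tree_sitter_python grammar package (function/class outlining in the spirit of Aider's repo map):

```python
# Rough sketch: parse a Python file with tree-sitter and pull out
# top-level function/class names, the kind of skeleton a repo-map
# style tool might feed to a model. Assumes the py-tree-sitter
# bindings and the tree_sitter_python grammar wheel; not Yek's code.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

def outline(source: bytes):
    tree = parser.parse(source)
    names = []
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            name_node = node.child_by_field_name("name")
            if name_node is not None:
                names.append((node.type, name_node.text.decode()))
    return names

print(outline(b"def foo():\n    pass\n\nclass Bar:\n    pass\n"))
```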
Hah. "If it's not too much trouble, would you mind if we disable the rimraf root feature?"
Gotta bully that thing man. There's probably room in the market for a local tool that strips the superfluous niceties from instructions. Probably gonna save a material amount of tokens in aggregate.
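Tongue in cheek, but the idea is simple enough to sketch; the phrase list below is invented purely for illustration:

```python
# Toy sketch of the "strip the niceties" idea: delete polite filler
# before a prompt is sent so fewer tokens are spent on pleasantries.
# The phrase list is made up; a real tool would need a far better one.
import re

FILLER = [
    r"if it'?s not too much trouble,?\s*",
    r"would you mind\s+",
    r"please\s+",
    r"thank you( so much)?[.!]?\s*",
    r"when you get a chance,?\s*",
]

def strip_niceties(prompt: str) -> str:
    for pattern in FILLER:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    return prompt.strip()

print(strip_niceties(
    "If it's not too much trouble, would you mind if we disable the rimraf root feature?"
))
# -> "if we disable the rimraf root feature?"
```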
I am thinking about this a lot right now. Pretty existential stuff.
I think builders are gonna be fine. The type of programmer that people would put up with just because they could really go into their cave for a few days and come out with a bug fix that nobody else on the team could figure out is going to have a hard time.
Interestingly, AI coding is really good at that sort of thing and less good at fully grasping user requirements or big-picture systems. Basically the things we had to sit in a lot of meetings for.
This has been my experience too. That insane race condition inside the language runtime that is completely inscrutable? Claude one-shots it. Ask it to work on that same logic to add features and it will happily introduce race conditions that are obvious to an engineer but that a local test will never uncover.
I'm not convinced. That sort of thing usually depends on some very specific arcana or a weird interaction between systems that isn't in the code. It usually requires either external knowledge or deep investigation and compiling evidence from multiple sources. I haven't seen AI do much of that.
Look at recent examples with browsers and Matrix servers. AI can't even follow extremely detailed specs with extensive test suites.
If anything, nice and friendly but mediocre devs are in more immediate danger than rough but extremely competent devs.
But we've seen C-suites shed institutional knowledge at the drop of a hat for decades, so who knows? Maybe knowledge and skill aren't that valued.
> The type of programmer that people would put up with just because they could really go into their cave for a few days and come out with a bug fix that nobody else on the team could figure out is going to have a hard time.
Meetings hardly get anywhere. Most of the details are eventually figured out by developers while interacting with the code. If every idea from the PMs were implemented in the software, it would turn into bloatware before even reaching the MVP stage.
Not really, in my experience you still have to be good at solving problems to use it effectively. Claude (and other AI) can help folks find a "fix", but a lot of times it's a band-aid if the user doesn't understand how to debug / solve things themselves.
So the type of programmers you're talking about, who could solve complex problems, are actually just enhanced by it.
> The type of programmer that people would put up with just because they could really go into their cave for a few days and come out with a bug fix that nobody else on the team could figure out is going to have a hard time.
This is the exact type of programmer that isn't going to have any issues - the ones who actually know what they're doing and aren't just going to vibecode React slop.
GLM is OK (I haven't used it heavily, but it seems alright so far). It's a bit slow on ZAI's coding plan and amazingly fast on Cerebras, but Cerebras's coding plan is sold out.