it is very hard for me to take seriously any system that hasn't been proven shipping production code in complex codebases that have been around for a while.
I've been down the "don't read the code" path and I can say it leads nowhere good.
I am perhaps talking my own book here, but I'd like to see more tools that brag about "shipped N real features to production" or "solved Y problem in a large 10-year-old codebase".
I'm not saying that coding agents can't do these things and such tools don't exist, I'm just afraid that counting 100k+ LOC that the author didn't read kind of fuels the "this is all hype-slop" argument rather than helping people discover the ways that coding agents can solve real and valuable problems.
#1 rejection reason: missing context. 80% needed human fixes. Agents can write code fine. They just don't know what "done" looks like in your codebase.
Count successful merges into repos with real history instead of LOC, and you'll find the hard part is specification, not execution.
I personally care deeply when content intended as communication is AI generated (much more so than if code is generated).
On the surface level, I find it a bit disrespectful when I'm communicating with someone who's just using an LLM to generate their responses. Imagine if you were talking to someone in person and they pulled out a phone, generated a response, then read it back out to you?
On a deeper level, if someone's generated a bunch of text and clearly hasn't devoted the time to generating/editing it that they're expecting me to invest while reading it, I'm just not going to read it.
Yes, it's very obviously written by AI and made me immediately close the tab. Not gonna read a self-promotional piece written by an LLM that someone probably only gave a one-sentence prompt: "merge these ideas".
I don't think anyone serious would recommend it for serious production systems. I respect the Ralph technique as a fascinating learning exercise in understanding LLM context windows and how to squeeze more performance (read: quality) from today's models.
Even if the ceiling remains low in absolute terms, it's interesting to see the degree to which good context engineering raises it.
How is it a "fascinating learning exercise" when the intention is to run the model in a closed loop with zero transparency? Running a black box in a black box to learn? What signals are you even listening to in order to determine whether your context engineering is good or whether the quality has improved, aside from a brief glimpse at the final product? So essentially every time I want to test a prompt I waste $100 on Claude and have it build an entire project for me?
I'm all for AI, and it's evident that the future of AI is more transparency (MLOps, tracing, mech interp, AI safety), not less.
there is the theoretical "how the world should be" and there is the practical "what's working today" - decry the latter and wait around for the former at your peril
I read it. I agree this is out of touch. Not because the things it's saying are wrong, but because the things it's saying have been true for almost a year now. They are not "getting worse"; they "have been bad". I am staggered to find this article qualifies as "news".
If you're going to write about something that's been true and discussed widely online for a year+, at least have the awareness/integrity to not brand it as "this new thing is happening".
The models have gotten very good, but I'd rather have an obviously broken pile of crap that I can spot immediately than something that's been deep-fried with RL to always succeed but has subtle problems that someone will LGTM :( I guess it's not much different with human-written code, but the models seem to have weirdly inhuman failures - like, you'd just skim some code because you can't believe anyone could do it wrong, and it turns out they did.
Well, for some reason it doesn't let me respond to the child comments :(
The problem (which should be obvious) is that with a and b real you can't construct an exhaustive input/output set. A test case can only prove the presence of a bug, not its absence.
Another category of problems that you can't just test for and instead have to prove is concurrency problems.
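As a rough illustration (a hypothetical Python sketch, names invented, not from the article): a racy counter whose unit test will usually pass even though the bug is always there.

    import threading

    class Counter:
        def __init__(self):
            self.value = 0

        def increment(self):
            # read-modify-write with no lock: a textbook data race
            current = self.value
            self.value = current + 1

    def test_increment_from_many_threads():
        c = Counter()
        threads = [threading.Thread(target=c.increment) for _ in range(100)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # usually green: losing an update takes an unlucky interleaving,
        # so a passing run says nothing about the interleavings you didn't hit
        assert c.value == 100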
Of course you can. You can write test cases for anything.
Even an add_numbers function can have bugs, e.g. you have to ensure the inputs are numbers. Most coding agents would catch this in loosely-typed languages.
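A minimal sketch of that add_numbers case (hypothetical code, assuming pytest; only the function name comes from the comment above):

    import pytest

    def add_numbers(a, b):
        # guard the loosely-typed footgun: "1" + "2" would happily return "12"
        if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
            raise TypeError("add_numbers expects numeric arguments")
        return a + b

    def test_adds():
        assert add_numbers(2, 3) == 5

    def test_rejects_strings():
        with pytest.raises(TypeError):
            add_numbers("2", "3")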
engineers always want to rewrite from scratch and it never works.
a tale as old as time - at my second job out of college, back in like 2016, I landed at the tail end of a 3-month feature-freeze refactor project. It was pitched to the CEO as 1 month, sprawled out to 3 months, and still wasn't finished. Non-technical teams were pissed, technical teams were exhausted, all hope was lost. We ended up cutting a bunch of scope and slopping out a bunch of bugs anyway.
I had the privilege of working with some incredible eng leaders at my previous gig - they were very good at working both upwards and downwards to execute against the "50/50" rule: half of any given sprint's work is focused on new features, and half is focused on bug fixes, chores, and things that improve team velocity.
I think the key here is the "if X then Y" syntax - it seems to be quite effective at piercing through the "probably ignore this" system message by highlighting WHEN a given instruction is "highly relevant".
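For example, something like this in the agent instructions (wording invented for illustration):

    If you touch anything under src/billing/, then run the contract tests before committing.
    If a change needs a schema migration, then generate it and include it in the same change.

versus a blanket "always run the contract tests" that just sits in the pile of rules the model half-ignores.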