> A general pattern for LLMs is that they look really good at things you are bad at.
This is true for coding, too, which I think, to a large degree, might explain the polarized differences in opinions on HN about the quality of LLM-produced code. You have the 1. "AI produces code better than I could possibly write, one shots things it would take me days to do, and has made me 10X more productive!" camp, and you have the 2. "AI constantly produces poor code needing rework, makes mistakes, has to be babysat, and ultimately costs me time!" camp, with a spectrum in between those. How could the output of the same product be seen so differently? Well, I have bad news for camp 1...
I've caught Claude Code generating some pretty egregious security vulnerabilities. I'm using it to build an AI RPG site and the goal is to use web assembly as a bridge between author submitted code and LLMs in order to help shore up state management at the game level.
The language that I picked for the game runtime is Python. Claude really thought that the best way to validate user submitted Python was to bypass the WASM sandbox and execute it within the application container using shell exec - essentially opening up an RCE vulnerability.
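For what it's worth, a syntax check doesn't need to execute anything. A minimal sketch, assuming "validate" only means "does this parse as Python": the standard library's `ast.parse` builds a syntax tree without running the code. The names `validate_submission` and `MAX_SOURCE_CHARS` are hypothetical, made up for illustration. This is not a sandbox and says nothing about what the code does once it runs, so actual execution still belongs inside the WASM sandbox.

```python
import ast

MAX_SOURCE_CHARS = 64_000  # hypothetical cap; pick one that suits your platform


def validate_submission(source: str) -> tuple[bool, str]:
    """Check that author-submitted source parses as Python, without running it."""
    if len(source) > MAX_SOURCE_CHARS:
        return False, "source too large"
    try:
        # ast.parse only builds a syntax tree; nothing in `source` is executed.
        ast.parse(source, filename="<submission>")
    except SyntaxError as exc:
        return False, f"syntax error on line {exc.lineno}: {exc.msg}"
    except (ValueError, RecursionError, MemoryError) as exc:
        # Null bytes or pathologically nested input can make the parser bail out.
        return False, f"could not parse: {type(exc).__name__}"
    return True, "ok"


if __name__ == "__main__":
    print(validate_submission("def on_turn(state):\n    return state"))
    # Parses fine and is never run; parsing alone does not make it safe to execute.
    print(validate_submission("import os; os.system('id')"))
    print(validate_submission("def broken(:"))
```

Note that the second example passes: a clean parse is only a syntax check, not a judgment about behavior.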
I also find that the quality of Claude Code degrades substantially. Claude really wants to implement every feature in as bespoke a way as possible. This is fine when you first generate the project, but over time you'll find that every web modal is implemented differently. Every button is different. Business logic is disconnected. It's why agentically produced codebases are MUCH larger than they should be; every feature is developed in a vacuum.
Then I'm trying to shove stuff in my AGENTS.md or CLAUDE.md files like "ALWAYS look for existing patterns within the codebase to keep it consistent." But the harness doesn't always work and it'll generate useless, verbose code anyways.
In some cases it's useful - like if I am shaky on the DSA knowledge needed for a specific operation or optimization then Claude can replace Stackoverflow. But, man, I'm so frustrated with it.
> Business logic is disconnected. It's why agentically produced codebases are MUCH larger than they should be; every feature is developed in a vacuum.
I just had Opus 4.7 build a feature twice, because it didn't close a ticket the first time. (I'm trying to solo-build a fairly large greenfield project, and am at the point where I let it go ham over my codebase because of the scale of things.)
I then spent a couple of hours asking it to compare the features. It argued that they were completely different features for a while, then eventually acquiesced and said that they were redundant.
That's a couple thousand tokens and time I'm never going to get back.
I think there are some factors beyond just skill too - the kinds of tasks you're giving the AI, and how involved you are in ensuring the output is good (via either extensive planning guidance, extensive review/testing, or a combination).
The 'camp 1' people in the pre-LLM days were probably the ones that often just copy+pasted code from SO they didn't really understand, but since the code seemed to work when they ran it, they thought it was all fine and continued on.
Whereas the 'camp 2' people when trying to find an answer to something, discarded 99% of SO and other similar answers, having the knowledge to see how they had broken edge cases, were limited in some way, didn't actually solve the underlying problem, etc.
Nowadays, 'camp 1' people just use the LLM output and it "seems to work" and consider it all fine. Whereas the 'camp 2' people still continuously see all the faults with it.
Being faster than humans at mundane and verifiable tasks is a useful thing. Great for format conversions, API mappings, etc. If you don't understand the algorithm you are asking it to implement, you had better at least understand how to generate a large set of correct input and output pairs yourself, because it will absolutely make stuff up and adjust the test cases to pass.
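One way to build those input/output pairs yourself is a differential test: a slow, obviously correct reference produces the expected outputs, and the fast version has to match it on many random inputs. A minimal sketch, assuming maximum subarray sum as a stand-in task. All function names are hypothetical, and `max_subarray_fast` plays the role of the LLM-written code.

```python
import random


def max_subarray_reference(xs: list[int]) -> int:
    """Brute force: try every non-empty contiguous slice. Slow but easy to trust."""
    return max(
        sum(xs[i:j])
        for i in range(len(xs))
        for j in range(i + 1, len(xs) + 1)
    )


def max_subarray_fast(xs: list[int]) -> int:
    """Stand-in for the generated code under test (Kadane's algorithm)."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def make_cases(n_cases: int, seed: int = 0) -> list[tuple[list[int], int]]:
    """Build (input, expected) pairs from the reference only."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(1, 12))]
        cases.append((xs, max_subarray_reference(xs)))
    return cases


def check(cases: list[tuple[list[int], int]]) -> list[tuple[list[int], int, int]]:
    """Return (input, expected, actual) for each mismatch; empty means all passed."""
    failures = []
    for xs, want in cases:
        got = max_subarray_fast(xs)
        if got != want:
            failures.append((xs, want, got))
    return failures


if __name__ == "__main__":
    failures = check(make_cases(2000))
    print(f"{len(failures)} mismatches out of 2000 cases")
```

The expected values come only from the reference, so it probably helps to keep the generated pairs somewhere the agent is not allowed to edit.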
LLMs can generate code, but the quality of the code at scale is just not there currently by all important metrics such as security, maintainability, separation of concerns, etc.
Today, it's a kind of chaos magic wherein you summon the beast and try your best to contain him, knowing that someone will probably die in the process. Sometimes literally. It's still a force multiplier in the right hands and domain, and agentic coding is a paradigm that won't retract, at least until something better supplants it.
The problem is that few engineers actually have the discipline available to constrain these models appropriately and instead rely on a hodgepodge network of "skills" aka prompt fragments which are passed around and glued together.
I consider myself as having such discipline, being strongly architecturally-minded, user-first, etc. in both design and implementation. And I still struggle to contain the beast many days. I just got through screaming at Claude for intentionally taking a shortcut that I'd forbidden, leading to a ton of wasted time and tokens.
Sometimes I feel like I saved weeks of R&D with a single ten-minute task handed off to an agent, other times I feel like I'd get better returns playing slots in Vegas at the alarming rate Claude burns through money.
I'm tired as fuck of anti-ai zealots pretending like every human is a fucking paragon at programming. I've literally never seen Claude Code produce as bad of code as generated by humans. Literally never. Yet the anti-ai zealots pretend like humans never introduce a bug into a system. Only LLMs produce slop or take shortcuts or ignore tests or do incredibly dumb fucking shit. It's fucking ridiculous. As if The Daily WTF didn't exist before LLMs. The reality is the "average" programmer is far below the skill floor of Claude Code or other frontier models. Those models will write tests and explore more edge cases than the "average" developer ever will. But all these zealots pretend like they have only ever worked with the top 1% of the top 1% who never make mistakes or introduce bugs. Ultimately they are full of shit. You're lucky as fuck if your developers can even tell you what common design patterns are. The bar is that low and the HN crowd likes to pretend every developer is Linus Torvalds and not a clueless moron desperately coordinating API layers.
I’d probably word it differently but I agree with much of the sentiment here. I’m also reminded of the stat where 93% of drivers rated themselves as above average.
I'm in camp 3, where sometimes I don't really care how good or bad the code is. For internal tools, for example, you can let the LLM crunch out code really fast; you can validate the output but don't even have to look at the code. These kinds of "weekend projects" can get finished in an hour or two, and so are really 10x.
For bigger production ready code, you indeed have to guard the architecture. But for the code, in some corners you can get away with sloppy code, as long as it kind of works.
What I'm saying is, code doesn't always have to be great. You will just have to judge the places where it needs to be high quality, and other places where you can get away with sloppy code.
That judgment is an essential skill of an experienced programmer, and it is required at every level of the big picture, from high level architecture decisions to the development of particular features: what should I polish and what needs to be developed fast? How exactly should I cut corners in the safest way?
I used an LLM to teach me how to code and get through obstacles that would have me spending a lot of time doing ???. Typically, I just write code that I know is, a lot of the time, absolutely wrong, but the LLM helpfully points out mistakes.
I am slowly writing more of my own code and cutting the LLM out of the loop in the unfamiliar territory I am working in.
My main concern is not so much productivity but understanding the code I have written and feeling agency over it.
I think it matters everywhere -- just because some fields get away with making trash doesn't mean that they're not vulnerable to people taking their lunch by making something distinctly-not-trash. People put up with a lot when there's lock-in, but there's a breaking point. (I say this while using a Linux desktop about 90% of the time now, because Windows has become such a fucking disaster.)
Being vulnerable makes money unfortunately. And making money now has always been seen as more important than being sustainable in the long-term. Even if an exploit later takes away every cent of earnings.
I see this sentiment occasionally brought up, and at the same time see what’s happening to Github where the majority of their distributions is not security or efficiency related (not saying it’s because of LLMs, we don’t know). The point is, these things matter beyond beautiful code. You lose trust and you lose customers and money.
The hard part too is it's not like you can just learn the basics and be able to tell good code apart from bad -- the more you learn to code, the more intricate your understanding of good code is. It's like becoming a good writer; just knowing grammar and spelling doesn't make your writing interesting. Not to mention that there's just a lot of bad advice out there that you can't recognize as bad advice if you're not a regular practitioner. Like, "Clean Code" is IMO a terrible book, but a ton of people follow it because it has the sheen of respectability... until, hopefully, they learn some new patterns and realize those old ones aren't very good. But you pick these things up with experience and doing the work! Otherwise if you're just reading other people's opinions, you'll see a bunch of people say "Clean Code is great" and a bunch of other people say it's rubbish, and you'll have no way to know who you should listen to. (If you disagree with me on Clean Code the book that's fine -- I'm just using it to make a point -- sub in a different book/ideology if it suits you)
I think looking at LLM code and thinking you're now a coder is like watching someone play guitar and thinking you can just pick up a guitar and play a song. The truth is, if you want to be good, you have to do the work.
One of the things I hate about AI is that we're going to have a generation of "programmers" that are absolutely shit at programming, create problems for everyone else, and will have absolutely no idea how bad they are. And they'll probably never get better, because you can't get better by just asking Claude to do shit for you. And then the LLMs themselves will probably start to degrade because they'll be trained on the slop, since it'll heavily outnumber handwritten code.
> I think looking at LLM code and thinking you're now a coder is like watching someone play guitar and thinking you can just pick up a guitar and play a song. The truth is, if you want to be good, you have to do the work.
So many posts here on HN claiming they created another useful tool with AI.
No, you didn't create it. AI did. You only had a supporting role. You're Ringo Starr and the AI is John Lennon.
I disagree this is the source of the polarization. Maybe it's part of it.
I have been coding since about 1983 or so. I shipped high quality products that have been used by millions of people. From embedded software to desktop applications to distributed systems.
I don't think I'm in the "don't understand what code should look like camp" (I mean you never know but the evidence seems to show that I do know what I'm doing). I use AI as a tool and it helps me be more productive. I don't "one shot things that would take me days to do". I use it to help me automate things that I could do manually where it is faster and more effective. I review every step and if I don't like something I adjust. There are some specific situations where it basically does as good a job as I would do in running some experiments, doing some analysis or writing some small amount of code. I still know what the changes need to look like broadly, where to make them, and what patterns to follow. It just automates the work and sometimes does have some additional insight that can complement my views. Unlike me it is all knowing about everything in terms of access to "knowledge". It knows all the details of how a certain runtime manages memory, Linux internals and various open source software. I could go look it up myself (which I'd do before AI) but I don't hold it all in my head like AI basically does. It is also "all knowing" in the code base I work in (more so than me, it's a huge code base, I have an outline and a high level picture in my head but not every single code line) where again I can dive into the code but it helps me extract the relevant information faster.
I think the polarization is more about how you use the tool, what situations you use it for, and which domain you are operating in (languages, applications, etc.). You can also one-shot simple tools and helpers that are not the production software, which is another way to accelerate your workflow.
I’ve seen people coding for 4 decades, thinking the same as you about themselves, and they were bad coders. Unfortunately, nobody can tell you whether you’re good or bad without seeing your code. Your claims mean nothing on the internet.
What about the business side of things, which does not care about shiny code, but about shipping things to make money?
That simple arcade game (without in game transactions) needs to be fun, that website that needs to attract visitors (but not sell them anything or handle sensitive data)?
They don't care about abstract code quality; they care whether it works and is useful.
So a good coder here means he or she could get to working results according to what the client wants fast. And those things likely make up the vast majority of written code. So no wonder AI gets adopted as it is a powerful tool here to be even faster.
Not all code runs in airplanes, or handles financial transactions or sensitive user data - for this you need the best code possible and nothing vibe coded, no quick and dirty hacks.
And oh wonder, it is possible to combine both. Because yes, websites often include financial transactions nowadays, but that part can and should be handled with care, by people who move slowly and check things. And then there are those who are quick to build things on top of it.
But I strongly object to dividing programmers absolutely in good and bad programmers, when the field is so big and the requirements not the same.
Some optimize in speed, some in quality. And yes, some are just bad in both. And some can do both - but they are very rare, in my experience.
> And some can do both - but they are very rare, in my experience.
As far as I know, basically all of the successful software companies had these people quite early. Of course, you need other kinds of people too. And not everybody needs to be like that. But you absolutely need those kinds of people.
But if you give me a few examples where this was not the case, and not recently, or during the dot com boom, where hype overwrote everything, then I’ll change my mind.
I'm just trying to provide useful context. I can claim anything and you don't know me either way. Last I checked other than your peers (which you can imagine I've had) there is no subjective "stamp" for good vs. bad. Many people who have shipped nothing and have no experience think they're great coders as well.
That can still easily mean that they gave no value, or only a minuscule amount of value, to the project that caused their success.
The most successful projects which I’ve seen closely, all of them had only a few people who mattered, everybody else could be replaced at any given time basically, without a real impact. All of the failing ones were those in which those people didn’t exist, or were too few of them. This is exponentially more important in early phases of projects.
fwiw I wrote key features/pieces of pretty much everything I worked on. I was more or less the technical lead (and later engineering manager) on all the software I shipped.
It seems people really want to not believe AI can be useful for strong developers. That's fine. I don't really have a bone in this game, I'm anonymous here, and people can think whatever they choose to ;)
Anonymity is a two-way thing. Maybe I don't need to use "more or less". Maybe I'm an ultra heavy user of AI. It's even possible that I'm a developer who is orders of magnitude better in every single aspect.
Maybe not.
I'm still waiting for somebody to send me code that was generated by AI, wasn't almost completely rewritten, and isn't shit. The funny thing is that some people here were so convinced that AI is great that they recorded how they work with AI. Two things stood out:
- They were slower than manual copy pasting
- They still somehow introduced bugs, and very suboptimal solutions...
Also, it would be good if anybody could show me anything demonstrating how people suddenly stopped being terrible at code review. Before AI, it was common knowledge that almost everybody was bad at it, and that people rarely did it properly because it was considered annoying, not because it was useless. There were jokes about rewriting things for exactly the same reasons, and they were heavily based on reality. And suddenly we pretend that this has changed.
And you would really need to convince me that the hundreds of thousands of lines of code I generated with AI over the past years are somehow better than they are. But I'm sure it's easier to assume that I didn't try something than to show, even once, what "good" means in this case. Unfortunately, there is no giveaway here comparable to, for example, "Groovy is a great programming language", which is a dead giveaway that whoever said it is not just a bad developer, but somebody I would fire immediately from every single project I'm related to. Especially if they are tech lead. But there are such people, and some of them would claim the same thing as you. // Obviously interns and juniors can think whatever they want. They are labeled as such because they cannot know yet.
What I do at work is (obviously) not something I can share.
Groovy ;) You must love Jenkins. At least I can rest at ease you won't fire me for that infraction. I used C and C++ most of my career and more recently Go (which I really loved when it was created but my take is a bit more nuanced these days after seeing a really large code base evolve over years).
I'm confused though. You generated 100s of thousands of lines of code using AI and you think it's crap? The code AI generates for me is not some pinnacle of software engineering. It is repeating existing patterns or fairly simple concepts. I treat AI like a quick intern that scales infinitely (or a junior developer). And yes, juniors and interns don't write the best code, but in many organizations there is still a fair amount of code written by them.
The thing is that in a large team/project (and the one I'm on has hundreds of developers of various skill levels) there's an endless backlog of things that can be improved including relatively easy features or refactoring. The constraints are either organizational or time. AI enables these things to get done with very little overhead so that's a net positive. It moves the needle for how much time/effort does it take to address "not that hard" issues and with proper prompting and examples it does a decent job. The bar isn't code that John Carmack would write in a week, the bar is improving a certain crappy area of the code to be more reliable or more performant or a little bit cleaner. This is life for most software projects. Yes, in a perfect world every software project is perfection. And maybe some organizations are able to approximate that.
> I really loved when it was created but my take is a bit more nuanced these days after seeing a really large code base evolve over years
At least, I can be sure that we are not near the same level. But at least, you hopefully will recognize the same thing with new languages… without seeing them failing first.
> You generated 100s of thousands of lines of code using AI and you think it's crap?
This is a funny question. First of all, there are people whose job is to test LLMs. However, I’m not one of them. I simply tried them, generated and still generate a ton of code with them, and then I rewrite basically every single line of it. Because they use, for example, outdated patterns, which cause the same problems you’ve seen with Go.
> It is repeating existing patterns or fairly simple concepts.
Yes, and most of the most popular ones are mediocre at best. The average code LLMs learn from is written by beginners, not experienced developers, because of their sheer number. So LLMs will use those.
> I treat AI like a quick intern
This is always the funniest sentence regarding this. Before AI, it was quite well known that you don’t ever allow interns near important parts of the code. Now, people who are supposed to know this, and the reasons for it, have somehow forgotten this aspect too, just like the review thing.
> AI enables these things to get done with very little overhead so that's a net positive.
No, it allows you to tick off a ticket in Jira. And if you handle this in any other way, you will fail miserably, as Microsoft, for example, quite openly did.
> a little bit cleaner
Ah yes, the infamous “cleaner”, about which the exact opposite is quite well known, and it’s quite obviously not true of any vibe coded project, without exception. If that’s cleaner in any environment, then I have bad news: you’ve never worked with even medior developers, ever. Seriously, that code quality, especially architecturally, is junior level shit.
My previous boss picked this kind of low-hanging fruit, and he at least would never say anything more than “it’s better than nothing”. And only regarding non-important code, which can fail without real consequences. And can be shit, obviously. The whole point was that even shit is better than nothing. Not that it’s acceptable quality in any way.
At least you made it obvious that your “success” is orders of magnitude different from mine. And not regarding code quality, but regarding when a project/product counts as successful. I completely forgot that I’ve met people who sold the fact that their teams completed the most tickets at a company in a given time frame as success. Probably we are closer than this, but still very far apart.
Do you have an example of a large scale open source project where you would consider that entire code base to be high quality to your standards?
I think you're saying "my bar is so high you can't even understand where it is". I've worked with hundreds if not thousands of software developers, in many companies from startups to well established ones, including producing products that are what I'd call critical infrastructure that work reliably and do what they're supposed to do. I think I have a pretty decent idea of what an "average" software developer looks like and the overall shape of that curve, and similarly the architecture/design curve of various real world projects. I've built software on my own as a team of one and I've worked with teams of more than 100 people. Anyways, if your assertion is what I wrote above then clearly LLMs can't replace the mythical programmer that you are. But that's not what they're aiming to replace. As to "vibe coded projects", I already said that's not how I use LLMs and I agree that can easily end up like a pile of garbage (but still has its place in the new ecosystem).
The only real test of software is whether it does what it's supposed to do: reliably, is maintainable, can be extended and evolve without losing these attributes. If you've shipped systems that are used by many, work well, can evolve to support new features etc. - kudos to you.
It really depends. If you're cranking out prototypes or testing ideas, it's genuinely great. But if you're familiar with the code it's very easy to spot its (many) mistakes. It's Gell-Mann amnesia.
Then again, I just caught Claude writing setTransparent(!opaque == false), opaque being a bool, on a purely vibecoded project. Which was pretty impressive. ("• You're right, that's nonsense.")
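For anyone squinting at that expression: in C-family languages `!` binds tighter than `==`, so it groups as `(!opaque) == false`, which is just `opaque` the long way round. A quick truth-table check, written in Python with explicit parentheses to mirror that grouping (the function names are made up for illustration):

```python
def convoluted(opaque: bool) -> bool:
    # Mirrors `!opaque == false` as a C-family parser groups it: (!opaque) == false
    return (not opaque) == False  # noqa: E712 (deliberately verbose)


def simplified(opaque: bool) -> bool:
    # What the expression collapses to.
    return opaque


if __name__ == "__main__":
    for opaque in (True, False):
        print(opaque, convoluted(opaque), simplified(opaque))
```

So the call reduces to `setTransparent(opaque)`, which also looks inverted unless passing the opaque flag as the transparency was somehow intended.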
In my case I see Claude produce code much worse than I would, but it's certainly much quicker and, even after reworking, it makes me finish tasks in less time.
Camp 1 is winning because we did it. We built an artificial brain. Frontier models can think, reason, and produce code better than the average human programmer. (You have not met many actual programmers slaving away on enterprise code bases if you do not understand that this is the case; the self-selecting HN crowd does not represent the profession as a whole by a long shot.) It's just a matter of, how committed are you (is your organization) to really learning and leveraging the tools?
If you’re typing 100s of lines, you’re already doing it wrong. My most used operation is completion, just ahead of copy-paste. The reason I like vim is that it makes such operations so fast. And the reason I like emacs is that it has superpowered versions of those operations.
Yes, the third camp and probably the most effective is to do a decent amount of writing yourself and use the LLMs as codegen machines, but where the DSL is natural language. Deepseek v4 flash is an incredible model for this, you can actually get into flow state as you write code and then delegate boring code to the magic autocompletion machine to autocomplete.
The better workflow, and I think the one adopted by people in the second group, is to take a step back from coding, do a bit of thinking about the domain, design a better abstraction for the problem (architecture, data structure, algorithms), and then write the small amount of code you probably need.
Code should grow according to need, not for its own sake. Start small, use it in the real world, and then improve it.
I agree with that, but there still is some code that eventually needs to be written and there is a subset of that code which can be generated. I think it depends on the domain as well - for example, UI code is trivially generatable by LLMs.
RAD tools like Delphi, Qt Creator, Glade, Android Studio, Xcode’s Interface Builder,… have always made it trivial to generate UI code. And there are libraries for other concerns.
The majority of a project's code is written at the beginning or when a major feature is introduced. The daily work is mostly tweaking. And you can’t tweak without a good understanding of the module.
> the polarized differences in opinions on HN about the quality of LLM-produced code
Are there strong differences of opinion about the quality? I've seen very few people claim that LLMs write better code than they do.
> one shots things it would take me days to do, and has made me 10X more productive
This is an entirely different claim from the former, and you're conflating them.
The boost from LLM-assisted code isn't _expertise_, it's the power of having an always-on team of reasonable junior developers from every discipline you can possibly imagine willing to do your whim.
Take for example Jesse Vincent / obra[0], who is an exceptional developer, with great taste, and a stack of well-received open-source software to his name. He posts a lot on how he's being made more productive by AI-assisted development. Do you have bad news for him about the quality of his work...?
Eric S. Raymond has basically stopped writing code by hand altogether. He consistently delivers high quality code without intervening to fix the LLM's output himself, much faster than he would have been able to alone. This is very bad news for camp 2 because it means one of three things:
1) he is extraordinarily lucky
2) he is extraordinarily brilliant at manipulating LLMs
3) you really are "holding it wrong" and you are hobbling yourself with your failure to properly learn the tools
I'm very confused by this logic. Why should I care about his output compared to what I observe from the larger group if he's not an outlier, and if he is an outlier, why would the second one be unlikely? The only way I can make sense of this is if you're claiming that he's both an exceptional coder and that skill in coding by hand is completely uncorrelated to skill in using LLMs to code, and it's not clear to me why that would be more likely than either or both of those being false.