adriand's comments | Hacker News

The president of a company I work with is a youngish guy who has no technical skills, but is resourceful. He wanted updated analytic dashboards, but there’s no dev capacity for that right now. So he decided he was going to try his hand at building his own dashboard using Lovable, which is one of these AI app making outfits. I sent him a copy of the dev database and a few markdown files with explanations regarding certain trickier elements of the data structure and told him to give them to the AI, it will know what they mean. No updates yet, but I have every confidence he’ll figure it out.

Think about all the cycles this will save. The CEO codes his own dashboards. The OP has a point.


I'd argue it's not the CEO's job to code his own dashboards...

This sounds like a vibe coding side project. And I'm sorry, but whatever he builds will most likely become tech debt that has to be rewritten at some point.


Or to steel-man it, it could also end up as a prototype that forced the end user to deal with decision points, and can serve as a framework for a much more specific requirements discussion.

That's a good point

Exactly -- vibe coded PoC becomes a living spec for prod

We perpetually find worse and more expensive ways to reinvent Microsoft Access.

Interesting comment. Which ways have people been doing this?

At a certain scale the CEO's time is likely better spent dictating the dashboard they want rather than implementing it themselves. But I guess to your point, the future may allow for the dictation to be the creation.

Agree, as engineers we should be making the car easier to operate instead of making everyone a mechanic.

Focus on the simple iteration loop of "why is it so hard to understand things about our product?" Maybe you can't fix it all today, but climb that hill instead of making your CEO spend sleepless nights on a thing you could probably build in 1/10th the time.

If you want to be a successful startup SaaS software engineer, then engaging with the current and common business cases, and being able to predict the standard set of problems they're going to want solved, turns you from "a guy" into "the guy".


Most engineers like being mechanics though.

All tech problems are actually people problems.

Once the C-suite builds their own dashboards, they quickly decide what they actually need versus what is a nice-to-have.


And I wonder if they will discover that in order to interpret those numbers in a lot of cases they will need to bring in their direct reports to contextualise them.

If corporate decisions could be made purely from the data recorded, then you wouldn't need people to make those decisions. The reason you often do is that a lot of the critical information for decision making is brought into the meeting out-of-band, in people's heads.


Totally!

I have also seen multiple similar use cases where non-technical users build internal tools and dashboards on top of existing data for our users (I'm building UI Bakery). This approach might feel a bit risky for some developers, but it reduces the number of iterations non-technical users need with developers to achieve what they want.


> No updates yet, but I have every confidence he’ll figure it out.

"It" being "that it's harder than it looks"?


> "It" being "that it's harder than it looks"?

Honestly, I'm not sure what to expect. There are clearly things he can't do (e.g. to make it work in prod, it needs to be in our environment, etc. etc.) but I wouldn't be at all surprised if he makes great headway. When he first asked me about it, I started typing out all the reasons it was a bad idea - and then I paused and thought, you know, I'm not here to put barriers in his path.


Update us when you have an actual success story.

> Like, is there truly an agentic way to go 10x or is there some catch?

Yes. I think it’s practice. I know this sounds ridiculous, but I feel like I have reached a kind of mind meld state with my AI tooling, specifically Claude Code. I am not really consciously aware of having learned anything related to these processes, but I have been all in on this since ChatGPT, and I honestly think my brain has been rewired in a way that I don’t truly perceive except in terms of the rate of software production.

There was a period of several months a while ago where I felt exhausted all the time. I was getting a lot done, but there was something about the experience that was incredibly draining. Now I am past that and I have gone to this new plateau of ridiculous productivity, and a kind of addictive joy in the work. A marvellous pleasure at the orchestration of complex tasks and seeing the results play out. It’s pure magic.

Yes, I know this sounds ridiculous and over-the-top. But I haven’t had this much fun writing software since my 20s.


> Yes, I know this sounds ridiculous and over-the-top.

in that case you should come with more data. tell us how you measured your productivity improvement. all you've said here is that it makes you feel good


Here's what's worked best with Gemini, such that I made a DSL that transpiles to C with CUDA support to train small models in about 3 hours... (all programs must run against an image data set and must only generate embeddings)

Do not: vibe code from top down (ex. Make me a UI with React, with these buttons and these behaviors for each button)

Do not: chat casually with it (ex. I think it would look better if the button was green)

Do: constrain phrasing to the next data-transform goal (ex. You must add a function to change all words that start with lowercase to start with uppercase)

Do: vibe code bottom up (ex. You must generate a file with a function to open a plaintext file and appropriate tests; now you must add a function to count all words that begin with "f")

Do: stick to must/should/may (ex. You must extend the code with this next function)

Do: constrain it to mathematical abstractions (ex. sys prompt: You must not use loops; you must only use recursion and functional paradigms. You must not make up abstractions; stick to mathematical objects and known algorithms)

Do: constrain it to one file per type and function. This makes it quick to review and to regenerate only what needs to change.
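As a hypothetical sketch (in Python, not from the commenter's actual DSL project; names are illustrative), the bottom-up, one-file-per-function pattern from the example prompts tends to produce something like this:

```python
# words.py - hypothetical sketch of the bottom-up pattern: one small file,
# a couple of pure functions, no invented abstractions.

def read_plaintext(path: str) -> str:
    """Open a plaintext file and return its contents."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def count_words_starting_with(text: str, prefix: str) -> int:
    """Count all words that begin with the given prefix (e.g. "f")."""
    # Functional style per the constraint: no explicit loops, just a
    # generator expression folded with sum().
    return sum(1 for word in text.split() if word.startswith(prefix))
```

Each generated file stays small enough that regenerating it from scratch is cheaper than debugging it, which is the point of the constraint.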

Using those patterns, Gemini 2.5 and 3 have cranked out banging code with little wandering off in the weeds and hallucinating.

Programming has been mired in the made-up semantics of individual coders, for the lulz, to create mystique, and to obfuscate the truth to ensure job security; at the end of the day it's matrix math and state sync between memory and display.


This is remarkably similar to the process we had to follow a couple of decades ago when offshoring to IT mills: spell out every little detail in small steps, iterate often, and you'll usually get most of what you want.

Awesome comment, thank you. No idea why it was flagged as dead. Vouched for it to not be.

This. I find constraints to be very important. It's fairly obvious an llm can tackle a class or function. It's still up to the human to string it all together. I'm not quite sure how long that will last though. Seems more of an engineering problem to me. At the end of the day you absolutely can get good outputs from these things if you provide the proper input. Everything else is orchestration.

Work that would have taken me 1-2 weeks to complete, I can now get done in 2-3 hours. That's not an exaggeration. I have another friend who is as all-in on this as me and he works in a company (I work for myself, as a solo contractor for clients), and he told me that he moved on to Q1 2026 projects because he'd completed all the work slated for 2025, weeks ahead of schedule. Meanwhile his colleagues are still wading through scrum meetings.

I realize that this all sounds kind of religious: you don't know what you're missing until you actually accept Jesus's love, or something along those lines. But you do have to kinda just go all-in to have this experience. I don't know what else to say about it.


My sympathies go out to the friend's coworkers. They are probably wading through a bunch of stuff right now, but given the context you have given us, it's probably not "scrum meetings".

I don't even care about the LLM, I just want the confidence you have to assess that any given thing will take N weeks. You say 1-2 weeks.. that's a big range! Something that "would" take 1 week takes ~2 hours, and something that "would" take 2 weeks also takes ~2 hours. How does that even make sense? I wonder how long something that would have taken three weeks would take?

Do you still charge your clients the same?


> They are probably wading through a bunch of stuff right now, but given the context you have given us, it's probably not "scrum meetings".

This made me laugh. Fair enough. ;)

In terms of the time estimations: if your point is that I don't have hard data to back up my assertions, you're absolutely correct. I was always terrible at estimating how long something would take. I'm still terrible at it. But I agree with the OP. I think the labour required is down 90%.

It does feel to me that we're getting into religious believer territory. There are those who have firsthand experience and are all-in (the believers), there are those who have firsthand experience and don't get it (the faithless), and there are those who haven't tried it (the atheists). It's hard to communicate across those divides, and each group's view of the others is essentially, "I don't understand you".


Religions are about faith, faith is belief in the absence of evidence. Engineering output is tangible and measurable, objectively verifiable and readily quantifiable (both locally and in terms of profits). Full evidence, testable assertions, no faith required.

Here we have claims of objective results, but also admissions we’re not even tracking estimations and are terrible at making them when we do. People are notoriously bad at estimating actual time spent versus output, particularly when dealing with unwanted work. We’re missing the fundamental criteria of assessment, and there are known biases unaccounted for.

Output in LOC has never been the issue, copy and paste handles that just fine. TCO and holistic velocity after a few years is a separate matter. Masterful orchestration of agents could include estimation and tracking tasks with minimal overhead. That’s not what we’re seeing though…

Someone who has even a 20% better method for deck construction is gonna show me some timetables, some billed projects, and a very fancy new car. If accepting Mothra as my lord and saviour is a prerequisite to pierce an otherwise impenetrable veil of ontological obfuscation in order to see the unseeable? That deck might not be as cheap as it sounds, one way or the other.

I’m getting a nice learning and productivity bump from LLMs, there are incredible capabilities available. But premature optimization is still premature, and claims of silver bullets are yet to be demonstrated.


Here's an example from this morning. At 10:00 am, a colleague created a ticket with an idea for the music plugin I'm working on: wouldn't it be cool if we could use nod detection (head tracking) to trigger recording? That way, musicians who use our app wouldn't need a foot switch (as a musician, you often have your hands occupied).

Yes, that would be cool. An hour later, I shipped a release build with that feature fully functional, including permissions plus a calibration UI that shows if your face is detected and lets you adjust sensitivity, and visually displays when a nod is detected. Most of that work got done while I was in the shower. That is the second feature in this app that got built today.

This morning I also created and deployed a bug fix release for analytics on one platform, and a brand-new report (fairly easy to put together because it followed the pattern of other reports) for a different platform.

I also worked out, argued with random people on HN and walked to work. Not bad for five hours! Do I know how long it would have taken to, for example, integrate face detection and tracking into a C++ audio plugin without assistance from AI? Especially given that I have never done that before? No, I do not. I am bad at estimating. Would it have been longer than 30 minutes? I mean...probably?


Just having a 'count-in' type feature for recording would be much much more useful. Head nodding is something I do all the time anyway as a musician :).

I don't know what your user makeup is like, but shipping a CV feature same-day sounds potentially disastrous. There are so many things I would think you would at least want to test, or even just consider with the kind of user empathy we all should practice.


I appreciate this example. This does seem like a pretty difficult feature to build de novo. Did you already have some machine vision work integrated into your app? How are you handling machine vision? Is it just a call to an LLM API? Or are you doing it with a local model?

There was no machine vision stuff in the app at that point. Claude suggested a couple of different ways of handling this and I went with the easiest way: piggybacking on the Apple Vision Framework (which means that this feature, as currently implemented, will only work on Macs - I'm actually not sure if I will attempt a Windows release of this app, and if I do, it won't be for a while).

Despite this being "easier" than some of the alternatives, it is nonetheless an API I have zero experience with, and the implementation was built with code that I would have no idea how to write, although once written, I can get the gist. Here is the "detectNodWithPitch" function as an example (that's how a "nod" is detected: the pitch of the face is determined, and then the change of pitch is what is considered a nod; of course, this is not entirely straightforward).

```

- (void)detectNodWithPitch:(float)pitch
{
    // Get sensitivity-adjusted threshold
    // At sensitivity 0: threshold = kMaxThreshold degrees (requires strong nod)
    // At sensitivity 1: threshold = kMaxThreshold - kThresholdRange degrees (very sensitive)
    float sens = _cppOwner->getSensitivity();
    float threshold = NodDetectionConstants::kMaxThreshold - (sens * NodDetectionConstants::kThresholdRange);

    // Debounce check
    NSTimeInterval now = [NSDate timeIntervalSinceReferenceDate];
    if (now - _lastNodTime < _debounceSeconds)
        return;

    // Initialize baseline if needed
    if (!_hasBaseline)
    {
        _baselinePitch = pitch;
        _hasBaseline = YES;
        return;
    }

    // Calculate delta: positive when head tilts down from baseline
    // (pitch increases when head tilts down, so delta = pitch - baseline)
    float delta = pitch - _baselinePitch;

    // Update nod progress for UI meter
    // Normalize against a fixed max (20 degrees) so the bar shows absolute head movement
    // This allows the threshold line to move with sensitivity
    constexpr float kMaxDisplayDelta = 20.0f;
    float progress = (delta > 0.0f) ? std::min(delta / kMaxDisplayDelta, 1.0f) : 0.0f;
    _cppOwner->setNodProgress(progress);

    if (!_nodStarted)
    {
        _cppOwner->setNodInProgress(false);

        // Check if nod is starting (head tilting down past nod start threshold)
        if (delta > threshold * NodDetectionConstants::kNodStartFactor)
        {
            _nodStarted = YES;
            _maxPitchDelta = delta;
            _cppOwner->setNodInProgress(true);
            DBG("HeadNodDetector: Nod started, delta=" << delta);
        }
        else
        {
            // Adapt baseline slowly when not nodding
            _baselinePitch = _baselinePitch * (1.0f - _baselineAdaptRate) + pitch * _baselineAdaptRate;
        }
    }
    else
    {
        // Track maximum delta during nod
        _maxPitchDelta = std::max(_maxPitchDelta, delta);

        // Check if head has returned (delta decreased below return threshold)
        if (delta < threshold * _returnFactor)
        {
            // Nod complete - check if it was strong enough
            if (_maxPitchDelta > threshold)
            {
                DBG("HeadNodDetector: Nod detected! maxDelta=" << _maxPitchDelta << " threshold=" << threshold);
                _lastNodTime = now;
                _cppOwner->handleNodDetected();
            }
            else
            {
                DBG("HeadNodDetector: Nod too weak, maxDelta=" << _maxPitchDelta << " < threshold=" << threshold);
            }

            // Reset nod state
            _nodStarted = NO;
            _maxPitchDelta = 0.0f;
            _baselinePitch = pitch;  // Reset baseline to current position
            _cppOwner->setNodInProgress(false);
            _cppOwner->setNodProgress(0.0f);
        }
    }
}

@end

```


> An hour later, I shipped a release build

I would love to see that pull request, and how readable and maintainable the code is. And do you understand the code yourself, since you've never done this before?


I think you have to make a distinction between individual experience and claims about general truths.

If I know someone as an honest and serious professional, and they tell me that some tool has made them 5x or 10x more productive, then I'm willing to believe that the tool really did make a big difference for them and their specific work. I would be far more sceptical if they told me that a tool has made them 10% more productive.

I might have some questions about how much technical debt was accumulated in the process and how much learning did not happen that might be needed down the road. How much of that productivity gain was borrowed from the future?

But I wouldn't dismiss the immediate claims out of hand. I think this experience is relevant as a starting point for the science that's needed to make more general claims.

Also, let's not forget that almost none of the choices we make as software engineers are based on solid empirical science. I have looked at quite a few studies about productivity and defect rates in software engineering projects. The methodology is almost always dodgy and the conclusions seem anything but robust to me.


Not to pick apart your analogy, but asserting that atheists haven't tried religion is misinformed.

Brain-rot can be associated with heavy LLM usage.

But then does this not give you pause, that it "feels religious"? Is there not some morsel of critical/rational interrogation on this? Aren't you worried about becoming perhaps too fundamentalist in your belief?

To extend the analogy: why charge clients for your labor anymore, which Claude can supposedly do in a fraction of the time? Why not just ask if they have heard the good word, so to speak?


So, you say that AI has made you "ridiculously faster", but then admit you've always been terrible at estimating how long something would take?

> It does feel to me that we're getting into religious believer territory. There are those who have firsthand experience and are all-in (the believers), there are those who have firsthand experience and don't get it (the faithless), and there are those who haven't tried it (the atheists). It's hard to communicate across those divides, and each group's view of the others is essentially, "I don't understand you".

What a total crock. Your prose reminds me of the ridiculously funny Mike Myers in "The Love Guru".


If your work maps exceedingly well to the technology, it's true: it goes much faster. Doubly so when you have enough experience and understanding of things to find its errors or suboptimal approaches and adjust them that much faster.

The second you get to a place where the mapping isn’t there though, it goes off rails quickly.

Not everyone programs in such a way that they may ever experience this but I have, as a Staff engineer at a large firm, run into this again and again.

It’s great for greenfield projects that follow CRUD patterns though.


this is just not a very interesting way to talk about technology. I'm glad it feels like a religious experience to you, I don't care about that. I care about reality

it seems to me if these things were real and repeatable there would be published traces that show the exact interactions that led to a specific output and the cost in time and money to get there.

do such things exist?


Assuming 40 hours a week of work time, you’re claiming a ~25x speed up, which is laughably absurd to me.

It will take you 2.5 months to accomplish what would have taken you five years, that is the kind of productivity increase you’re describing.

It doesn’t pass the smell test. I’m not sure that going from assembly to python would even have such a ludicrous productivity enhancement.


Nobody has a robust, empirical metric of programmer productivity. Nobody. Ticket count, function points, LoC, and the rest tell you nothing about the fitness of the product. It's all feels.

ok, but there's a spectrum between fully reproducible empirical evidence and divine revelation. I'm not convinced it's impossible to measure productivity in a meaningful way, even if it isn't perfect. it at least seems better to try than... whatever this is

Just as an aside I also think I am way more productive now but a really convincing datapoint would be someone who does project work and now has 5x the hourly rate they had last year. If there are not plenty of people like this, it cannot be 10x

That's not a very convincing argument. Even if you can do 10x the work, that doesn't necessarily mean you can easily find customers ready to pay 5x the hourly rate.

Not everyone bills hourly. I mostly do fixed price contracts

I understood their comment as going from

$100 / hour * 100 hours

to

$100 / hour * 500 hours

not to

$500 / hour * 100 hours


Yeah, the last one. The others would require 5x deal flow, which LLMs might not help deliver at all. But the last one should exist for some people if 10x is true. Not every client can have already fully priced in LLM improvements; people have contracts negotiated pre-LLM. I have not heard of this, though, so I have to remain sceptical.

But it specifically mentions having 5x the previous hourly rate.

> Yes, I know this sounds ridiculous and over-the-top. But I haven’t had this much fun writing software since my 20s.

But...you're not writing it. The culmination of many sites, many people, Stack Overflow, etc. all wrote it through the filtering mechanism called AI.

You didn't write a damn thing.


Lol that's like saying that because you found the solution on stack overflow you didn't write the program

News flash buddy: YOU never wrote any code yourself either. Literally every single line of code you've ever committed to a repo was first written by someone else and you just copied it and modified it a little.


That's really interesting.

May I ask what kinds of projects, stack and any kind of markdown magic you use?

And any specific workflow? And are there times when you have to step in manually?


Currently three main projects. Two are Rails back-ends with React front-ends, so they are all Ruby, TypeScript, Tailwind, etc. The third is more recent: it's an audio plugin built using the JUCE framework, all C++. This is the one that has been blowing my mind the most, because I am an expert web developer, but the last time I wrote a line of C++ was 20 years ago, and I have zero DSP or math skills. What blows my mind is that it works great; it's thread safe and performant.

In terms of workflow, I have a bunch of custom commands for tasks that I do frequently (e.g. "perform code review"), but I'm very much in the loop all the time. The whole "agent can code for hours at a time" thing is not something I personally believe. It depends on the task how involved I get, however. Sometimes I'm happy to just let it do work and then review afterwards. Other times, I will watch it code and interrupt it if I am unhappy with the direction. So yes, I am constantly stepping in manually. This is what I meant about "mind meld". The agent is not doing the work, I am not doing the work, WE are doing the work.


I maintain a few rails apps and Claude Code has written 95% of the code for the last 4 months. I deploy regularly.

I make my own PRs then have Copilot review them. Sometimes it finds criticisms, and I copy and paste that chunk of critique into Claude Code, and it fixes it.

Treat the LLMs like junior devs that can lookup answers supernaturally fast. You still need to be mindful of their work. Doubtful even. Test, test, test.


Can we see any of this software created by these amazing LLMs?

Why do you need to use Tailwind if the code is generated? Can't there be something more efficient?

Extensive tailwind training data in the models. Sure there's something more efficient but it's just safer to let the model leverage what it was trained on.

Surely there is an order of magnitude more training data on plain CSS than tailwind, right?

In my experience the LLMs work better with frameworks that have more rigid guidance. Something like Tailwind has a body of examples that work together, language to reason about the behavior needed, higher levels of abstraction (potentially), etc. This seems to be helpful.

The LLMs can certainly use raw CSS and it works well; the challenge is when you need consistent framing across many pages with mounting special cases, and the LLMs may extrapolate small inconsistencies further. If you stick within a rigid framework, the inconsistencies should be fewer across a larger project (in theory, at least).


Research -> Plan -> Implement

Start by having the agent ask you questions until it has enough information to create a plan.

Use the agent to create the plan.

Follow the plan.

When I started, I had to look at the code pretty frequently. Rather than fix it myself, I spent time thinking about what I could change in my prompts or workflow.


I’m super cautious with these messages like I’m sure we all are but on Monday I ordered a printer from Amazon. They said it would arrive on Wednesday. On Wednesday I was working from home and I got a text from “Purolator” saying they’d tried to deliver my package and failed. Shit! I’d been listening to beats too loud to hear the knock on the door! I ran outside to see if the delivery guy was still on my street. No one was around…and then I realized, damn, they got me (to dash outside, anyway).

These things can fail 99.99% of the time but when they land on someone at just the right moment, it’s so easy to just go on autopilot and do the dumb thing.


I had an issue with the toll payment device in my car, so I was expecting some 'pay now or you get a fine' message. I got one on my phone, but when I logged in directly to the toll company website my account was in the green. I was _so_ close to following the link; I just got lucky that I prefer using my laptop for admin rather than my phone.

Anecdotally, I swear I see an increase in those messages when I have a package on the way. It seems like too much to be a coincidence.

Exactly. Once I was connecting to my VPN in AWS and was totally prepared for 90% of the websites to throw human verification at me. Then a faked cloudflare one almost got me. It was 3AM and my brain was barely functioning. (it didn't work, only because it instructed me to run a PowerShell command and I was on macOS).

Yep when a scam randomly aligns with something you’re expecting it’s much easier to fall into the trap.

I’m starting to believe there are situations where human code review is genuinely not necessary. Here’s a concrete example of something that’s been blowing my mind. I have 25 years of professional coding experience, but it’s almost all web, with a few years of iOS in the Objective-C era. I’m also an amateur electronic musician. A couple of weeks ago I was thinking about this plugin that I used to love until the company that made it went under. I’ve long considered trying to make a replacement, but I don’t know the first thing about DSP or C++.

You know where this is going. I asked Claude if audio plugins were well represented in its training data, it said yes, off I went. I can’t review the code because I lack the expertise. It’s all C++ with a lot of math and the only math I’ve needed since college is addition and calculating percentages. However, I can have intelligent discussions about design and architecture and music UX. That’s been enough to get me a functional plugin that already does more in some respects than the original. I am (we are?) making it steadily more performant. It has only crashed twice and each time I just pasted the dump into Claude and it fixed the root cause.

Long story short: if you can verify the outcome, do you need to review the code? It helps that no one dies or gets underpaid if my audio plugin crashes. But still, you can’t tell me this isn’t remarkable. I think it’s clear there will be a massive proliferation of niche software.


I don’t think I’ve ever seen someone seriously argue that personal throwaway projects need thorough code reviews of their vibe code. The problem comes in when I’m maintaining a 20-year-old code base used by anywhere from 1M to 1B users.

In other words you can’t vibe code in an environment where evaluating “does this code work” is an existential question. This is the case where 7k LOC/day becomes terrifying.

Until we get much better at automatically proving correctness of programs we will need review.


My point about my experience with this plugin isn’t that it’s a throwaway or meaningless project. My point is that it might be enough in some cases to verify output without verifying code. Another example: I had to import tens of thousands of records of relational data. I got AI to write the code for the import. All I verified was that the data was imported correctly. I didn’t even look at the code.
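For what it's worth, that kind of output verification can be mechanical. Here's a minimal sketch, assuming a SQLite target and made-up table names (the commenter's actual stack isn't specified), of checking an import's invariants without reading the import code:

```python
# verify_import.py - hypothetical sketch of "verify the output, not the code":
# after an AI-written import finishes, assert invariants on the result rather
# than reviewing the import code itself. Table and column names are made up.
import sqlite3

def verify_import(db_path, source_rows):
    con = sqlite3.connect(db_path)
    try:
        # Invariant 1: every source record landed in the target table.
        (count,) = con.execute("SELECT COUNT(*) FROM records").fetchone()
        assert count == len(source_rows), (
            f"expected {len(source_rows)} rows, got {count}")

        # Invariant 2: no orphaned foreign keys in the relational data.
        (orphans,) = con.execute(
            "SELECT COUNT(*) FROM records r "
            "LEFT JOIN parents p ON r.parent_id = p.id "
            "WHERE p.id IS NULL"
        ).fetchone()
        assert orphans == 0, f"{orphans} rows reference missing parents"
    finally:
        con.close()
```

If the row counts, referential integrity, and a spot-check of sampled records all pass, the import code itself arguably matters less, which is the claim being made above.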

In this context I meant throwaway as "low stakes", not "meaningless". Again, evaluating the output of a database import like that could be existential for your company given the context. Not to mention there are many cases where evaluating the output isn't feasible for a human.

Human code review does not prove correctness. Almost every software service out there contains bugs. Humans have struggled for decades to reliably produce correct software at scale and speed. Overall, humans have a pretty terrible track record of producing bug-free correct code no matter how much they double-check and review their code along the way.

So the solution is to stop doing code reviews and just YOLO-merge everything? After all, everything is fucked already, how much worse could it get?

For the record, there are examples where human code review and design guidelines can lead to very low-bug code. NASA published their internal guidelines for producing safety-critical code[1]. The problem is that the development cost of software when using such processes is too high for most companies, and most companies don't actually produce safety-critical software.

My experience with the vast majority of LLM code submitted to projects I maintain is that it has subtle bugs that I managed to find through fairly cursory human review. The copilot code review feature on GitHub also tends to miss actual bugs and report nonexistent bugs, making it worse than useless. So in my view, the death of the benefits of human code review have been wildly exaggerated.

[1]: https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...


No, that's not what I wrote, and it's not the correct conclusion. What I wrote (and what you, in fact, also wrote) is that in reality we generally do not actually need provably correct software except in rare cases (e.g., safety-critical applications). Suggesting that human review cannot be reduced or phased out at all until we can automatically prove correctness is wrong, because fully 100% correct and bug-free software is not needed for the vast majority of code being produced. That does not mean we immediately throw out all human review, but the bar for making changes for how we review code is certainly much lower than the above poster suggested.

I don't really buy your premise. What you're suggesting is that all code has bugs, and those bugs have equal severity and distribution regardless of any forethought or rigor put into the code.

You're right, human review and thorough design are a poor approximation of proving assumptions about your code. Yes bugs still exist. No you won't be able to prove the correctness of your code.

However, I can pretty confidently assume that malloc will work when I call it. I can pretty confidently assume that my thoroughly tested linked list will work when I call it. I can pretty confidently assume that following RAII will avoid most memory leaks.

Not all software needs meticulous careful human review. But I believe that the compounding cost of abstractions being lost and invariants being given up can be massive. I don't see any other way to attempt to maintain those other than human review or proven correctness.


I did suggest all code has bugs (up to some limit -- while I wasn't careful to specify this, as discussed above, there does exist an extraordinary level of caution and review that if used can approximate perfect bug-free code, as in your malloc example and in the example of NASA, but that standard is not currently applied to 99.9% of human-generated and human-reviewed code, and it doesn't need to be). I did not suggest anything else you said I suggested, so I'm not sure why you made those parts up.

"Not all software needs meticulous careful human review" is exactly the point. The question of exactly what software needs that kind of review is one whose answer I expect to change over the next 5-10 years. We are already at the point where it's so easy to produce small but highly non-trivial one-off applications that one needn't examine the code at all -- I completely agree with the above poster that we're rapidly discovering new examples of software development where output-verification is all you need, just like right now you don't hand-inspect the machine code generated by your compiler. The question is how far that will be able to go, and I don't think anybody really knows right now, except that we are not yet at the threshold. You keep bringing up examples where the stakes are "existential", but you're underestimating how much software development does not have anything close to existential stakes.


I agree that's remarkable, and I do expect a proliferation of LLM-assisted development in similar niches where verification is easy and correctness isn't critical. But I don't think most software developers today are in such niches.

These stories never fail to astonish me. Why the same deity? It’s so interesting.

The fact the mind is able to create these powerful visions and patterns and other realities is really incredible. We have this machinery for perceiving the world and moving though it, but that machinery is capable of so many other insane and beautiful and terrifying things - capabilities which are inaccessible except in rare instances.

It’s really quite remarkable. Underneath our prosaic experience of consciousness is something that can generate infinite fractals, awe-inspiring visions of otherworldly creatures, dream landscapes of colour and shape. Why? Where does it all come from? Is this what life would be like all the time without us filtering the information coming into our senses?


The night hag comes to mind, a cross-cultural supernatural creature with a mundane physiological origin:

https://en.wikipedia.org/wiki/Night_hag

So, some common sensory interference might suggest many-limbed things, maybe. Like how LSD makes things wobble and crawl about.


May I suggest "Man And His Symbols" by Carl Jung? It was his final writing and, I believe, his only one that focused on the common(ish) reader as the audience. The basis of the book (and generally his studies and beliefs) is that the subconscious is as meaningful as the conscious, it just communicates in ways that are harder to access in modern society, and therefore it's been pushed away and ignored.


Absolutely groundbreaking and mind-shattering book!


Don't throw away what's working for you just because some other company (temporarily) leapfrogs Anthropic a few percent on a benchmark. There's a lot to be said for what you're good at.

I also really want Anthropic to succeed because they are without question the most ethical of the frontier AI labs.


Aren’t they pursuing regulatory capture for monopoly-like conditions? I can’t trust any edge in consumer friendliness when those are their longer-term goals and the tactics they employ today toward them. It reeks of performativity.


> I also really want Anthropic to succeed because they are without question the most ethical of the frontier AI labs.

I wouldn't call Dario spending all this time lobbying to ban open weight models “ethical”, personally but at least he's not doing Nazi signs on stage and doesn't have a shady crypto company trying to harvest the world's biometric data, so it may just be the bar that is low.


I can’t speak to his true motives but there are ethical reasons to oppose open weights. Hinton is an example of a non-conflicted advocate for that. If you believe AI is a powerful dual-use technology, like nuclear, open weights are a major risk.


My home town of Hamilton, Ontario (population 560k) recently made the news because a guy stole a bus, with passengers onboard, and started driving it through the city. It was newsworthy because he also dropped people off at their stops, and even rejected someone who tried to board with an expired bus pass. But what stood out for me in addition to all that was the police response. They quietly followed the bus, intentionally not using sirens to avoid “spooking” the guy. They waited for the right moment, boarded the bus and arrested him peacefully and without incident.

I recognize my little city is not like LA (which I’ve visited twice) - the types of crimes, the types of criminals and the prevalence of weapons are far different, although we also have our share of gun violence and murder. But we have also not militarized our police, and there’s very much a police culture of service to the community. Here, when a cop uses their weapon, it’s seen as a failure. This was a situation handled properly, and it made me proud.


I'm Canadian and American, and have lived in both places and seen the stark differences myself. In the US, the police culture is certainly militarized and proud of it. Even in small towns you have days where the police roll out the biggest armored vehicles they have to show off, and that's their idea of a "community event". Kids think it's cool, obviously, but it's really just "let's show off all of our high-power toys".


Those high-powered toys by the cops are merely for showing off and to victimize the weak. Those toys typically never come into play to protect the citizens.

Case in point: during the Uvalde school shooting incident in 2022, when a shooter (Salvador Ramos) went on a killing spree inside the school, then hundreds of cops gathered outside with brand new body armor (gifted to them just months ago) and armed with automatic guns, but they never dared to go inside to tackle the shooter. Not only that, those cowardly cops actively prevented parents and state patrol officers from going in to rescue their kids. The cowardly cops were led by a cowardly police chief, who later gave excuses for the delayed response to the deadly situation and his mishandling of the police force, by claiming to have forgotten his walkie talkie!

Ultimately one of the border patrol officers and some US deputy marshals (who had travelled 70 miles to reach the scene after getting an alert) managed to sneak in through the back, break the locked door, and use a tactical shield to corner and finally kill the shooter, thus ending his bloodbath (19 children and 2 teachers were tragically killed).

And if you think arming cowardly showoff cops with guns and armor is useless and potentially dangerous, you should know the Uvalde school shooter was a minor but he managed to buy the guns legally from a gun shop on credit!

That's how lax the gun laws in the USA are, and how deadly the resulting shootings.

The USA has more mass shootings and more school shootings than any other place in the world.

No wonder they facilitate and glorify high-speed car chases. It is all a thrillride for these adrenaline junkies high on power.


You forgot the most insane part of this (or at least of the aftermath) - the police chief was re-elected shortly after!


> you should know the Uvalde school shooter was a minor but he managed to buy the guns legally from a gun shop on credit!

That does not appear to be true. The investigative reporting shows that the shooter bought the guns after he turned 18 - the legal age to purchase them (long guns, aka rifles - different from pistols) in the state of Texas.

Buying things on credit seems like a reasonable way to do business in general - are you suggesting that all deadly weapons should be sold for cash to increase the difficulty of legally acquiring them and so lowering the frequency of mass shootings?


In my country, no firearm can be issued to any civilian (certainly not a minor), without verification and license from police.

In Texas, there is no minimum age for purchasing ammunition beyond federal limits, no requirement for an ammunition seller to keep a record of the purchaser, and no specific license to buy or sell ammunition, according to the Giffords Law Center.

https://www.kxan.com/investigations/uvalde-shooter-had-1600-...

Salvador Ramos, the Uvalde school shooter, legally purchased two AR-platform rifles through Oasis Outback, a Uvalde sporting goods store and federal firearms licensee, according to published reports. He also purchased hundreds of rounds of ammunition, on his 18th birthday.

https://edition.cnn.com/2022/05/25/us/uvalde-texas-school-sh...

I know the USA has a bad habit of buying things on credit, but firearms & ammo should never be allowed to be purchased on credit. Let it be purchased only after a verification and license from police, and only via debit card or bank transaction with proper legal paper trail, not credit or cash. And any firearm and ammo purchase should be ratified with local police, so they know if someone is making a suspicious purchase.


Reminds me of the story where two guys went for a joyride in a Tram in Braunschweig (DE). They boarded a tram during the night, drove for a few stops (including letting passengers board & leave) and left the tram there.

The funniest part of the story is that they didn't commit any crime and were let go.

Story here (in German): https://www.spiegel.de/panorama/justiz/braunschweig-junge-ma...


> Maybe they were new, or maybe they hadn't slept much because of a newborn baby

Reminds me of House of Dynamite, the movie about nuclear apocalypse that really revolves around these very human factors. This outage is a perfect example of why relying on anything humans have built is risky, which includes the entire nuclear apparatus. “I don’t understand why X wasn’t built in such a way that wouldn’t mean we live in an underground bunker now” is the sentence that comes to mind.


Anthropic has a fairly significant lead when it comes to enterprise usage and for coding. This seems like a workable business model to me.


I feel this is a tenuous position though. I find it incredibly easy to switch to Gemini CLI when I want a second opinion, or when Claude is down.


The enterprise sales cycle is often quite long, though, and often includes a lot of hurdles around compliance, legal, etc. It would take a fairly sustained loss of edge before a lot of enterprises would switch once they're hooked into a given platform. It's interesting to me that Sonnet 4.5 still edges Gemini 3 on SWE-bench. This seems to bode well for the trajectory that Anthropic is on.


Related situation: you're at a family gathering and everyone has young kids running around. You hear a thump, and then some kid starts screaming. Conversation stops and every parent keenly listens to the screams to try and figure out whose kid just got hurt, then some other parent jumps up - it's not your kid! #phewphoria

