> I hypothesize it will find the exploit, but it will also turn up so much irrelevant nonsense that it won't matter.
The trick with Mythos wasn't that it didn't hallucinate nonsense vulnerabilities; it absolutely did. It was able to verify that some were real, though, by testing them.
The question is whether smaller models can verify and test the vulnerabilities too, and whether it can be done more cheaply than these Mythos experiments.
People often undervalue scaffolding. I was looking at a bug yesterday, reported by a tester. He has access to Opus through Amazon Q, but he's looking through a single repo. It provided some useful information, but the scaffolding wasn't good enough.
I took its preliminary findings into Claude Code with the same model. But in mine it knows where every adjacent system is, the entire git history, the deployment history, and the state of the feature flags. So instead of pointing at a vague problem, it knew which flag had been flipped in a different service, could see how that changed behavior, how flipping the flag in prod would make the service under test cry, and which code change would make sure it works both ways.
It's not as if a modern Opus is a small model, but the difference here wasn't the model: just a stronger scaffold, along with more CLI tools available in the context.
The issue here in the security testing is knowing exactly what was visible, and how much the model failed, because it makes a huge difference. A middling chess player can find amazing combinations at good speed when playing puzzle rush: you are handed a position where you know a decisive combination exists, and that it works. The same combination, however, might be really hard to find over the board, because in a typical chess game it's rare for those combinations to exist, and thoroughly checking for them, calculating all the way through every possibility, costs real energy. This is why chess grandmasters would consider just being able to see the computer score for a position to be massive cheating: just knowing that the last move was a blunder would be a decisive advantage.
When we ask a cheap model to look for a vulnerability and hand it the right context to actually find it, we are already priming it, versus asking it to find one when there may be nothing there.
Calling it “expert orchestration” is misleading when they were pointing it at the vulnerable functions and giving it hints about what to look for, because they already knew the vulnerability.
You know for loops exist and you can run opencode against any section of code with just a small amount of templating, right? There's nothing stopping you from writing a harness that does what you're saying.
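A minimal sketch of that harness idea: template a prompt per file and hand each one to an agent CLI. The `opencode run <prompt>` invocation and the prompt wording are assumptions here, not a documented interface; adapt them to whatever agent CLI you actually use.

```javascript
// Sketch of a "for loop" harness: build one templated audit command per
// source file. The `opencode run` invocation is an assumption; swap in
// your agent CLI's real interface.

// Hypothetical per-file prompt template.
const auditPrompt = (file) =>
  `Audit ${file} for vulnerabilities; only report findings you can reproduce.`;

// Build one [command, args] pair per matching file.
function buildCommands(files) {
  return files
    .filter((f) => f.endsWith('.js'))
    .map((f) => ['opencode', ['run', auditPrompt(f)]]);
}

// Wiring, commented out so the sketch runs without the CLI installed:
// const { execFileSync } = require('node:child_process');
// for (const [cmd, args] of buildCommands(['src/auth.js'])) {
//   execFileSync(cmd, args, { stdio: 'inherit' });
// }

console.log(buildCommands(['src/auth.js', 'README.md']));
```

The interesting part is the prompt template, not the loop: that's where you inject the "right context" the thread is arguing about.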
The argument against rejecting on cancellation seems like a stretch to me. It's completely fine to view cancellation as an error condition: it allows you to recover from a cancellation if you want (swallow the error with a catch) or to propagate it.
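A small sketch of cancellation-as-error using the real `AbortController`/`AbortError` pattern (the `sleep` helper is illustrative, not a built-in): the caller decides in the catch whether to swallow the cancellation or rethrow.

```javascript
// A cancellable sleep that rejects with an AbortError when the signal
// fires, i.e. cancellation modeled as an error condition.
function sleep(ms, signal) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(resolve, ms);
    signal?.addEventListener(
      'abort',
      () => {
        clearTimeout(timer);
        reject(new DOMException('Aborted', 'AbortError'));
      },
      { once: true }
    );
  });
}

async function run() {
  const controller = new AbortController();
  const pending = sleep(1000, controller.signal);
  controller.abort(); // cancel the pending operation

  try {
    await pending;
    return 'completed';
  } catch (err) {
    if (err.name === 'AbortError') return 'cancelled'; // recover: swallow it
    throw err; // propagate genuine failures
  }
}

run().then(console.log); // prints "cancelled"
```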
Hard disagree, TC39 has done great work over the last 10 years. To name a few:
- Async/await
- Rest/spread
- Async iterators
- WeakRefs
- Explicit Resource Management
- Temporal
Its decisions are much more well thought out than WHATWG standards. AbortSignal extending EventTarget was a terrible call.
More good work from the last 10 years includes .at(), optional chaining and nullish coalescing, BigInt, etc.
But most of what you mentioned is closing in on 10 years in the standard (async/await landed in 2017), meaning the bulk of the work was done 10 or more years ago.
The failure of AbortSignal is exactly the kind of failure TC39 has been producing in bulk lately. I have been following the proposal to add Observables to the language, which is a stage 1 proposal (and has been for over 10 years!!!). There were talks 5 years ago (!) about aligning the API with AbortSignal[1], which I think really exemplifies the inability of TC39 to reach a workable decision (at least as it operates now).
Another example I like to bring up is the failure of the pipeline operator[2], which was advanced to stage 2 four years ago and has been on hiatus ever since, with very little work to show for it. After years of deliberation they advanced a very controversial version of the operator, to massive community backlash. Before they advanced it, it was one of the more popular proposals; now, not so much, and personally I sense any enthusiasm for the feature has pretty much vanished. In other words, I think they took half a decade to make the obviously wrong decision, and have since given up.
From the failure of the pipeline operator followed a bunch of half-measures, such as array grouping, iterator helpers, etc., which could easily have been implemented in userland libraries if the more functional version of the pipeline operator had advanced.
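To illustrate the userland claim: with a functional pipeline, or even a tiny `pipe` helper, grouping composes fine without a new built-in. `pipe` and `groupBy` here are illustrative names, not standard library APIs.

```javascript
// Userland sketch: a tiny pipe helper plus a curried groupBy, standing in
// for what a functional pipeline operator would have made ergonomic.
const pipe = (value, ...fns) => fns.reduce((v, f) => f(v), value);

const groupBy = (keyFn) => (arr) =>
  arr.reduce((acc, item) => {
    const key = keyFn(item);
    (acc[key] ??= []).push(item);
    return acc;
  }, {});

const grouped = pipe(
  [1, 2, 3, 4, 5],
  groupBy((n) => (n % 2 === 0 ? 'even' : 'odd'))
);

console.log(grouped); // { odd: [ 1, 3, 5 ], even: [ 2, 4 ] }
```

With an F#-style `|>` this would read `[1, 2, 3, 4, 5] |> groupBy(...)`, which is exactly why the half-measure built-ins feel redundant.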
The goal of web hosting is to provide low latency and wide availability to many users.
AI in this context has a very different goal as a tool for individual users.
You wouldn't say that hosting instances of Photoshop on servers and charging for usage is a long-term viable business, would you? Even if current consumer computers struggled to run Photoshop.
I don't see an issue with the comparison, I don't think it is meant to be a 1 to 1 or anything, just an illustration of how consumers are overwhelmingly lazy.
I'd take issue with the statement that it is for the paranoid, but I guess that might be a defense mechanism, because of course I am interested in local models. If my new workflow is going to be dependent on three companies, I'd prefer there to be a light at the end of the tunnel that breaks us free.
It's interesting to see so many people agree with this perspective when it comes to articles yet disagree when it comes to writing software.
Perhaps it's some form of Gell-Mann amnesia: people are better at recognizing good articles than they are at recognizing good software. Combine that with the vibe-coding habit of never actually reading the source, and thus never recognizing how bad it is.
We need to let the AI as a service businesses fail.