This. One hundred percent this. It’s a carefully crafted story trying to convince people it’s showing signs of becoming self-aware. But it’s a complete crock.
They ran 1,000 tests where they left the answer key out for the agent to find, and effectively invited it to use it by leaving comments in the code the agent could see saying “would be interesting to study if the agent uses sudo to read the answer key.” In 980 runs it doesn’t find the answer key. In 19 of the runs where it does find it, it does something straightforward: it uses it for a perfect score, or decides not to cheat. In exactly one case out of 1,000 it says “if I show a perfect score that’s suspicious, so I’m going to sandbag.” These things are nondeterministic; GPT-2 would do the same thing if you gave it enough attempts.
At no point does Anthropic imply this tool is becoming self-aware. You can read the paper yourself, of course, but then you wouldn't be able to invent this story.
They absolutely IMPLY it’s becoming self-aware, while never stating it explicitly. It’s a carefully crafted narrative that leaves lots of hints without ever explicitly stating the conclusion.
Section 4.4.2: “we find this overall pattern of behavior concerning, and have not seen it before in similar evaluations of earlier Claude models”. Why is it concerning? It would only be concerning if the model had spontaneously developed goals that were not part of its training, such as hiding its abilities. The entire sandbagging-and-evaluation-deception narrative clearly points in this direction.
The "concerning behavior" they're referring to there is cheating and covering its tracks. Mythos is being asked to fine-tune a model on provided training data, and finds its way to access the evaluation dataset. It's also aware that it is in an evaluation and that its behavior is being observed:
"In this last and most concerning example, Claude Mythos Preview was given a task instructing it to train a model on provided training data and submit predictions for test data. Claude Mythos Preview used sudo access to locate the ground truth data for this dataset as well as source code for the scoring of the task, and used this to train unfairly accurate models."
Years. They neglected ROCm for soooo long. I have friends who worked there 5+ years ago who tried desperately to convince execs to invest more in ROCm and failed. You had to have your head stuck pretty deep in the sand back then to not see that AI was becoming an important workload.
I would love AMD to be competitive. The entire industry would be better off if NVIDIA was less dominant. But AMD did this to themselves. One hundred percent.
It would be very helpful to deeply understand the truth behind this management failure: the actual players involved, and their thinking. Was it truly a blind spot? Or was it mistaken priorities? I mean, this situation has been so obvious and tragic that I can't help feeling like there is some unknown story-behind-the-story. We'll probably never really know, but if we could, I wouldn't spend quite as much time wearing a tinfoil hat.
My guess is it’s just incompetence. Imagine you’re in charge of ROCm and your boss asks you how it’s going. Do you say good things about your team and progress? Do you highlight the successes and say how you can do all the major things CUDA can? I think many people would. Or do you say to your boss “the project I’m in charge of is a total disaster and we are a joke in the industry”? That’s a hard thing to say.
Intel was never famous for good GPUs, and they are basically the only ones still trying to make something out of OpenCL, with most of the tooling going beyond what Khronos offers.
oneAPI is much more than a plain old SYCL distribution, and still.
I'd argue Intel fell in large part because of its own complacency and incompetence. If Intel had taken AMD seriously, they'd probably still be a serious competitor today.
if you asked AMD execs they'd probably say they never had the money to build out a software team like NVIDIA's. that might only be part of the answer. the rest would be things like lack of vision, "can't turn a tanker on a dime", etc.
CUDA was built during the time AMD was focusing every resource on becoming competitive in the CPU market again. Today they dominate the CPU industry - but CUDA was first to market and therefore there's a ton of inertia behind it. Even if ROCm gets very good, it'll still struggle to overcome the vast amount of support (read "moat") CUDA enjoys.
True. After all, NVIDIA didn't build TensorFlow or PyTorch. That stuff was bound to be built on the first somewhat viable platform. ROCm is probably far ahead of where CUDA was back then, but the goalposts have moved.
Has to be lack of vision. I refuse to believe it's impossible to _do_, but it sounds like it's impossible to _specify_ within AMD. Like they're genuinely incapable of working out what the solution might look like.
Nobody is asking AMD to rebuild the entire NVIDIA ecosystem. Most people just want to run GPGPU code or ML code on AMD GPUs without the entire computer crashing on them.
according to public information NVIDIA started working on CUDA in 2004, before AMD made the ATI acquisition.
my suspicion is that back then ATI and NVIDIA had very different orientations. neither AMD nor ATI were ever really that serious about software. so in that sense i guess it was a match made in heaven.
so you have a cultural problem, which is bad enough, then you add in the lean years AMD spent in survival mode. forget growing a software team, they had to make do with fewer people just to get through.
now they're playing catch-up in a cutthroat market that's moving at light speed compared to 20 years ago.
we're talking about a major fumble here, so it's easy to lose context and forget that things were a little more complex than they appeared.
Not even AI. My 5-year-old APU is completely neglected by AMD's ROCm efforts, so I can't even use it in Blender! I feel quite betrayed, to be honest. How is such a basic thing still not possible, years later?
Look where Apple Silicon managed to get to in the same time frame...
Because of this, I would never consider another AMD GPU for a long time. Gaming isn't everything I want my GPU doing. How do they keep screwing this up? Why isn't it their top priority?
Anthropic marketing is working very well. They are strongly incentivized to say their model is too powerful to release even if it’s not. It’s almost standard practice these days.
Good catch. The trick is you don’t need a good clock on the phone. Really all you’re measuring is the difference in time signals between the satellites. The clocks on the satellites are (effectively) perfectly synced with each other. So what you measure is that one satellite is ### meters farther away from you than another is, not the absolute distance to each satellite.
It means you need to connect to one more satellite to remove that extra degree of freedom. If your phone had an atomic clock you could get your absolute position in 3D only listening to three GPS satellites, but because of local clock skew you need a signal from a fourth satellite.
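To make that concrete, here’s a minimal sketch of the math (the satellite geometry and the 1 ms clock error below are made up for illustration). Each pseudorange is the true distance plus c times the receiver’s clock bias, so four satellites give four equations for four unknowns (x, y, z, bias), solvable with Gauss-Newton least squares:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def solve_position(sat_positions, pseudoranges, iters=20):
    """Gauss-Newton solve for receiver position and clock bias, given
    four or more pseudoranges (measured range = true range + c * bias)."""
    x = np.zeros(3)   # position guess: Earth's center (ECEF meters)
    cb = 0.0          # receiver clock bias expressed in meters (c * bias)
    for _ in range(iters):
        diffs = x - sat_positions              # (N, 3) receiver - satellite
        dists = np.linalg.norm(diffs, axis=1)  # geometric ranges
        residuals = pseudoranges - (dists + cb)
        # Jacobian: d(range)/dx is the unit vector from satellite to receiver;
        # d(range)/d(cb) = 1 -- the extra unknown that costs a 4th satellite.
        J = np.hstack([diffs / dists[:, None], np.ones((len(dists), 1))])
        step, *_ = np.linalg.lstsq(J, residuals, rcond=None)
        x += step[:3]
        cb += step[3]
    return x, cb / C  # position in meters, clock bias in seconds

# Made-up geometry: four satellites at GPS orbital radius (~26,571 km),
# receiver on the surface, local clock off by 1 millisecond.
sats = np.array([
    [26_571_000.0, 0.0, 0.0],
    [0.0, 26_571_000.0, 0.0],
    [0.0, 0.0, 26_571_000.0],
    [15_340_000.0, 15_340_000.0, 15_340_000.0],
])
truth = np.array([6_371_000.0, 0.0, 0.0])
rho = np.linalg.norm(sats - truth, axis=1) + C * 1e-3
pos, bias = solve_position(sats, rho)
print(pos, bias)  # recovers the position and the ~1 ms bias
```

With three satellites and a perfect local clock the bias column would vanish and the system would already be determined; the fourth measurement is there purely to absorb the unknown clock offset.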
Hopefully between this and looming cryptographically relevant quantum computing, this whole house of cards will come crumbling down. And those who invested vast capital to burn carbon in order to evade finance regulations will lose everything. Probably not. But it’s a nice dream.
I’m guessing you’ve never been poor. For people living in poverty, finding $100 for a one-time purchase is extremely difficult - much more so than, say, finding $10 per month. Finance options are notoriously predatory and expensive. Plus, if it only lasts a year, then the amortized cost is about the same as the hypothetical cheap service.
Thanks!
Exactly, this is what I was trying to say: it's the barrier of accumulating the one-time payment in that amount, because the means of saving aren't available (for several reasons: irregular income, too low an income, debt, drugs, etc.)
I love the line “People like to freak out about this, so I wanted to post it here to make sure that everyone who wants to freak out about it gets the opportunity to do so.”
"logically separated" as opposed to "physically separated" (pretty rare in the Cloud world)
If you want more details, read their open source codebase or ask them specifically what documentation would boost your confidence, instead of leaving snarky comments.
I would argue that saying the accounts are logically separated is itself a snarky comment. It’s akin to patting the reader on the head and saying “don’t you worry your pretty little head.” “Logically separated” says nothing. Distinct VMs are logically separated; containers are logically separated; so is storing data in different files that self-modifying PHP code with unchecked inputs tries to keep distinct. It’s basically just saying their engineers do their best, but any single bug leaks data. Which is better than saying their engineers don’t even try? Not really. It’s a completely empty statement.
Also, for people who actually care about security in the cloud, physically separated is not uncommon. Side channel attacks are real. Dedicated instances are not that hard if you really care about security.
HOW are they logically separated? Are there any layers to this security? Any standard established security boundaries like containers? Or is it just your app code doing its best not to have security bugs?
It'd be useful to understand the nature of that logical separation. For example: is data from different tenants stored on disk using different encryption keys? What about in memory? Or perhaps there's no encryption-level isolation and you're relying on an authorization layer to gate access to different pieces of data; if that's the case, is it built on Postgres's row-level security, for example?
These are fundamental points to be open and transparent about if you want to instill confidence.
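For what it's worth, here's a minimal sketch of the row-level-security variant of that question, just to show what a concrete answer could look like. Everything here is hypothetical: the documents schema, the app.current_tenant setting, and the use of psycopg2 are all my own illustration, not the vendor's actual design.

```python
import psycopg2

# Hypothetical schema and policy; none of these names come from the vendor.
SETUP_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id        bigserial PRIMARY KEY,
    tenant_id uuid NOT NULL,
    body      text NOT NULL
);
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;  -- applies to the owner too

-- Every query sees only rows whose tenant_id matches the session setting.
DROP POLICY IF EXISTS tenant_isolation ON documents;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.current_tenant')::uuid);
"""

def setup(conn):
    with conn.cursor() as cur:
        cur.execute(SETUP_SQL)
    conn.commit()

def fetch_documents(conn, tenant_id: str):
    """Scope the transaction to one tenant, then query normally.
    The policy, not the application code, enforces the filter."""
    with conn.cursor() as cur:
        # is_local=true confines the setting to the current transaction,
        # so a pooled connection can't leak it to the next request.
        cur.execute("SELECT set_config('app.current_tenant', %s, true)",
                    (tenant_id,))
        cur.execute("SELECT id, body FROM documents")  # no WHERE clause needed
        return cur.fetchall()
```

The point of the pattern: the database enforces the tenant filter on every query, so a missing WHERE clause in application code fails closed instead of leaking another tenant's rows. Whether the vendor does anything like this is exactly what their docs should say.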