> In 1982, Prime Minister Yasuhiro Nakasone started to privatize the railways. Unlike other countries, Japan simply returned to the traditional private railway model of the nineteenth and early twentieth centuries: tracks, trains, stations, and yards were owned by vertically integrated regional conglomerates. There are substantial advantages to vertical integration. Railways are a closed system that has to be planned as a single unit. […]
This is a very interesting point, especially in light of another article discussed here a couple days ago[0] about why Switzerland has 25 Gbit/s internet and why the US and Germany don't. One of the main points of the article was that the fiber optics infrastructure is (or should be treated as) a natural monopoly:
> The rational solution is to build the infrastructure once, as a shared, neutral asset, and let different companies compete to provide the service over that infrastructure. That’s how water works. That’s how electricity works in most places. And in Switzerland, that’s how fiber optic internet works.
> > Why are you handwaving things away though? I've got you on max effort. I even patched the system prompts to reduce this.
In my experience, prompts like this one, which 1) ask for a reason behind an answer (when the model won't actually be able to provide one) and 2) are somewhat standoffish, don't work well at all. You'll just have the model go the other way.
What works much better is to tell the model to take a step back and re-evaluate. Sometimes it also helps to explicitly ask it to look at things from a different angle XYZ, in other words, to add some entropy to get it away from the local optimum it's currently at.
> when the model won't actually be able to provide one
This is key. In my experience, asking an LLM why it did something is usually pointless. In a subsequent round, it generally can't meaningfully introspect on its prior internal state, so it's just referring to the session transcript and extrapolating a plausible sounding answer based on its training data of how LLMs typically work.
That doesn't necessarily mean the reply is wrong because, as usual, a statistically plausible sounding answer sometimes also happens to be correct, but it has no fundamental truth value. I've gotten equally plausible answers just pasting the same session transcript into another LLM and asking why it did that.
From early GPT days to now, the best way to get a decently scoped and reasonably grounded response has always been to ask at least twice (in the early days, often 7 or 8 times).
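The ask-at-least-twice idea can be sketched as a simple majority vote over repeated samples. This is a toy sketch, not any vendor's API: `ask` is a hypothetical stand-in for whatever chat-completion call you actually use.

```python
from collections import Counter

def majority_answer(ask, prompt, n=3):
    """Sample the model n times and keep the most common answer.
    `ask` is a hypothetical (prompt -> answer) completion call."""
    answers = [ask(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy deterministic stand-in; a real sampled model would vary per call,
# which is exactly why voting across calls helps.
stub = lambda prompt: "42"
print(majority_answer(stub, "What is 6 * 7?"))  # -> 42
```

With a real model you'd sample at nonzero temperature so the runs actually disagree; the vote then filters out one-off flukes.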
Because not only can it not reflect, it cannot "think ahead about what it needs to say and change its mind". It "thinks" out loud (as some people seem to as well).
It is a "continuation" of context. When you ask what it did, it still doesn't think, it just* continues from a place of having more context to continue from.
The game has always been: stuff context better => continue better.
Humans were bad at doing this: for example, asking it for a synthesis with an explanation, instead of, say, asking for the explanation first, then the synthesis.
You can get today's behaviors by treating "adaptive thinking" like a token-budgeted loop for context stuffing: eventually there's enough context in view to produce a hopefully better-contextualized continuation.
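That token-budgeted context-stuffing loop can be written down in a few lines. A minimal sketch, where `ask` and the turn budget are assumptions (a hypothetical completion function, not a real API):

```python
def stuffed_continuation(ask, question, budget_turns=3):
    """Feed the growing transcript back in until the turn budget runs
    out; the final continuation is produced from all the accumulated
    context. `ask` is a hypothetical (context -> next text chunk) call."""
    context = question
    for _ in range(budget_turns):
        context = context + "\n" + ask(context)
    return context

# Stub model that just "thinks out loud" by counting its own steps.
stub = lambda ctx: f"step {ctx.count('step') + 1}"
print(stuffed_continuation(stub, "Q?", budget_turns=2))
# -> Q?
#    step 1
#    step 2
```

Each pass only ever continues from whatever is in view, which is the "stuff context better => continue better" game in its rawest form.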
It seems no accident that we've hit on the word "harness": so much of what seems impressive by the end of 2025 was already available by the end of 2023 if you were "holding it right". If (and only if!) you are an expert in the area you need it to process: (1) turn thinking off, (2) do your own prompting to "prefill context", and (3) you will get a superior final response. Not vibing, just staff work.
---
* “just”: I don't mean "just" dismissively. Qwen 3.5 and Gemma 4 on an M5 approach where SOTA was a year ago, but faster and on your lap. These things are stunning, and the continuations are extraordinary. But still: garbage in, garbage out; gems in, gems out.
> This is key. In my experience, asking an LLM why it did something is usually pointless. In a subsequent round, it generally can't meaningfully introspect on its prior internal state, so it's just referring to the session transcript and extrapolating a plausible sounding answer based on its training data of how LLMs typically work.
Yep, I've gotten used to treating the model output as a finished, self-contained thing.
If it needs to be explained, the model will be good at that; if it has an issue, the model will be good at fixing it (and possibly patching any instructions to prevent it in the future). I'm not getting the actual reason why things happened a certain way, but then again, it's just a token-prediction machine. If there's something wrong with my prompt that's not immediately obvious and perhaps doesn't matter that much, I can just run a few sub-agents in a review role, look for a consensus on any problems found, and have the model fix them.
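The sub-agents-in-a-review-role idea boils down to keeping only findings that multiple independent reviewers agree on. A sketch under stated assumptions: each "reviewer" here is a hypothetical function standing in for a separate model session, not a real agent framework.

```python
from collections import Counter

def consensus_issues(reviewers, artifact, threshold=2):
    """Run several independent 'reviewer' calls over the same output
    and keep only issues flagged by at least `threshold` of them.
    Each reviewer is a hypothetical (artifact -> set of issues) call."""
    counts = Counter()
    for review in reviewers:
        counts.update(review(artifact))
    return {issue for issue, n in counts.items() if n >= threshold}

# Toy reviewers; in practice these would be separate sub-agent runs.
r1 = lambda a: {"missing null check", "style nit"}
r2 = lambda a: {"missing null check"}
r3 = lambda a: {"off-by-one"}
print(consensus_issues([r1, r2, r3], "some diff"))  # -> {'missing null check'}
```

Issues that only one reviewer hallucinates get filtered out, which is the whole point of asking for consensus rather than trusting a single review.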
> In a subsequent round, it generally can't meaningfully introspect on its prior internal state
It can't do any better than in the moment it's making the choices. Introspection mostly amounts to back-rationalisation, just like in humans. For humans, though, doing so may help them learn to make better decisions in similar situations in the future.
I don't understand why people don't just say "This is wrong. Try again." or "This is wrong because XYZ. Try again." This anthropomorphizing by asking why seems a bit pointless when you know how LLMs work, unless you've empirically had better results from a specific make and version of LLM by asking why in the past. It's theoretically functionally equivalent to asking a brand-new LLM instance, given your chat history, why the original gave such an answer. Do you want the correct result, or do you actually care about knowing why?
>Introspection mostly amounts to back-rationalisation, just like in humans.
That's the best-case scenario. Again, let's stop anthropomorphizing. The given reasons may turn out to be incompatible with the original answer upon closer inspection.
I definitely do this, along with the compulsion sometimes to tell the agent how a problem was fixed in the end, when investigating myself after the model failing to do so. Just common courtesy after working on something together. Let’s rationalize this as giving me an opportunity to reflect and rubberduck the solution.
Regarding not just telling it "try again": of course you are right to suggest that applying human cognition mechanisms to LLMs is not founded on the same underlying effects.
But given the nature of training and fine-tuning/RL, I don't think it is unreasonable that instructing it to do backwards reflection could have a positive effect. The model might pattern-match this and then exhibit a few positive behaviors. It could lead to it doing more reflection within the reasoning blocks and catching errors before answering, which is what you want. Those reasoning steps will attend to the question of "what caused you to make this assumption", further encouraging this behavior. Yes, both mechanisms are exhibited through linear, forward-going statistical interpolation, but the concept of reasoning has proven that this is an effective strategy to arrive at a more grounded result than answering right away.
Lastly, back to anthropomorphizing: it shows that you, the user, are encouraging deeper thought and self-correction. The model does not have psychological safety mechanisms which it guards, but again, the way the models are trained causes them to emulate them. The RL primes the model for certain behavior, i.e., arriving at an answer at some point rather than thinking for a long time. I think it's fair to assume that by "setting the stage" it is possible to influence which parts of the RL training activate.
While role-based prompting is not that important anymore, I think the system prompts of the big coding agents still use it, suggesting some, if slight, advantage to putting the model in the right frame of mind. Again, very sorry for that last part, but anthropomorphizing does seem to be a useful analogy for a lot of the concepts we are seeing (the reason for this lying in the farther-off epistemological and philosophical regions, both on the side of the models and of us).
> This is key. In my experience, asking an LLM why it did something is usually pointless.
That kind of strikes me as a huge problem. Working backwards from solutions (both correct and wrong) can yield pretty critical information and learning opportunities. Otherwise you’re just veering into “guess and check” territory.
That's good advice. I managed to get the session back on track by doing that a few turns later. I started making it very explicit that I wanted it to really think things through. It kept asking me for permission to do things, I had to explicitly prompt it to trace through and resolve every single edge case it ran into, but it seems to be doing better now. It's running a lot of adversarial tests right now and the results at least seem to be more thorough and acceptable. It's gonna take a while to fully review the output though.
It's just that Opus 4.6 DISABLE_ADAPTIVE_THINKING=1 doesn't seem to require me to do this at all, or at least not as often. It'd fully explore the code and take into account all the edge cases and caveats without any explicit prompting from me. It's a really frustrating experience to watch Anthropic's flagship subscription-only model burn my tokens only to end up lazily hand-waving away hard questions unless I explicitly tell it not to do that.
I have to give it to Opus 4.7 though: it recovered much better than 4.6.
Yeah, for anyone seriously using these models, I highly recommend reading the Mythos system card, especially the sections on analyzing its internal non-verbalized states. Saves a lot of banging your head against the wall.
This is frankly one of the most frustrating things about LLMs: sometimes I just want to drive it into a corner. “Why the f** did you do X when I specifically told you not to?”
It never leads to anything helpful. I don’t generally find it necessary to drive humans into a corner. I’m not sure it’s because it’s explicitly not a human so I don’t feel bad for it, though I think it’s more the fact that it’s always so bland and is entirely unable to respond to a slight bit of negative sentiment (both in terms of genuinely not being able to exert more effort into getting it right when someone is frustrated with it, but also in that it is always equally nonchalant and inflexible).
You might be surprised how well 5.3-codex follows your instructions. When it hits a wall with your request, it usually emits the final turn and says it can’t do it.
> What works much better is to tell the model to take a step back and re-evaluate.
I desperately hate that modern tooling relies on “did you perform the correct prayer to the Omnissiah”
> to add some entropy to get it away from the local optimum
Is that what it does? I don't think that's what it does, technically.
I think that's just anthropomorphizing a system that behaves in a non-deterministic way.
A more meaningful solution is almost always "do it multiple times".
That is a solution that makes sense sometimes because the system is probability-based, but even then, when you're hitting an opaque API which has multiple hidden caching layers, /shrug, who knows.
This is why I firmly believe prompt engineering and prompt hacking are just fluff.
They're both mostly technically meaningless (observing random variance over a sample so small you can't see actual patterns) and obsolete once models/APIs change.
Just ask Claude to rewrite your request "as a prompt for claude code" and use that.
I bet it won't be any worse than the prompt you write by hand.
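That rewrite-then-run trick is a two-stage pipeline and is easy to sketch. Everything here is an assumption for illustration: `ask` is a hypothetical completion call, and the instruction wording is made up, not Claude Code's actual interface.

```python
def meta_prompt(ask, raw_request):
    """Stage 1: have the model rewrite the raw request as a structured
    prompt. Stage 2: run that rewritten prompt. `ask` is a hypothetical
    (prompt -> text) completion call."""
    rewritten = ask("Rewrite this as a prompt for a coding agent:\n" + raw_request)
    return ask(rewritten)

# Stub that uppercases its input, just to show the two-stage plumbing.
stub = lambda prompt: prompt.upper()
print(meta_prompt(stub, "fix the bug"))
```

In practice you'd use two separate sessions (or models) for the two stages so the rewrite isn't biased by the eventual answer.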
Other than AI (and possibly npm packaging), where do you feel you have to rely on prayer? Additionally, most of human history has been the story of scientific advancement moving us past points where people relied on prayer, so maybe "suck it up, buttercup" is the best advice here?
From what I understood, it was the beginning of user-adjustable design. Instead of mounting the seat at a fixed height with a fixed distance to the pedals, etc., they designed the cockpit so the pilot could adjust everything himself. Basically what is standard in every car today.
"Es ist aus" can also be translated as "It is over" (a game)
The meaning in dog schools is "spit it out", but given how versatile "aus" is in human language, it's often used as a general "stop" command, as in "aus", stop playing.
+1 I actually came here hoping that OP had built a better `less`. Along with refresh, I'd also love to see mouse compatibility (scrolling etc.) and better performance when reading huge files.
> This implies the only content with moral worth are those that teach knowledge or skills
This is not what OP said. He was talking about the "moral goodness" of providing access to X, despite it being illegal. He never said anything about the moral worth of X itself, let alone that Y had no moral worth.
> AA is providing a valuable service to tons of people who don't have access to these books otherwise. There's a strong argument to be made for the moral goodness of that -- that even if it's illegal, it's at least in the spirit of a public library.
[0]: https://news.ycombinator.com/item?id=47652400