> Is it too anthropomorphic to say that this is a lie? Yes. Current LLMs can onl...

lostmsu · 2025-07-07T02:51:42 1751856702

> Current LLMs can only introspect from output tokens

The only interpretation of this statement I can come up with is plain wrong. There's no reason an LLM shouldn't be able to introspect without any output tokens. As the GP correctly says, most of the processing in LLMs happens over hidden states. Output tokens are just an artefact for our convenience, which also happens to be the way the hidden state processing is trained.

Marazan · 2025-07-07T07:29:37 1751873377

"Hidden layers" are not "hidden state".

Saying so is just unbelievably confusing.

positron26 · 2025-07-07T02:57:30 1751857050

There are no recurrent paths besides tokens. How may I introspect something if it is not an input? I may not.

barrkel · 2025-07-07T10:09:38 1751882978

The recurrence comes from replaying tokens during autoregression.

It's as if you have a variable in a deterministic programming language, only you have to replay the entire history of the program's computation and input to get the next state of the machine (program counter + memory + registers).

Producing a token for an LLM is analogous to a tick of the clock for a CPU. It's the crank handle that drives the process.
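
To make the replay analogy concrete, here is a minimal sketch. It is a toy and not a real model: `forward`, `generate`, and `VOCAB_SIZE` are hypothetical names, and the arithmetic inside `forward` is an arbitrary stand-in for a model pass. The only point is that each tick is a pure function of the token history, so the tokens alone are enough to resume.

```python
from typing import List

VOCAB_SIZE = 16  # hypothetical toy vocabulary size


def forward(tokens: List[int]) -> int:
    """Toy stand-in for one model pass: a pure function of the full history.

    It builds an internal value while reading the history, then emits
    only a single token; the internal value itself is not returned.
    """
    hidden = 0
    for position, token in enumerate(tokens):
        hidden = (hidden * 31 + token + position) % 1_000_003
    return hidden % VOCAB_SIZE


def generate(prompt: List[int], steps: int) -> List[int]:
    """Autoregression: every tick replays the whole history for one new token."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(forward(tokens))  # only the token carries over
    return tokens


if __name__ == "__main__":
    prompt = [1, 2, 3]
    # Stopping after 2 ticks and resuming from the tokens gives the same
    # result as running 5 ticks straight through.
    print(generate(prompt, 5) == generate(generate(prompt, 2), 3))
```

In this sketch the "state of the machine" after any tick can be rebuilt from the token list, which is the sense in which the tokens act as the variable being replayed.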

hackinthebochs · 2025-07-07T10:32:26 1751884346

Important attention heads or layers within an LLM can be repeated, giving you an "unrolled" recursion.

positron26 · 2025-07-07T10:41:27 1751884887

An unrolled loop in a feed-forward network is just that: unrolled. The computation is a DAG.

hackinthebochs · 2025-07-07T10:51:34 1751885494

But the function of an unrolled recursion is the same as a recursive function with bounded depth as long as the number of unrolled steps matches. The point is whatever function recursion is supposed to provide can plausibly be present in LLMs.
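
A small sketch of the equivalence, under the assumption that the repeated block is the same function at every step. `layer`, `recursive`, and `unrolled_3` are hypothetical names, and `layer` is an arbitrary stand-in rather than anything from a real network.

```python
def layer(x: float) -> float:
    """Hypothetical stand-in for one repeated block."""
    return 0.5 * x + 1.0


def recursive(x: float, depth: int) -> float:
    """Apply `layer` recursively, with recursion bounded by `depth`."""
    if depth == 0:
        return x
    return recursive(layer(x), depth - 1)


def unrolled_3(x: float) -> float:
    """The same three applications written out as straight-line code (a DAG)."""
    x = layer(x)
    x = layer(x)
    x = layer(x)
    return x


if __name__ == "__main__":
    # Equal when the depth matches the number of unrolled steps.
    print(recursive(4.0, 3) == unrolled_3(4.0))
    # Different once the recursion goes deeper than the unrolling.
    print(recursive(4.0, 4) == unrolled_3(4.0))
```

The two agree only when the depths match; the unrolled version cannot go past its fixed number of steps.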

positron26 · 2025-07-07T11:46:18 1751888778

And then during the next token, all of that bounded depth is thrown away except for the token of output.

You're fixating on the pseudo-computation within a single token pass. This is very limited compared to actual hidden state retention and the introspection that would enable if we knew how to train it and do online learning already.

The "reasoning" hack would not be a realistic implementation choice if the models had hidden state and could ruminate on it without showing us output.

hackinthebochs · 2025-07-07T12:03:22 1751889802

Sure. But notice "ruminate" is different than introspect, which was what your original comment was about.

throw310822 · 2025-07-07T07:28:29 1751873309

Introspection doesn't have to be recurrent. It can happen during the generation of a single token.

delusional · 2025-07-07T06:49:03 1751870943

> Output tokens are just an artefact for our convenience

That's nonsense. The hidden layers are specifically constructed to increase the probability that the model picks the right next word. Without the output/token generation stage the hidden layers are meaningless. Just empty noise.

It is fundamentally an algorithm for generating text. If you take the text away it's just a bunch of fmadds. A mute person can still think, an LLM without output tokens can do nothing.

Tarq0n · 2025-07-08T20:10:44 1752005444

I think that's almost completely backwards. The input and output layers just convert between natural language and embeddings, i.e. shift the format of the language. But operating on the embeddings is where meaning (locations in vector space) is transformed.
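
A toy sketch of that split, with everything in it assumed for illustration: the three-word vocabulary, the 2-d vectors in `EMBED`, and the rotation used as `transform` are made up and not taken from any real model.

```python
from typing import Dict, List

# Hypothetical embedding table: each word is a location in a 2-d vector space.
EMBED: Dict[str, List[float]] = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "fish": [-1.0, 0.0],
}


def embed(word: str) -> List[float]:
    """Input layer: convert a word into its vector."""
    return list(EMBED[word])


def transform(v: List[float]) -> List[float]:
    """Middle of the network (stand-in): move the point, here a 90 degree rotation."""
    return [-v[1], v[0]]


def unembed(v: List[float]) -> str:
    """Output layer: pick the word whose vector has the largest dot product with v."""
    def score(word: str) -> float:
        return sum(a * b for a, b in zip(EMBED[word], v))
    return max(EMBED, key=score)


if __name__ == "__main__":
    # Input and output layers alone only change the format.
    print(unembed(embed("cat")))
    # The change in meaning happens in the step between them.
    print(unembed(transform(embed("cat"))))
```

In this sketch `embed` followed directly by `unembed` returns the same word, so any change in the result comes from the operation on the vector in between.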