In the years since this came out, I have shifted my opinion from 100% agreement to, in a lot of cases, kind of the opposite - a bitter lesson of my own from using AI to solve complex end-to-end tasks. If you want to build a system that gets the best score on a test set, he's right: get all the data you possibly can and make your model as end-to-end (e2e) as possible. This also tends to be the easiest approach.
The problem is: your input dataset definitely has biases you don't know about, and the model is going to learn spurious things you don't want it to. It can make indefensible decisions with high confidence, and you may not know why. In some domains your model may be basically useless, and you won't know until it happens. That often isn't good enough in industry.
To stop batshit things coming out of your system, you may have to do the opposite - use domain knowledge to break the problem the model is trying to solve into steps you can reason about (rough sketch of what I mean below). Use this to improve your data. Use this to stop the system from doing anything you know makes no sense. This is really hard and time consuming, but IMO complex e2e models are something to be wary of.
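A minimal sketch of the "steps you can reason about" idea. Everything here is hypothetical and illustrative (the Detection type, the detect/plan stubs, the 200 m range limit are made-up stand-ins, not any real library or system); the point is that each stage produces an output you can inspect, test, and veto with domain knowledge instead of trusting one opaque e2e mapping.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float
    distance_m: float

def detect(frame) -> list[Detection]:
    # Stand-in for a learned perception stage. In practice this is a trained
    # model whose raw outputs you cannot fully trust.
    return [
        Detection("pedestrian", 0.93, 12.0),
        Detection("pedestrian", 0.88, -3.0),  # nonsense the model might emit
    ]

def apply_domain_checks(dets: list[Detection]) -> list[Detection]:
    # Domain knowledge: distances can't be negative, and anything beyond the
    # (assumed) sensor range is noise. Reject rather than propagate garbage.
    return [d for d in dets if 0.0 < d.distance_m < 200.0]

def plan(dets: list[Detection]) -> str:
    # The downstream stage reasons only over the vetted, inspectable output,
    # so a bad decision can be traced back to a specific intermediate result.
    return "brake" if any(d.distance_m < 20.0 for d in dets) else "cruise"

if __name__ == "__main__":
    raw = detect(frame=None)
    vetted = apply_domain_checks(raw)
    print(plan(vetted))  # -> "brake", and you can log exactly why
```

The trade-off is exactly what the comment says: you now have to design, validate, and maintain each interface by hand, which an e2e model lets you skip - until it silently does something indefensible.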
Your understanding that problem solving has to do with "steps" is the correct one in my opinion.
Also, with data-driven approaches, the model hasn't necessarily learned anything in a meaningful sense. If you train on certain inputs to get certain realistic-looking outputs, you can build sophisticated parrots or chameleons. That's what GPT and Stable Diffusion are: compressed knowledge bases that produce an output without a knowledge model. (No, language models are not knowledge models.)
Thinking, rational or otherwise, requires causal steps. Since none of these data-driven approaches have even fuzzy causal models, they require memorizing near-infinite universes of examples and searching for one that has seen this situation before. That's why they're not intelligent and never will be. An intelligent animal only needs one universe and limited experience to solve a problem, because it knows the latent structure of the problem and can generate approaches. Intelligent entities, unlike these Rain Man-style automatons, do not need to memorize every book in the library to multiply two large numbers together.