You can use STT and include a workflow that automatically extracts the requirements (filters all the um's, ah's, pauses) and it becomes more like an interaction where you act as the Product Owner/Manager and Codex is your Architect/Dev.
At least, that's how I code through my phone. But it does require some forethought in establishing your automated workflows. I'm at the point where my entire dev system has established templates for CI/CD, so I can preview work in staging; production is still a manual step (obviously).
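To make the "filters all the um's, ah's" step concrete, here is a minimal sketch of a transcript cleaner. It assumes the STT output arrives as plain text; the filler list and the name `strip_fillers` are hypothetical, and a real workflow might hand this cleanup to the model instead of a regex.

```python
import re

# Hypothetical filler list (an assumption, not from any specific STT tool).
# \b keeps real words such as "umbrella" from matching.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+|hmm+)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(transcript: str) -> str:
    """Remove common spoken fillers and collapse the leftover whitespace."""
    cleaned = FILLERS.sub("", transcript)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_fillers("Um, add a login page, uh, with ah OAuth"))
```

The cleaned text could then be passed along as the requirement, with the person dictating still playing the Product Owner role.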
Sure, I do that on the computer too. Computers have microphones these days, and STT runs on macOS as well. What was your point in regard to my comment? I am not sure I understood you.
Human systems have a critical bottleneck: they're run by humans. That isn't necessarily a flaw, but it means any system is corruptible if it's run by corrupted humans.
And I mean this for any sort of system: corporations, nonprofits, dictatorships, oligarchies, and democracies. Democracy is still a human-run system, and the idea that it is somehow a bastion of freedom is a delusion.
If we want better systems we need better people running them, but that's a conversation that's emerging so we'll see how it goes.
People wonder why the state of the world is the way it is. Traumatized child expresses themselves the only way they know how? Let's beat them, surely this is the solution, not creating a safe environment.
I'm not saying beating children is good but... if you think the 'state of the world' is bad, well, most countries have massively reduced how much children get beaten over the last 40 years or so.
Does this imply there's room for ethically-sourced AI? I've always thought that at some point there would be some sort of p2p-style way of people contributing their compute resources to training AI models that are distributed for everyone to use.
Because it's a statistical process generating one part of a word at a time. It probably isn't even generating "surprise". It might be generating "sur", then "prise" then "!"
But what is surprise really? Something not following expectation. The distribution may statistically leverage surprise as a concept via how it has seen surprise as a concept e.g. "interesting!"
So it can be both true that it has nothing to do with the emotion of surprise, but appear as the emulation of that emotion since the training data matches the concept of surprise (mismatch between expectation and event).
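The "one part of a word at a time" point can be illustrated with a toy sampler. This is only a sketch under loud assumptions: the token table and probabilities are invented, and a real LLM conditions on the whole preceding context rather than just the previous token.

```python
import random

# Toy next-token table. Tokens and probabilities are made up for illustration.
NEXT_TOKEN = {
    "<start>": [("sur", 0.6), ("interest", 0.4)],
    "sur": [("prise", 1.0)],
    "interest": [("ing", 1.0)],
    "prise": [("!", 1.0)],
    "ing": [("!", 1.0)],
    "!": [("<end>", 1.0)],
}


def generate(seed=None):
    """Sample one piece at a time until the end marker is drawn."""
    rng = random.Random(seed)
    token, pieces = "<start>", []
    while True:
        candidates, weights = zip(*NEXT_TOKEN[token])
        token = rng.choices(candidates, weights=weights)[0]
        if token == "<end>":
            return pieces
        pieces.append(token)


if __name__ == "__main__":
    print(generate())
```

Each run emits either "sur", "prise", "!" or "interest", "ing", "!". Nothing in the loop represents surprise as such; the word only appears because those pieces are likely to follow one another.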
It’s the emotional and physiological response to a prediction being wrong. At its most primal, it’s the fear and surge of adrenaline when a predator or threat steps out from where you thought there was no threat. That’s not something most people will literally experience these days but even comedic surprise stems from that shock of subversion of expectation.
LLMs do not feel. They can express feeling, just as you can, but it doesn’t stem from a true source of feeling or sensation.
Expressing fake feelings is trivial for humans to do, and apparently for an LLM as well. I’m sure many autistic people or even anyone who’s been given a gift they didn’t like can relate to expressing feelings that they don’t actually feel, because expressing a feeling externally is not at all the same as actually feeling it. Instead it’s how we show our internal state to others, when we want to or can’t help it.
It is a mistake to equate artificial intelligence with sentience and humanity for moral reasons, if nothing else.
We are also technically a statistical process generating one part of a word at a time when we speak. Our neurons form the same kind of vectorised connections LLMs do. We are the product of repeated experiences - the same way training works.
Our brains are more advanced, and we may not experience the world the same way, but I think we have clearly created rudimentary digital consciousness.
Because it has no mind, no cognition, and nothing to "feel" with. Don't mistake programmatic mimicry for intention. That's just your own linguistic-forward primate cognition being fooled by the linguistic signals the training set and prompt are making the AI emit.
I could describe the electrical and chemical signals within your neurons and synapses as proof that you are merely a series of electrochemical reactions, and can only mimic genuine thought.
You could do that if you wanted to ignore reality and be reductive to score points in an argument by purposefully conflating mimicry with intention, yes.
And that is dogma. It's unthinking circular reasoning.
It wasn't very long ago that scientists were certain that animals did not possess thoughts or feelings. Any behaviour which appeared to resemble thinking or feeling was simply unconscious autonomic responses, with no more thought behind them than a sunflower turning towards the sun. Animals, by definition, lack Immortal Souls and Free Will, and therefore they are empty inside. Biological automata.
Of course this dogma was unfalsifiable, because any apparent evidence of animal cognition could be refuted as simply not being cognition, by definition.
Look, either cognition is magic, or it's math. There really isn't a middle ground. If you want to believe that wetware is fundamentally irreducible to math, then you believe it's magic. If that's what you want to believe, then fine. But it's dogma, and maintaining that dogma will require increasingly willful acts of blindness.
You are using the word "math" in a magical way. Current LLM programs are reducible to math and human cognition is reducible to math (which is a reasonable hypothesis). What you are implying is that just because the word "math" is used in both sentences, it actually means the same thing. And that is magical thinking. Just because human cognition is reducible to math (let's assume that for the sake of discussion) doesn't mean it's the same math as in the LLM programs, or even close enough. Or maybe it is, but we don't have any proof yet.
I agree with this. I'm not arguing that LLMs are conscious. We don't understand the math behind how our brains work; we don't know how close or far LLMs are to that; and we don't know how many different pathways to consciousness there are within math.
All I'm saying is that the argument that "It's not consciousness, it's just <insert any tangentially mathematical claim here>", is dogma. Given everything that we don't know, agnosticism is the appropriate response.
> It wasn't very long ago that scientists were certain that animals did not possess thoughts or feelings. Any behaviour which appeared to resemble thinking or feeling was simply unconscious autonomic responses, with no more thought behind them than a sunflower turning towards the sun. Animals, by definition, lack Immortal Souls and Free Will, and therefore they are empty inside. Biological automata.
It's cool that you can decide to take half-remembered incorrect anecdotes about what "scientists" were certain of at some indeterminate time in the past, sans citation, and use that to underpin your argument about a totally different thing.
> Of course this dogma was unfalsifiable...
...like your post's anecdata.
> Look, either cognition is magic, or it's math.
Yes, when you decide to draw a convoluted imaginary bounding box around the argument, anything can be whatever you want it to be.
LLMs have no mind and no intention. They are programmed to mimic human language. Read some Grice and learn exactly how dependent humans are on the cooperative principle, and exactly how vulnerable we are to seeing intent where none exists in LLM communication that mimics the outputs our inputs expect to receive.
Your cries of "dogma dogma dogma" are unpersuasive and lack grounding in practical reality.
Wow. This benchmark definitely feels more accurate than the other rankings I've seen. My experience with gpt 5.4/5.5 is that they are technically flawless and if there are any technical issues that is because the input didn't provide enough clarity; that's not to say that it doesn't autonomously react to any issues during bug fixes or implementations, but it'll tend to nail its tasks without leaving behind gaps.
Opus otoh is overrated in terms of its technical ability. It is certainly a better designer/developer for beautiful user experiences, but I'll always lean on gpt 5.5 to check its work.
The biggest surprise in the benchmark is Xiao-Mi. I haven't tried it yet, but I will be after looking at this.
Grats to your team for putting together something meaningful to make sense of the ongoing AI speedrun! Great work!
Are we looking at the same data? On that site I see that opus 4.7's and gpt 5.5's g scores are within each other's confidence intervals, and both significantly ahead of the number 3 model.
Your comment makes it sound like they are miles apart, which the benchmark doesn't seem to support.
Edit:
I looked at the data more and the two models are only basically equal when looking at the mean of all the tests. Gpt 5.5 significantly outperforms opus 4.7 in coding, while opus 4.7 significantly outperforms in "decision making." I'm not seeing details on what decision making explicitly means.
Decision making refers to the environments where the LLM is called on every tick (like games with social communication), examples here: https://gertlabs.com/spectate.
Because GPT 5.5 just launched and those games take longer to accumulate data for, it just doesn't have enough samples yet. It will end up with a wider lead on Opus, I am sure. Coding evals always have large sample sizes on day 1. Good find, we should probably better adjust the weighting here for decision games with low match counts.
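One common way to down-weight categories with few matches is to shrink the observed score toward a prior in proportion to sample size. The sketch below is only an illustration of that idea, not how the leaderboard actually works; `shrunk_score`, `prior_mean`, and `prior_strength` are hypothetical names and values.

```python
def shrunk_score(observed_mean, n_matches, prior_mean, prior_strength=20.0):
    """Blend an observed category mean with a prior based on sample size.

    With zero matches the result equals the prior; as n_matches grows,
    the result approaches the observed mean. prior_strength is an assumed
    tuning knob: the match count at which both halves weigh equally.
    """
    weight = n_matches / (n_matches + prior_strength)
    return weight * observed_mean + (1.0 - weight) * prior_mean


if __name__ == "__main__":
    # Hypothetical numbers: same observed mean, different match counts.
    for n in (0, 5, 20, 200):
        print(n, shrunk_score(80.0, n, prior_mean=50.0))
```

A freshly launched model with only a handful of decision-game matches would then sit near the prior until more samples accumulate, instead of swinging the overall ranking on thin data.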
Right, I'm including my own observations in what the leaderboard is showing. Could be confirmation bias, but I use both Opus and GPT extensively and since GPT 5.4 I have noticed that Opus doesn't even begin to touch GPT's level of technical depth. I was hoping Opus 4.7 would close that gap, but unfortunately it doesn't even compare to GPT 5.4 in that sense.
I'm not being a hater, I love Opus for different reasons, but I can't rely on it for its technical ability.