Maybe, just maybe, this is of obvious utility to the many people who have needs that are not yours?
I very regularly need to interact with my work through a Python interpreter. My work is scientific programming, so the variables might be arrays with millions of elements. To debug, optimize, verify, or improve my work in any way, I can't rely on anything other than interacting with the code as it's being run, or while everything is still in memory. So if I want to really leverage LLMs, especially to let them work semi-autonomously, they must be able to do the same.
I'm not going to dump tens of GB of stuff to a log file or send it around via pipes or whatever. Why is there a NaN in an array that is the product of many earlier steps in a code that took an hour to run? Why are certain data in a 200k-variable system of equations much harder to fit than others, and which equations are in tension with each other, preventing better convergence?
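As a toy sketch of the kind of in-memory triage I mean (the array name and sizes are made up, and tiny so it runs anywhere; the real arrays are far too big to serialize):

```python
import numpy as np

# Stand-in for an array produced by a long pipeline that's still in memory.
rng = np.random.default_rng(0)
result = rng.standard_normal((1000, 1000))
result[123, 456] = np.nan  # the kind of defect you only find after the fact

# Interactive triage: where are the NaNs, and how many?
bad = np.argwhere(np.isnan(result))
print(bad)       # [[123 456]]
print(len(bad))  # 1
```

From there you can walk backwards through the still-live intermediate arrays to see which step introduced it -- exactly the loop you can't do from a log file.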
Are interpreters and pdb not great, previously-existing tools for this kind of work? Does a new tool that lets LLMs/agents use them actually represent some sort of hack job because better solutions have existed for years?
I agree that at first glance, it seems like tmux, or even long-running PTY shell calls in harnesses like Claude, solve this. They do keep processes alive across discrete interactions. But in practice, it’s kind of terrible, because the interaction model presented to the LLM is basically polling. Polling is slow and bloats context.
To avoid polling, you need to run the process with some knowledge of the internal interpreter state. Then a surprising number of edge cases start showing up once you start using it for real data science workflows. How do you support built-in debuggers? How do you handle in-band help? How do you handle long-running commands, interrupts, restarts, or segfaults in the interpreter? How do you deal with echo in multi-line inputs? How do you handle large outputs without filling the context window? Do you spill them to the filesystem somewhere instead of just truncating them, so the model can navigate them? What if the harness doesn’t have file tools? And so on.
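Just for the large-output case, here is a minimal sketch of the spill-to-filesystem idea; the `MAX_CHARS` budget and the file-naming scheme are illustrative, not any particular tool's behavior:

```python
import os
import tempfile

MAX_CHARS = 2000  # hypothetical per-result context budget


def render_output(text: str) -> str:
    """Return output verbatim if it's small; otherwise truncate it and
    spill the full text to a file the model can navigate later."""
    if len(text) <= MAX_CHARS:
        return text
    fd, path = tempfile.mkstemp(prefix="repl-out-", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    head = text[:MAX_CHARS]
    return f"{head}\n... [{len(text)} chars total; full output: {path}]"
```

The point is that the model sees a pointer it can follow instead of either a context-flooding dump or an unrecoverable `[truncated]`.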
Then there is sandboxing, which becomes another layer of complexity wrapped into the same tool.
Are you aware that you can use tmux (or zellij, etc.), spin up the interpreter in a tmux session, and then have the LLM interact with it perfectly normally via send-keys? This works quite well, because LLMs are trained on it. You just need to tell the LLM "I have ipython open in a tmux session named pythonrepl".
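For the curious, the whole loop is a handful of commands (here with python3 standing in for ipython; the session name is whatever you told the model):

```shell
# Start a detached tmux session running a Python REPL.
tmux new-session -d -s pythonrepl python3

# send-keys types into the session; the trailing "Enter" submits the line.
tmux send-keys -t pythonrepl "x = sum(range(10))" Enter
tmux send-keys -t pythonrepl "print(x)" Enter

# Read back what the interpreter printed.
sleep 1
tmux capture-pane -t pythonrepl -p

# Clean up when done.
tmux kill-session -t pythonrepl
```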
This is exactly how I do most of my data analysis work in Julia.
In the data science scenario you should just have proper tooling; for you, it sounds like that's a REPL the agent can interface with. I do this with nREPL/CIDER; in Python-land, maybe a Jupyter kernel over MCP. For stateful introspection where you don't control the tooling, tmux plus trivial glue gets you most of the way.
edit: There are much better solutions for Python-land below it seems :)
What I do is have a quick command that spins up a worktree on a repo, with my Ghostty splits as I like them and the tmux session named after the worktree. I then tell Claude Code about the tmux session when it needs to look. It's pretty good at natively handling the tmux interactions.
Ideally Ghostty would offer primitives to launch splits but c’est la vie. Apple automation it is.
I use both gvim on linux and macvim on mac for a lot of things--not 'real' coding, typically, but opening and editing scripts and config files, writing in markdown, etc; I'm usually opening these from dolphin or finder. In the terminal, working on real code bases and not scripts, I use neovim. My configs for these have diverged a bit over the years but since the use cases are different, it doesn't bother me.
Thanks. I've got an OpenAI subscription and tried this in the past, and got a handful of results, but nothing comprehensive. Perhaps it is better now, or I could change the way I ask.
No prob, see if there's anything useful in any of the links I added to the post. I'm always interested in good benchmarks and test cases, as I usually don't have enough of my own to justify my expensive pro subscriptions. (I did not review them myself as I don't know what I'm looking at.)
A few years later, Airy found that the gravitational deflection of a plumb line by the Himalayas was less than expected, which suggested that mountains have 'roots' that extend below them, displacing denser rock--like icebergs, more or less.
I used the gravitational force of the Longmenshan range to calculate the perturbations in the elastic stress field of the Earth's crust in Sichuan province, China, to estimate the tectonic forces that caused the 2008 Wenchuan earthquake: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/201...
Think of those numbers as one kind of extreme-case argument.
Another reality is that most global grid-scale energy usage is not transport, the application that benefits most from high-energy-density lithium batteries packing maximal energy into the least weight.
Battery farms don't move, so they can use other battery chemistries that are cheaper in resources and weigh far more per unit of energy than lithium, while still powering cities, smelters, processing plants, etc.
As for desalination in general: yes, there will be a lot more of that in coming years; fresh potable water supplies are stretched from a global PoV.
There were no alphabets in the Americas before European contact. The Maya had written mathematics and hieroglyphic writing, and some Quechua-speaking peoples had strings with symbolic knots that carried some mathematical representation (I don't know if it allowed arithmetic or was just record keeping).
Sequoyah developed the Cherokee syllabary (where symbols represent syllables instead of individual vowels/consonants) in the 1800s after seeing white men reading and figuring out what they were doing (he spoke little English and could not read it). This was the first real writing system for an indigenous language created by a native person in the Americas.
The Skeena characters shown here are obviously derived from European characters, as was the Cherokee syllabary. I think most written forms of native languages in the Americas are similar.
The Cree have a script that is far from European characters but was nonetheless developed for the Cree by a missionary in the 1800s. The Inuit have adapted it for their language.
I don't know much about indigenous languages in the rest of the world.
Rocks could be potential sources. Crystals that large are by no means rare, with feldspars being the most common minerals on Earth and perhaps on most rocky planets (quartz is well known, of course, but I think it would be rare without the magmatic fractionation that happens due to plate tectonics, which is perhaps unique to Earth in the solar system).
Volcanic glass (e.g. obsidian) is also shiny and by no means rare in the solar system.
Many asteroids are also metallic, and perhaps metal crystals or fracture planes could produce reflectors of the right size.