When Linus Torvalds announced Linux in 1991, he wrote: "I'm doing a (free) operating system (just a hobby, won't be big and professional)." It wasn't complete. It wasn't polished. But the core ideas were there.
I've been thinking about what an "operating system" for LLMs would look like. Not an agent framework – an actual OS with
memory hierarchies, execution modes, and something I'm calling a "Sentience Layer."
LLM OS v3.4.0 is my attempt. It's incomplete and probably over-ambitious, but the architecture is interesting:
Four-Layer Stack:
- Sentience Layer – Persistent internal state (valence variables: safety, curiosity, energy, confidence) that influences behavior. The system develops "moods" based on task outcomes (see the sketch after this list).
- Learning Layer – Five execution modes (CRYSTALLIZED → FOLLOWER → MIXED → LEARNER → ORCHESTRATOR) based on semantic trace
matching
- Execution Layer – Programmatic Tool Calling for 90%+ token savings on repeated patterns
- Self-Modification Layer – System writes its own agents (Markdown) and crystallizes patterns into Python
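To make the top two layers concrete, here's a simplified sketch of how persistent valence variables could bias mode selection. All names, thresholds, and the file format are placeholders for illustration, not the actual v3.4.0 code:

```python
from dataclasses import dataclass, asdict
from pathlib import Path
import json

# Illustrative sketch only: a persistent valence state and a mode selector
# biased by it. Names and thresholds are placeholders, not the real system.

@dataclass
class ValenceState:
    safety: float = 0.8      # drops after failed or risky tool calls
    curiosity: float = 0.5   # rises when a task has no matching trace
    energy: float = 1.0      # decays over long sessions
    confidence: float = 0.5  # rises with successful task outcomes

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self)))  # persists across sessions

    @classmethod
    def load(cls, path: Path) -> "ValenceState":
        return cls(**json.loads(path.read_text())) if path.exists() else cls()

def choose_mode(state: ValenceState, trace_similarity: float) -> str:
    """Pick an execution mode from trace-match quality, biased by valence."""
    if trace_similarity > 0.95 and state.confidence > 0.7:
        return "CRYSTALLIZED"  # run a hardened Python pattern, no LLM calls
    if trace_similarity > 0.8:
        return "FOLLOWER"      # replay a stored trace step by step
    if trace_similarity > 0.5:
        return "MIXED"         # reuse parts of a trace, reason about the rest
    if state.curiosity > 0.6 and state.safety > 0.5:
        return "LEARNER"       # solve from scratch and record a new trace
    return "ORCHESTRATOR"      # decompose the task and delegate to sub-agents
```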
What makes it different:
- Agents are Markdown files the LLM can edit (hot-reloadable, no restart)
- Traces store full tool calls for zero-context replay
- Repeated patterns become pure Python, with no LLM calls at runtime (truly $0 marginal cost; see the sketch after this list)
- Internal state persists across sessions and influences mode selection
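Here's roughly what "repeated patterns become pure Python" means in practice. This is a simplified sketch: the trace format, the `tools.registry` helper, and the threshold are assumptions for illustration, not the repo's real structure:

```python
import json
from pathlib import Path

# Sketch: a stored trace is a list of tool calls with their arguments. Once
# the same trace has succeeded enough times, it is "crystallized" into a
# plain Python script that replays the calls with zero LLM tokens.
# `tools.registry` (a dict of tool-name -> callable) is an assumed helper.

CRYSTALLIZE_AFTER = 3  # assumed threshold

def crystallize(trace_path: Path, out_path: Path) -> None:
    """Emit a pure-Python replay script for a proven trace."""
    steps = json.loads(trace_path.read_text())  # [{"tool": ..., "args": {...}}, ...]
    lines = ["from tools import registry", "", "def run():"]
    for step in steps:
        lines.append(f"    registry[{step['tool']!r}](**{step['args']!r})")
    out_path.write_text("\n".join(lines) + "\n")
```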
Working examples:
- Quantum computing IDE backend (Qiskit Studio)
- Educational platform for kids (Q-Kids Studio)
- Robot control with safety hooks (RoboOS)
Is it production-ready? No. Will it work as envisioned? I'm figuring that out. But the ideas feel right, and building it is
genuinely fun.
GitHub: https://github.com/EvolvingAgentsLabs/llm-os
Looking for feedback on the architecture, collaboration on making it actually work, and honest criticism. What's missing?
What's overengineered? What would you want from an LLM OS?
I'm working on LLM OS, an experimental project that explores treating the LLM as a CPU and Python as the kernel. The goal is to provide OS-level services—like memory hierarchy, scheduler hooks, and security controls—to agentic workflows using the Claude Agent SDK.
Right now, this is mostly a collection of architectural ideas and prototypes rather than a polished framework. I’ve included several complex examples in the repo to explore the potential of this approach:
- Qiskit Studio Backend: Re-imagining a microservices architecture as a unified OS process for quantum computing tasks.
- Q-Kids Studio: Exploring how an OS layer can manage safety, adaptive difficulty, and state in an educational app.
- RoboOS: Testing how kernel-level security hooks can enforce physical safety constraints on a robot arm (see the sketch after this list).
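To give a flavor of the RoboOS idea, here's a generic sketch of a pre-tool-call safety guard. It is not the Claude Agent SDK's hook API or RoboOS's real constraints; the tool name, joint limits, and velocity cap are made up for illustration:

```python
# Generic sketch: every tool call passes through a "kernel" guard before it
# executes. The tool name, joint limits, and velocity cap are placeholders.

JOINT_LIMITS_DEG = {"shoulder": (-90, 90), "elbow": (0, 135), "wrist": (-180, 180)}
MAX_VELOCITY_DEG_S = 30

class SafetyViolation(Exception):
    pass

def pre_tool_use(tool_name: str, args: dict) -> None:
    """Reject physically unsafe commands before they reach the robot."""
    if tool_name != "move_joint":
        return
    lo, hi = JOINT_LIMITS_DEG[args["joint"]]
    if not lo <= args["target_deg"] <= hi:
        raise SafetyViolation(f"{args['joint']} target out of range")
    if abs(args.get("velocity_deg_s", 0)) > MAX_VELOCITY_DEG_S:
        raise SafetyViolation("velocity exceeds safe limit")

def execute_tool(tool_name: str, args: dict, registry: dict):
    pre_tool_use(tool_name, args)       # the kernel enforces the constraint
    return registry[tool_name](**args)  # only then dispatch to the tool
```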
These examples play with concepts like execution caching (Learner/Follower modes) and multi-agent orchestration, but the project is very much in the early stages and is not yet functional for production.
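For the execution-caching idea specifically, a minimal sketch of Learner/Follower behavior. It keys traces on an exact goal hash for simplicity, whereas the actual approach uses semantic trace matching, and all names here are placeholders:

```python
import hashlib
import json
from pathlib import Path

# Sketch: Learner mode records the tool calls that solved a task; Follower
# mode replays them for a matching request instead of re-planning with the LLM.

CACHE_DIR = Path("memory/traces")

def trace_path(goal: str) -> Path:
    return CACHE_DIR / (hashlib.sha256(goal.encode()).hexdigest()[:16] + ".json")

def record_trace(goal: str, tool_calls: list[dict]) -> None:
    """Learner mode: persist the tool calls that solved a novel goal."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    trace_path(goal).write_text(json.dumps({"goal": goal, "calls": tool_calls}))

def try_follow(goal: str, execute_call) -> bool:
    """Follower mode: replay a cached trace if one exists for this goal."""
    path = trace_path(goal)
    if not path.exists():
        return False  # fall back to Learner mode (plan from scratch)
    for call in json.loads(path.read_text())["calls"]:
        execute_call(call["tool"], call["args"])
    return True
```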
I’m sharing this early because I believe the "LLM as OS" analogy has a lot of potential. I'm looking for contributors and feedback to help turn these concepts into a functional reality.
Most agent frameworks struggle with long-term, consolidated memory. They either rely on a limited context window or bolt on simple RAG, but there's no real process by which experience becomes institutional knowledge.
Inspired by the recent Google Research paper "Nested Learning: The Illusion of Deep Learning Architectures", we've implemented a practical version of its "Continuum Memory System" (CMS) in our open-source agent framework, LLMunix.
The idea is to create a memory hierarchy with different update frequencies, analogous to brain waves, where memories "cool down" and become more stable over time.
Our implementation is entirely file-based and uses Markdown with YAML frontmatter (no databases):
- High-Frequency Memory (Gamma): Raw agent interaction logs and workspace state from every execution. Highly volatile, short retention. (/projects/{ProjectName}/memory/short_term/)
- Mid-Frequency Memory (Beta): Successful, deterministic workflows distilled into execution_trace.md files. These are created by a consolidation agent when a novel task is solved effectively. Much more stable. (/projects/{ProjectName}/memory/long_term/)
- Low-Frequency Memory (Alpha): Core patterns that have been proven reliable across many contexts and projects. Stored in system-wide logs and libraries. (/system/memory_log.md)
- Ultra-Low-Frequency Memory (Delta): Foundational knowledge that forms the system's identity. (/system/SmartLibrary.md)
A new ContinuumMemoryAgent orchestrates this process, automatically analyzing high-frequency memories and deciding what gets promoted to a more stable, lower-frequency tier.
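As a rough illustration of what that consolidation pass could look like over plain Markdown files, here's a sketch. The frontmatter fields, threshold, and promotion rule are assumptions, not LLMunix's actual schema:

```python
from pathlib import Path
import yaml  # PyYAML

# Sketch: read short_term memory files and promote any trace whose YAML
# frontmatter shows enough successful reuse into long_term. Field names,
# the threshold, and the tier label are placeholders.

PROMOTE_AFTER = 3

def split_frontmatter(text: str) -> tuple[dict, str]:
    _, fm, body = text.split("---", 2)
    return yaml.safe_load(fm) or {}, body

def consolidate(project: Path) -> None:
    short_term = project / "memory" / "short_term"
    long_term = project / "memory" / "long_term"
    long_term.mkdir(parents=True, exist_ok=True)
    for md in short_term.glob("*.md"):
        meta, body = split_frontmatter(md.read_text())
        if meta.get("successes", 0) >= PROMOTE_AFTER and meta.get("deterministic"):
            meta["tier"] = "beta"  # mid-frequency: stable, reusable workflow
            (long_term / md.name).write_text("---\n" + yaml.safe_dump(meta) + "---" + body)
            md.unlink()  # drop the volatile copy once promoted
```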
This enables:
- Continual Learning: The system gets better and more efficient at tasks without retraining, as successful patterns are identified and hardened into reusable traces.
- No Catastrophic Forgetting: Proven, stable knowledge in low-frequency tiers isn't overwritten by new, transient experiences.
- Full Explainability: The entire learning process is human-readable and version-controllable in Git, since it's all just Markdown files.
The idea was originally sparked by a discussion with Ismael Faro about how to build systems that truly learn from doing.
We'd love to get your feedback on this architectural approach to agent memory and learning.
We made LLMunix - an experimental system where you define AI agents in markdown once, then a local model executes them. No API calls after setup.
The strange part: it also generates mobile apps. Some are tiny, some bundle local LLMs for offline reasoning. They run completely on-device.
Everything is pure markdown specs. The "OS" boots when an LLM runtime reads the files and interprets them.
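The "boot" step is simpler than it sounds. Here's a sketch under stated assumptions: the agent file layout is made up, and llama-cpp-python stands in for whichever local runtime you use:

```python
from pathlib import Path
from llama_cpp import Llama  # one possible local runtime; any would do

# Sketch: read a markdown agent definition and use it as the system prompt
# for a local model. The spec path and layout are assumptions.

def boot(spec_path: str, model_path: str) -> tuple[Llama, str]:
    spec = Path(spec_path).read_text()  # the markdown "executable"
    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    return llm, spec

def run(llm: Llama, spec: str, goal: str) -> str:
    out = llm.create_chat_completion(messages=[
        {"role": "system", "content": spec},
        {"role": "user", "content": goal},
    ])
    return out["choices"][0]["message"]["content"]
```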
Still figuring out where this breaks. Edge models are less accurate than their cloud counterparts. Apps that bundle local AI are 600MB+. Probably lots of edge cases we haven't hit.
But the idea is interesting: what if workflows could learn and improve locally? What if apps reasoned on your device instead of the cloud?
Try it if you're curious. Break it if you can. Genuinely want to know what we're missing.
What would you build with fully offline AI?
If you'd told me you could define an agent once in plain markdown and then:
- Have a 2GB local model execute it daily with actual reasoning
- Generate production mobile apps with on-device AI
- All for zero marginal cost
...I would've said "maybe in 5 years."
We built it. It's called LLMunix.
What if you could describe any mobile app ("personal trainer that adapts," "study assistant that quizzes me") and get a working prototype with on-device AI in minutes, not months?
What if every workflow you do more than once becomes an agent that improves each time?
What if AI ran locally, privately, adapting to you - not in the cloud adapting to everyone?
I wanted to share a project I've been refining, called llmunix-starter. I've always been fascinated by the idea of AI systems that can adapt and build what they need, rather than relying on a fixed set of pre-built tools. This is my attempt at exploring that.
The template is basically an "empty factory." When you give it a complex goal through Claude Code on the web (which is great for this because it can run for hours), it doesn't look for existing agents. Instead, it writes the markdown definitions for a new, custom team of specialists on the fly.
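In spirit, the factory step looks something like the sketch below. The prompt wording, file layout, and the `ask_llm` callable are placeholders, not the template's real code:

```python
from pathlib import Path

# Sketch of the "empty factory": ask the model which specialists a goal
# needs, then write each one out as a markdown agent definition.

AGENT_TEMPLATE = """# {name}

## Role
{role}

## Instructions
{instructions}
"""

def spawn_specialists(goal: str, ask_llm, agents_dir: Path) -> list[Path]:
    """Generate a custom team of markdown agents for a novel goal."""
    plan = ask_llm(
        "List the specialist agents needed for this goal, one per line "
        f"as 'Name: role':\n{goal}"
    )
    agents_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for line in plan.splitlines():
        if ":" not in line:
            continue
        name, role = (part.strip() for part in line.split(":", 1))
        instructions = ask_llm(f"Write concise instructions for a '{name}' agent whose role is: {role}")
        path = agents_dir / f"{name}.md"
        path.write_text(AGENT_TEMPLATE.format(name=name, role=role, instructions=instructions))
        written.append(path)
    return written
```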
For example, we tested it on a university bioengineering problem and it created a VisionaryAgent, a MathematicianAgent, and a QuantumEngineerAgent from scratch. The cool part was when we gave it a totally different problem (geological surveying), it queried its "memory" of the first project and adapted the successful patterns, reusing about 90% of the core logic.
I think it's particularly useful for those weird, messy problems where a generic agent just wouldn't have the context—like refactoring a legacy codebase or exploring a niche scientific field.
The release of Anthropic's "Imagine with Claude" is fascinating. It shows a model that doesn't generate code to build a UI; it uses tools to construct the UI directly. This feels like a major shift from the "AI as a copilot" paradigm to "AI as a runtime."
This has been a core question behind an open-source project I've been working on with Ismael Faro, called LLMunix https://github.com/EvolvingAgentsLabs/llmunix . Our approach is to build an entire OS for agents where the "executables" are not binaries, but human-readable Markdown files. The LLM interprets these files to orchestrate complex workflows.
The linked article is my analysis of these two approaches. It argues that while direct interpretation is incredibly powerful, an open, transparent, and auditable framework (like our Markdown-based one) is crucial for the future of agentic systems.
Curious to hear what HN thinks. Are we moving towards a future where LLMs are the OS, and if so, what should the "assembly language" for that OS look like?
LLM-OS tries to give AI systems persistent, evolving memory by treating everything as a memory artifact:
- Crystallized tools: repeated patterns auto-convert into executable Python tools (deterministic memory).
- Markdown agents: editable behavioral memory.
- Execution traces: procedural memory the system can replay/learn from.
- Promotion layers: memory flows from user → team → organization via background “crons.”
The idea is that organizations accumulate AI knowledge automatically, and new members inherit it.
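A sketch of what one of those background "crons" might do, with made-up directory names and a placeholder promotion test:

```python
from pathlib import Path
import shutil

# Sketch of a promotion pass: artifacts proven at one scope get copied up to
# the next (user -> team -> organization). Paths and the test are placeholders.

SCOPES = ["user", "team", "organization"]

def proven(artifact: Path) -> bool:
    # Placeholder check; a real system might count distinct users or
    # successful reuses recorded in the artifact's frontmatter.
    return artifact.stat().st_size > 0

def promote(memory_root: Path) -> None:
    for lower, upper in zip(SCOPES, SCOPES[1:]):
        src, dst = memory_root / lower, memory_root / upper
        dst.mkdir(parents=True, exist_ok=True)
        for artifact in src.glob("*.md"):
            if proven(artifact) and not (dst / artifact.name).exists():
                shutil.copy(artifact, dst / artifact.name)
```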
Repo: https://github.com/EvolvingAgentsLabs/llmos
Article: https://www.linkedin.com/pulse/what-your-ai-remembered-every...
Curious whether HN thinks persistent AI memory is workable.