
Came across an early open-source project aiming to fix a big gap in current LLMs: statelessness. Every conversation resets to zero.

LLM-OS tries to give AI systems persistent, evolving memory by treating everything as a memory artifact:

Crystallized tools: repeated patterns auto-convert into executable Python tools (deterministic memory).

Markdown agents: editable behavioral memory.

Execution traces: procedural memory the system can replay/learn from.

Promotion layers: memory flows from user → team → organization via background “crons.”

The idea is that organizations accumulate AI knowledge automatically, and new members inherit it; a rough sketch of what one of these artifacts could look like is below.
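
To make "memory artifact" concrete, here is a rough sketch in Python of how a repeated pattern might get crystallized into an executable tool plus a small metadata record that the promotion crons could later move up the user → team → organization ladder. The file layout, `crystallize` helper, and metadata fields are my own illustration, not LLM-OS's actual API.

```python
import json
import textwrap
from datetime import datetime, timezone
from pathlib import Path


def crystallize(name: str, pattern_source: str, origin_traces: list,
                root: Path = Path("memory/crystallized")) -> Path:
    """Persist a repeated pattern as a deterministic Python tool plus metadata.

    Hypothetical layout: one .py file (the executable tool) and one .json
    sidecar recording where the pattern came from, so a background cron can
    later promote it from user -> team -> organization scope.
    """
    root.mkdir(parents=True, exist_ok=True)
    tool_path = root / f"{name}.py"
    tool_path.write_text(textwrap.dedent(pattern_source))

    meta = {
        "name": name,
        "kind": "crystallized_tool",
        "origin_traces": origin_traces,     # execution traces that produced it
        "promotion_level": "user",          # user | team | organization
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    (root / f"{name}.json").write_text(json.dumps(meta, indent=2))
    return tool_path


# Example: a pattern seen often enough in traces to become deterministic code.
crystallize(
    "count_changed_lines",
    """
    def count_changed_lines(diff: str) -> dict:
        added = sum(1 for line in diff.splitlines() if line.startswith('+'))
        removed = sum(1 for line in diff.splitlines() if line.startswith('-'))
        return {"added": added, "removed": removed}
    """,
    origin_traces=["trace-0142", "trace-0187"],
)
```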

Repo: https://github.com/EvolvingAgentsLabs/llmos

Article: https://www.linkedin.com/pulse/what-your-ai-remembered-every...

Curious whether HN thinks persistent AI memory is workable.


When Linus posted Linux 0.01 in 1991, he wrote: "I'm doing a (free) operating system (just a hobby, won't be big and professional)." It wasn't complete. It wasn't polished. But the core ideas were there.

I've been thinking about what an "operating system" for LLMs would look like. Not an agent framework – an actual OS with memory hierarchies, execution modes, and something I'm calling a "Sentience Layer."

LLM OS v3.4.0 is my attempt. It's incomplete and probably over-ambitious, but the architecture is interesting:

Four-Layer Stack:
- Sentience Layer – Persistent internal state (valence variables: safety, curiosity, energy, confidence) that influences behavior. The system develops "moods" based on task outcomes (a rough sketch follows this list).
- Learning Layer – Five execution modes (CRYSTALLIZED → FOLLOWER → MIXED → LEARNER → ORCHESTRATOR) based on semantic trace matching
- Execution Layer – Programmatic Tool Calling for 90%+ token savings on repeated patterns
- Self-Modification Layer – System writes its own agents (Markdown) and crystallizes patterns into Python
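
To make the Sentience Layer a bit less abstract, here is a minimal sketch of persistent valence state. The variable names come from the post; the storage location, update rules, and numeric deltas are assumptions of mine, not the actual LLM OS implementation.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

STATE_FILE = Path("state/valence.json")  # hypothetical location


@dataclass
class Valence:
    safety: float = 0.8
    curiosity: float = 0.5
    energy: float = 1.0
    confidence: float = 0.5

    @classmethod
    def load(cls) -> "Valence":
        if STATE_FILE.exists():
            return cls(**json.loads(STATE_FILE.read_text()))
        return cls()

    def save(self) -> None:
        STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
        STATE_FILE.write_text(json.dumps(asdict(self), indent=2))

    def update(self, succeeded: bool, cost: float) -> None:
        """Nudge the internal state after each task and clamp to [0, 1]."""
        def clamp(x: float) -> float:
            return min(1.0, max(0.0, x))

        self.confidence = clamp(self.confidence + (0.05 if succeeded else -0.10))
        self.safety = clamp(self.safety + (0.0 if succeeded else 0.05))
        self.energy = clamp(self.energy - cost)
        self.save()


valence = Valence.load()        # persists across sessions
valence.update(succeeded=True, cost=0.02)
```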

What makes it different:
- Agents are Markdown files the LLM can edit (hot-reloadable, no restart)
- Traces store full tool calls for zero-context replay
- Repeated patterns become pure Python (truly $0 cost)
- Internal state persists across sessions and influences mode selection (sketched below)
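
And a rough sketch of how stored traces could drive both mode selection and zero-context replay. The thresholds, trace format, and the string-similarity stand-in for semantic matching are all invented for illustration; the real system presumably uses embeddings and its own schema.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path

TRACE_DIR = Path("memory/traces")  # hypothetical layout: one JSON file per trace


def best_trace(task: str) -> tuple:
    """Cheap stand-in for semantic trace matching (real version: embeddings)."""
    best, score = None, 0.0
    for path in TRACE_DIR.glob("*.json"):
        recorded = json.loads(path.read_text())["task"]
        s = SequenceMatcher(None, task.lower(), recorded.lower()).ratio()
        if s > score:
            best, score = path, s
    return best, score


def choose_mode(task: str, confidence: float) -> str:
    """Pick an execution mode from trace similarity plus persisted confidence."""
    _, score = best_trace(task)
    if score > 0.95:
        return "CRYSTALLIZED"   # replay recorded tool calls, near-zero tokens
    if score > 0.80:
        return "FOLLOWER"       # follow the trace, let the LLM fill small gaps
    if score > 0.50 and confidence > 0.6:
        return "MIXED"
    return "LEARNER" if confidence > 0.3 else "ORCHESTRATOR"


def replay(trace_path: Path, call_tool) -> list:
    """Zero-context replay: re-run recorded tool calls without prompting a model."""
    trace = json.loads(trace_path.read_text())
    return [call_tool(step["tool"], **step["args"]) for step in trace["tool_calls"]]
```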

Working examples:
- Quantum computing IDE backend (Qiskit Studio)
- Educational platform for kids (Q-Kids Studio)
- Robot control with safety hooks (RoboOS)

Is it production-ready? No. Will it work as envisioned? I'm figuring that out. But the ideas feel right, and building it is genuinely fun.

GitHub: https://github.com/EvolvingAgentsLabs/llm-os

Looking for feedback on the architecture, collaboration on making it actually work, and honest criticism. What's missing? What's overengineered? What would you want from an LLM OS?


I'm working on LLM OS, an experimental project that explores treating the LLM as a CPU and Python as the kernel. The goal is to provide OS-level services—like memory hierarchy, scheduler hooks, and security controls—to agentic workflows using the Claude Agent SDK.

Right now, this is mostly a collection of architectural ideas and prototypes rather than a polished framework. I’ve included several complex examples in the repo to explore the potential of this approach:

- Qiskit Studio Backend: Re-imagining a microservices architecture as a unified OS process for quantum computing tasks.

- Q-Kids Studio: Exploring how an OS layer can manage safety, adaptive difficulty, and state in an educational app.

- RoboOS: Testing how kernel-level security hooks can enforce physical safety constraints on a robot arm.

These examples play with concepts like execution caching (Learner/Follower modes) and multi-agent orchestration, but the project is very much in the early stages and is not yet functional for production.
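
As an illustration of the "kernel-level security hooks" idea in the RoboOS example, here is a minimal sketch of a pre-execution check that every tool call gets routed through. The hook interface, joint limits, and tool names are hypothetical and are not the Claude Agent SDK's actual hook API.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


class SafetyViolation(Exception):
    pass


# Invented limits for a hypothetical robot arm; a real system would load these
# from a calibrated safety profile.
JOINT_LIMITS_DEG = {"shoulder": (-90, 90), "elbow": (0, 135), "wrist": (-180, 180)}
MAX_SPEED_DEG_S = 30.0


def safety_hook(call: ToolCall) -> ToolCall:
    """Kernel-style pre-execution check, run before any tool executes."""
    if call.name != "move_joint":
        return call
    joint, angle = call.args["joint"], call.args["angle_deg"]
    lo, hi = JOINT_LIMITS_DEG[joint]
    if not lo <= angle <= hi:
        raise SafetyViolation(f"{joint} target {angle} outside [{lo}, {hi}]")
    if call.args.get("speed_deg_s", 0.0) > MAX_SPEED_DEG_S:
        # Clamp rather than reject: degrade gracefully and keep the workflow alive.
        call.args["speed_deg_s"] = MAX_SPEED_DEG_S
    return call


def dispatch(call: ToolCall, tools: dict):
    """The 'kernel' routes every call through the hook before execution."""
    return tools[call.name](**safety_hook(call).args)
```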

I’m sharing this early because I believe the "LLM as OS" analogy has a lot of potential. I'm looking for contributors and feedback to help turn these concepts into a functional reality.

Repo: https://github.com/EvolvingAgentsLabs/llm-os


Most agent frameworks struggle with long-term, consolidated memory. They either have a limited context window or use simple RAG, but there's no real process for experience to become institutional knowledge.

Inspired by the recent Google Research paper "Nested Learning: The Illusion of Deep Learning Architectures", we've implemented a practical version of its "Continuum Memory System" (CMS) in our open-source agent framework, LLMunix.

https://research.google/blog/introducing-nested-learning-a-n...

The idea is to create a memory hierarchy with different update frequencies, analogous to brain waves, where memories "cool down" and become more stable over time.

Our implementation is entirely file-based and uses Markdown with YAML frontmatter (no databases):

High-Frequency Memory (Gamma): Raw agent interaction logs and workspace state from every execution. Highly volatile, short retention. (/projects/{ProjectName}/memory/short_term/)

Mid-Frequency Memory (Beta): Successful, deterministic workflows distilled into execution_trace.md files. These are created by a consolidation agent when a novel task is solved effectively. Much more stable. (/projects/{ProjectName}/memory/long_term/)

Low-Frequency Memory (Alpha): Core patterns that have been proven reliable across many contexts and projects. Stored in system-wide logs and libraries. (/system/memory_log.md)

Ultra-Low-Frequency Memory (Delta): Foundational knowledge that forms the system's identity. (/system/SmartLibrary.md)

A new ContinuumMemoryAgent orchestrates this process, automatically analyzing high-frequency memories and deciding what gets promoted to a more stable, lower-frequency tier.
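
For a sense of what that promotion pass could look like mechanically, here is a small sketch of a gamma → beta consolidation step over Markdown files with YAML frontmatter. The frontmatter fields and the promotion rule are illustrative only, not the schema actually defined in memory_schema.md.

```python
import re
from pathlib import Path

import yaml  # PyYAML

FRONTMATTER = re.compile(r"^---\n(.*?)\n---\n", re.DOTALL)


def read_note(path: Path) -> tuple:
    """Split a memory file into (frontmatter dict, markdown body)."""
    text = path.read_text()
    match = FRONTMATTER.match(text)
    meta = yaml.safe_load(match.group(1)) if match else {}
    return meta, text[match.end():] if match else text


def consolidate(project: Path, min_successes: int = 3) -> list:
    """Promote gamma-tier (short_term) notes into beta-tier execution traces.

    Illustrative rule: a pattern observed to succeed `min_successes` times with
    no failures has "cooled down" enough to become a stable trace.
    """
    promoted = []
    short_term = project / "memory" / "short_term"
    long_term = project / "memory" / "long_term"
    long_term.mkdir(parents=True, exist_ok=True)

    for note in short_term.glob("*.md"):
        meta, body = read_note(note)
        if meta.get("successes", 0) >= min_successes and meta.get("failures", 0) == 0:
            meta["tier"] = "beta"  # was "gamma"
            target = long_term / f"execution_trace_{note.stem}.md"
            target.write_text(f"---\n{yaml.safe_dump(meta)}---\n{body}")
            promoted.append(target)
    return promoted
```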

This enables:

Continual Learning: The system gets better and more efficient at tasks without retraining, as successful patterns are identified and hardened into reusable traces.

No Catastrophic Forgetting: Proven, stable knowledge in low-frequency tiers isn't overwritten by new, transient experiences.

Full Explainability: The entire learning process is human-readable and version-controllable in Git, since it's all just Markdown files.

The idea was originally sparked by a discussion with Ismael Faro about how to build systems that truly learn from doing.

We'd love to get your feedback on this architectural approach to agent memory and learning.

GitHub Repo: https://github.com/EvolvingAgentsLabs/llmunix

Key files for this new architecture:

- The orchestrator agent: system/agents/ContinuumMemoryAgent.md

- The memory schema: system/infrastructure/memory_schema.md

- The overall system design: CLAUDE.md (which now includes the CMS theory)

What are your thoughts on this approach to agent memory and learning?


Curious what you think.

We made LLMunix - an experimental system where you define AI agents in markdown once, then a local model executes them. No API calls after setup.

The strange part: it also generates mobile apps. Some are tiny, some bundle local LLMs for offline reasoning. They run completely on-device.

Everything is pure markdown specs. The "OS" boots when an LLM runtime reads the files and interprets them.
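
Roughly, "boot" amounts to parsing each agent's YAML frontmatter and treating the Markdown body as the system prompt for whatever local runtime you have. Here's a sketch under assumed frontmatter fields (name, tools, model), with the model call left as a stub; this is not LLMunix's actual schema.

```python
import re
from pathlib import Path

import yaml  # PyYAML

FRONTMATTER = re.compile(r"^---\n(.*?)\n---\n", re.DOTALL)


def load_agent(spec: Path) -> dict:
    """An agent is just a markdown file: frontmatter config + prose instructions."""
    text = spec.read_text()
    match = FRONTMATTER.match(text)
    meta = yaml.safe_load(match.group(1)) if match else {}
    return {
        "name": meta.get("name", spec.stem),
        "tools": meta.get("tools", []),
        "model": meta.get("model", "local-2b"),   # hypothetical default
        "system_prompt": text[match.end():] if match else text,
    }


def run_local_model(model: str, system_prompt: str, user_input: str) -> str:
    # Stub: swap in llama.cpp, Ollama, MLX, or whichever local runtime you use.
    raise NotImplementedError


def boot(agent_dir: Path, task: str) -> str:
    """'Boot the OS': read every agent spec, then hand the task to an entry agent."""
    agents = [load_agent(p) for p in agent_dir.glob("*.md")]
    entry = next(a for a in agents if a["name"] == "router")  # assumed entry point
    return run_local_model(entry["model"], entry["system_prompt"], task)
```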

Still figuring out where this breaks. Edge models are less accurate. Apps with local AI are 600MB+. Probably lots of edge cases we haven't hit.

But the idea is interesting: what if workflows could learn and improve locally? What if apps reasoned on your device instead of the cloud?

Try it if you're curious. Break it if you can. Genuinely want to know what we're missing.

What would you build with fully offline AI?


A year ago, if you told me I could:

• Describe a workflow once to Claude

• Have a 2GB local model execute it daily with actual reasoning

• Generate production mobile apps with on-device AI

• All for zero marginal cost

...I would've said "maybe in 5 years."

We built it. It's called LLMunix.

What if you could describe any mobile app ("personal trainer that adapts", "study assistant that quizzes me") and get a working prototype with on-device AI in minutes, not months?

What if every workflow you do more than once becomes an agent that improves each time?

What if AI ran locally, privately, adapting to you - not in the cloud adapting to everyone?


I've been thinking about Wabi.ai's vision and Claude Imagine's approach: "software that doesn't exist until you need it."

What if instead of downloading 50 different apps, you just described what you wanted and an AI generated a personalized interface on the fly?

I built a proof-of-concept using LLMunix (pure markdown agent framework):

• UI-MD format: Markdown-based UI definitions (like HTML, but for LLMs)

• Memory-first architecture: Every UI is personalized to your context

• One shell app: Renders any UI-MD in real-time

• No compilation: Generate and display in seconds

Example: "Create a morning briefing app"

→ System queries your preferences (location: SF, interests: tech)

→ Fetches weather, calendar, news in parallel (sketched below)

→ Generates personalized markdown UI

→ Mobile shell renders it instantly
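
The parallel-fetch step is what keeps generation feeling instant. A small sketch of fanning out to the data agents concurrently; the agent functions here are stubs I'm assuming, not the repo's actual agents.

```python
import asyncio


# Stubs standing in for the weather, calendar, and news agents.
async def weather_agent(location: str) -> str:
    await asyncio.sleep(0.1)   # pretend network call
    return f"Sunny in {location}"


async def calendar_agent(user: str) -> str:
    await asyncio.sleep(0.1)
    return "09:30 standup, 14:00 design review"


async def news_agent(interests: list) -> str:
    await asyncio.sleep(0.1)
    return "Top stories for " + ", ".join(interests)


async def gather_briefing(prefs: dict) -> dict:
    weather, calendar, news = await asyncio.gather(
        weather_agent(prefs["location"]),
        calendar_agent(prefs["user"]),
        news_agent(prefs["interests"]),
    )
    return {"weather": weather, "calendar": calendar, "news": news}


sections = asyncio.run(
    gather_briefing({"user": "me", "location": "SF", "interests": ["tech"]})
)
```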

The POC includes:

- 5 specialized agents (memory, UI generation, weather, calendar, news)

- FastAPI backend with RESTful endpoints (a minimal sketch follows this list)

- Complete UI-MD specification
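
As a flavor of the backend surface, here is a minimal FastAPI sketch that accepts an app description and returns a UI-MD document for the shell to render. The route, request/response shapes, and hard-coded preferences are placeholders, not the POC's real endpoints.

```python
from datetime import date

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Stand-in for the memory agent; in the POC this would come from stored context.
PREFS = {"location": "SF", "interests": ["tech"]}


class GenerateRequest(BaseModel):
    description: str          # e.g. "Create a morning briefing app"


class GenerateResponse(BaseModel):
    ui_md: str                # markdown the mobile shell renders directly


@app.post("/apps/generate", response_model=GenerateResponse)
def generate_app(req: GenerateRequest) -> GenerateResponse:
    # Real version: memory agent + weather/calendar/news agents in parallel,
    # then a UI-generation agent writes the UI-MD. Stubbed out here.
    ui_md = (
        f"# Morning Briefing, {date.today():%b %d}\n\n"
        f"## Weather in {PREFS['location']}\n- (weather agent output)\n\n"
        "## Calendar\n- (calendar agent output)\n\n"
        "## News\n- (news agent output)\n"
    )
    return GenerateResponse(ui_md=ui_md)
```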

What's interesting:

1. Everything is markdown (agents, tools, UI definitions)

2. No app downloads needed after the initial shell

3. Fully personalized from day one

4. Apps "learn" from your usage patterns

5. Share/remix apps as markdown files

What's missing:

- The actual mobile shell

- Real API integrations (weather, news, calendar)

- Multi-user backend infrastructure

- Real-world testing at scale

I'm sharing this to:

1. Test if this approach is fundamentally sound

2. Invite discussion on the architecture

3. Find collaborators interested in building the missing pieces

4. Explore if this could disrupt traditional app distribution

Key questions I'd love to discuss:

• Is markdown the right format for LLM-generated UIs?

• How do we handle complex interactions (forms, animations)?

• What about offline functionality?

• Privacy implications of centralized personalization?

• Business model: Who pays for compute?

• Could this work for web, not just mobile?

The code is open source, fully documented, and ready to run: https://github.com/EvolvingAgentsLabs/llmunix/tree/feature/n...

Quick start:

https://github.com/EvolvingAgentsLabs/llmunix/blob/feature/n...

I'm particularly interested in hearing from:

- Mobile developers

- Anyone who's thought about personal software

- People building LLM agents

- UX researchers interested in adaptive interfaces

- Anyone skeptical of this approach (challenge my assumptions!)

Thoughts?

Is this the future or am I missing something fundamental?


I wanted to share a project I've been refining, called llmunix-starter. I've always been fascinated by the idea of AI systems that can adapt and build what they need, rather than relying on a fixed set of pre-built tools. This is my attempt at exploring that.

The template is basically an "empty factory." When you give it a complex goal through Claude Code on the web (which is great for this because it can run for hours), it doesn't look for existing agents. Instead, it writes the markdown definitions for a new, custom team of specialists on the fly.

For example, we tested it on a university bioengineering problem and it created a VisionaryAgent, a MathematicianAgent, and a QuantumEngineerAgent from scratch. The cool part was that when we gave it a totally different problem (geological surveying), it queried its "memory" of the first project and adapted the successful patterns, reusing about 90% of the core logic.
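
Mechanically, "writes the markdown definitions ... on the fly" can be as simple as generating a new agent file that the runtime picks up on its next pass. A toy sketch with an invented template and frontmatter, not llmunix-starter's actual format.

```python
from pathlib import Path

AGENT_TEMPLATE = """---
name: {name}
role: {role}
tools: [read_file, write_file, web_search]
---
You are {name}, a specialist created for this project.

Mission: {mission}

Consult the project memory for traces from earlier runs before starting from scratch.
"""


def spawn_specialist(name: str, role: str, mission: str,
                     agents_dir: Path = Path("agents")) -> Path:
    """Write a brand-new agent definition; the runtime loads it on its next read."""
    agents_dir.mkdir(parents=True, exist_ok=True)
    path = agents_dir / f"{name}.md"
    path.write_text(AGENT_TEMPLATE.format(name=name, role=role, mission=mission))
    return path


# e.g. one of the specialists from the bioengineering run might look like:
spawn_specialist(
    "QuantumEngineerAgent",
    role="quantum-circuit design",
    mission="Translate the MathematicianAgent's model into runnable Qiskit circuits.",
)
```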

I think it's particularly useful for those weird, messy problems where a generic agent just wouldn't have the context—like refactoring a legacy codebase or exploring a niche scientific field.

Thanks for taking a look!!


Let Claude Code create its own sub-agents and improve itself to achieve your goals. You can ask anything — try it! It’s an open-source Claude Code plugin: https://github.com/EvolvingAgentsLabs/llmunix-marketplace


The release of Anthropic's "Imagine with Claude" is fascinating. It shows a model that doesn't generate code to build a UI; it uses tools to construct the UI directly. This feels like a major shift from the "AI as a copilot" paradigm to "AI as a runtime."

This has been a core question behind an open-source project I've been working on with Ismael Faro, called LLMunix: https://github.com/EvolvingAgentsLabs/llmunix . Our approach is to build an entire OS for agents where the "executables" are not binaries, but human-readable Markdown files. The LLM interprets these files to orchestrate complex workflows.

The linked article is my analysis of these two approaches. It argues that while direct interpretation is incredibly powerful, an open, transparent, and auditable framework (like our Markdown-based one) is crucial for the future of agentic systems.

Curious to hear what HN thinks. Are we moving towards a future where LLMs are the OS, and if so, what should the "assembly language" for that OS look like?

