
Here is what I have my openclaw agent set up to do in my wsl environment on my 22-core development workstation in my office:

#1) I can chat with the openclaw agent (his name is "Patch") through a telegram chat, and Patch can spawn a shared tmux instance on my 22-core development workstation. #2) I can then use the `blink` app on my iphone + tailscale, which lets me run `ssh dev` in blink to connect via ssh to my dev workstation in my office, right from my phone.

Meanwhile, my agent "Patch" has provided me a connection command string to use in my blink app, which is a `tmux <string> attach` command that allows me to attach to a SHARED tmux instance with Patch.
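
For anyone curious about the plumbing: the shared-session part is just plain tmux under the hood. Here's a rough sketch (not my actual Patch tooling, and the session name is made up) of the kind of helper a supervisor agent can call to spawn the shared session and hand back the attach command:

```
import subprocess

SESSION = "patch-shared"  # hypothetical session name, not my real setup

def ensure_shared_session(session: str = SESSION) -> str:
    """Create a detached tmux session if it doesn't exist yet, and
    return the command a human can run to attach to that same session."""
    # `tmux has-session` exits non-zero when the session is missing
    probe = subprocess.run(["tmux", "has-session", "-t", session],
                           capture_output=True)
    if probe.returncode != 0:
        subprocess.run(["tmux", "new-session", "-d", "-s", session], check=True)
    return f"tmux attach -t {session}"

if __name__ == "__main__":
    print(ensure_shared_session())  # e.g. "tmux attach -t patch-shared"
```

Patch works inside that session and I attach to the same one from Blink over ssh, so we're literally looking at the same panes.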

Why is this so fking cool and foundationally game changing?

Because now, my agent Patch and I can spin up MULTIPLE CLAUDE CODE instances, and work on any repository (or repositories) I want, with parallel agents.

Well, I could already spawn multiple agents through my iphone connection without Patch, but the problem is that I then need to MANAGE each spawned agent, micromanaging each instance myself. But now I have a SUPERVISOR for all my agents: Patch is the SUPERVISOR of my multiple claude code instances.

This means I no longer have to context-switch my brain between five or 10 or 20 different tmux sessions on my own to command and control multiple different claude code instances. I can now just let my SUPERVISOR agent, Patch, command and control the multiple agents and then report back to me with the status or any issues. All through a single telegram chat with my supervisor agent, Patch.

This frees up my brain to only have to manage Patch the supervisor, instead of micromanaging all the different agents myself. Now I have a true management structure, which allows me to scale more easily. This is AWESOME.


This feels like the "prompt engineering" wave of 2023 all over again: a bunch of hype about a specific point-in-time activity, built on a lot of manual prompt setup versus a naive "do this thing for me", that eventually faded as the tooling started integrating all the lessons learned directly.

I'd expect that if these approaches produce output of usable quality, it will get rolled into existing tools similarly, like multi-agent worktree workflows already were.


2023 was the year of “look at this dank prompt I wrote yo”-weekly demos.

And 2026 is shaping up to be the year of "look at this prompt my middle manager agent wrote for his direct reports" :)

Maybe this is just a skill issue on my part, but I'm still trying to wrap my head around the workflow of running multiple claude agents at once. How do they not conflict with each other? Also how do you have a project well specified enough that you can have these agents working for hours on end heads down? My experience as a developer (even pre AI) has mostly been that writing-code-fast has rarely been the progress limiter.. usually the obstacles are more like, underspecified projects, needing user testing, disagreements on the value of specific features, subtle hard to fix bugs, communication issues, dealing with other teams and their tech, etc. If I have days where I can just be heads down writing a ton of code I'm very happy.

I can't imagine letting a current gen LLM supervise Claude Code instances. How could that possibly lead to even remotely acceptable software quality?

I spec out everything in excruciating detail with spec docs. Then I actually read them. Finally, we create granular tasks called "beads" (see https://github.com/steveyegge/beads). Beads allows us to create epics/tasks/subtasks and an associated dependency structure down to a granular bead, and then the agents pull a "bead" to implement. So, mostly we're either creating spec docs and creating beads, or implementing, quality checking, and testing the code created from an agent implementing a bead. I can say this produces better code than I could write after 10yrs of focused daily coding myself. However, I don't think "vibe coders" that have never truly learned to code have any realistic chance of creating decent code in a large complex code base that requires a complex backend schema to be built. They can only build relatively trivial apps. But I do believe what I am building is as solid as if I had millions of dollars of staff doing it with me.

But how is that less work, and how does it let you do that at Disneyland with your kids? For me, personally, there is little difference between "speccing out everything in excruciating detail in spec docs" and "writing the actual implementation in high-level code". Speccing in detail requires deep thought, a whiteboard, experimentation, etc. None of this can be done at Disneyland, and no AI can do this at a good level (that's why you "spec out everything in detail", create "beads" and so on?)

Yes, I normally draft spec docs in the office at my desk, this is true. However, when I have the spec ready for implementation with clear "beads", I can reasonably plan to leave my office and work from my phone. It's not at a point where I can just work 100% remotely from my phone (I probably could, but this is all still new to me too). But it does give me the option to be vastly more productive away from my desk.

Do you have any code publicly available so we could see what kind of code this sort of setup produces?

Not yet, but I can tell you that producing "good" code is another layer altogether. I have custom linters, code standardization docs, custom prompts, and a strictly enforced test architecture (enforced by the custom linters in pre-commit hooks, which run before an agent tries to commit). Ultimately, it's a lot of work to get all the agents, each with limited context, writing code in the way you want. In the main large complex project I am generally working on now, I have hand-held and struggled for over a year getting it all set up the way I need it. So I can't say it's been a weekend setup for me. It's been a long, arduous process to get where I am now in my 2-3 main repos that I work on. However, the workflow I just shared above can help people get there a lot faster.

> but I can tell you that producing "good" code is another layer altogether.

I feel like it isn't. If the fundamental approach is good, "good" code should be created by necessity, because there wouldn't be another way. If it's already a mess with leaking abstractions and architecture that doesn't actually enforce any design, then it feels unlikely you'll be able to stack anything on top of or below it to actually fix that.

And then you end up with some spaghetti that the agent takes longer and longer to edit as things get more and more messy.


Here is my view after putting in my 10,000+ hours learning to code in the pre-LLM days, while also building a pretty complex design + contract manufacturing company, with a lot of processes in place to make that happen. If you have a bunch of human junior devs and even a senior dev or two join your org to help you build an app, and you don't have any dev/ops structure in place for them, then you will end up with "spaghetti" throughout all your code/systems, from those relatively bright humans. It's the same with managing agents. You can't expect to build a complex system from simple "one shot me a <x> feature" prompts to a bunch of different agents, each with a finite ~150k token context limit. It must be done in the context of the system you have in place. If you have a poor or nonexistent system structure, you'll end up with garbage for code. Everything I said I have in place to guide the agents is also useful for human devs. I'm sure that all the FAANGs and various advanced software companies also use custom linters, etc., for every code check-in. It's just now become easier to have these advanced code quality structures in place, and it is absolutely necessary when managing/coordinating agents to build a complex application.

I've clocked some hours too, and I think as soon as you let something messy in, you're already losing. The trick isn't "how to manage spaghetti" with LLMs (nor humans), because the context gets all wired up, but how to avoid it in the first place. You can definitely do "one-shot" over and over again with a small context and build something complex, as long as you take great care about what goes into the context; more isn't better.

Anyways, feels like we have pretty opposite perspectives, I'm glad we're multiple people attacking similar problems but from seemingly pretty different angles, helps to find the best solutions. I wish you well regardless and hope you manage to achieve what you set out to do :)


I don't get it, and that doesn't necessarily mean it's a bad thing. I've been doing systems things for a long time and I'm quite good at it, but this is the first time none of this excites me.

Instead of sitting in my office for 12 hours working with 20 open terminals (exactly what I have open right now on my machine), I can take my kids to Disneyland (I live in Southern California and it's nearby) and work on my iphone talking to "Patch" while we stand in line for an hour to get on a ride. Meanwhile, my openclaw agent "Patch" manages my 20 open terminals on my development workstation in my office. Patch updates me and I can make decisions, away from my desk. That should excite anyone. It gives me back more of my time on earth while getting about the same (or more) work done. There is literally nothing more valuable to me than being able to spend more time away from my desk.

If this is actually true, then what will soon happen is you will be expected to manage more separate “Patch” instances until you are once again chained to your desk.

Maybe the next bottleneck will be the time needed to understand what features actually bring value?


What if he works for himself?

Not a $DayJob?


Then he is in competition with everyone else who is at their desk managing ### of open terminals

I appreciate your insight, even if the workflow seems alien to me. I admit I like the idea of freeing myself from a desk though. If you don't mind me asking, how much does this all cost per month?

Edit: I see you've answered this here: https://news.ycombinator.com/item?id=46839725 Thanks for being open about it.


Thanks. I just mentioned elsewhere, right now I spend $200 on the claude code 20x plan + $200 on openAI's similar plan, per month. I probably have a few more small conveniences that cost ~$10-$20 in a few places; an obsidian vault sync for documentation vaults on both my dev workstation and my phone comes to mind. Most weeks I could cut one of the $200 plans, but claude code and codex have different strengths, and I like to have them double-check each other's work, so to me that's worth carrying both subscriptions.

i have been recently quite enamoured with using both the ChatGPT mobile app (specifically the Codex part) and the Github mobile app, along with Codex. with an appropriate workflow, i've been able to deploy features to some [simple] customer-facing apps while on the go. it's very liberating!

GP's setup sounds like the logical extension to what i'm doing. not just code, but sessions within servers? are sysadmins letting openclawd out and about on their boxes these days?


Yes, I've also used that codex workflow and it's pretty useful, but the "real time" interactivity and control is just not at the same level.

Please show us something you’ve produced this way.

> MULTIPLE CLAUDE CODE INSTANCES

a lotta yall still dont get it

molt holders can use multiple claude code instances on a single molt


Slurp Juice is still the only good thing to come out of crypto. I hope AI leaves us with at least one good meme.

You are absolutely right that I probably still "don't" get it; I am still shocking myself on a daily basis with all the stuff I hadn't fully grasped. I recently updated claude code and yesterday had one agent that used the new task system and blew my mind with what he got accomplished. This tech is all moving so fast!

My multitasking operating system would like a word..

/s


What are you coding with this? Is it a product you're trying to launch, an existing product with customers or custom work for someone else?

This just sounds ridiculously expensive. Burning hundreds of dollars a day to generate code of questionable utility.

Personally, I spend $200 on the claude code 20x plan + $200 on openAI's similar plan, per month. So, yeah, I spend $400 per month. I buy and use both because they have different and complementary strengths. I have only very rarely come close to the weekly capacity limit on either of those plans. Usually I don't need to worry about how much I use them. The $400 may be expensive to some people, but frankly I pay some employees a lot more each month and get a lot less for my money.

Automated usage like you described violates Anthropic's terms of service.

It's just a matter of time until they ban your account.


They make it easy to spin up parallel agents. Managing them efficiently through a shared tmux instance isn't banned anywhere in the TOS, AFAIK. I'd worry more about it if I had to use multiple accounts or were running some round-the-clock "automated" workflow. I'm using one account. Hell, with the workflow I described, I am even actively logged in to my dev workstation with tmux and able to see and interact with each instance and "micro-manage" them myself, individually. The main benefit of this workflow is that I also have a single shared LLM instance that has access to all the instances, together with me. I have plenty of other things to worry about besides a banned account from an efficient workflow I've set up.

just throwing out there that yesterday Boris (lead engineer for Claude Code) literally told everyone on Twitter that the CC team's number one recommendation for users is that they should be kicking off multiple instances / agents in parallel. not sure if that's what you're referring to, but if so I'd be very surprised if they ban someone for heavy use of that workflow

Gastown also had a supervisor “mayor”. How is this one different?

These kinds of posts are why I've checked HN pretty much every day for 15+ yrs now. Hard to believe I'd missed this one. Glad I caught it this time! This post reminds me to stay humble and avoid jumping to conclusions without analysis.

That story has been repeated in various places for decades now.

One of the lucky 10,000! https://xkcd.com/1053/

I think I’ve been one of those 2 or 3 times for this story. Read it, forgot it, read it again and only remembered after some time :D

I was one of the lucky ones today

Such a refreshingly good attitude!

Process and plumbing become very important when using AI for coding. Yes, you need good prompts. But as the code base gets more complex, you also need to spend significant time developing test guides, standardization documents, custom linters, etc., to manage the agents over time.


Linters... custom-made pre-commit linters aligned with your codebase's needs. The agents are great at creating these linters, and then forevermore those linters can give feedback and guide them. My key repo now has "audit_logging_linter, auth_response_linter, datetime_linter, fastapi_security_linter, fastapi_transaction_linter, logger_security_linter, org_scope_linter, service_guardrails_linter, sql_injection_linter, test_infrastructure_linter, token_security_checker..." Basically, every time you find an implementation gap vs your repo standards, make a linter! Of course, you need to create some standards first. But if you know you need protected routes and things like this, then linters can auto-check the work and feed back to the agents, to keep them on track. Now, I even have scripts that can automatically fix the issues for the agents. This is the way to go.
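
To make that concrete, here's a stripped-down sketch of what one of these pre-commit linters can look like (simplified for illustration, not my production linter; pre-commit passes the staged file paths as arguments):

```
#!/usr/bin/env python3
"""Toy custom pre-commit linter: flag `import datetime` in favor of pendulum.
Illustrative sketch only, not the real datetime_linter."""
import ast
import sys
from pathlib import Path


def check_file(path: Path) -> list[str]:
    violations = []
    tree = ast.parse(path.read_text(), filename=str(path))
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "datetime":
                    violations.append(
                        f"{path}:{node.lineno}: import datetime found; use pendulum"
                    )
    return violations


if __name__ == "__main__":
    problems = [v for f in sys.argv[1:] for v in check_file(Path(f))]
    if problems:
        print("\n".join(problems))
        sys.exit(1)  # non-zero exit blocks the agent's commit
```

Register it as a local hook in .pre-commit-config.yaml and the agent gets the violation text back the moment its commit is blocked, which is exactly the fast feedback loop that keeps them on track.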


Aren't many of those tests? Why define them as linters?


Great question, I let Claude help answer this...see below:

The key differences are:

  1. Static vs Runtime Analysis
  Linters use AST parsing to analyze code structure without executing it. Tests verify actual runtime behavior. Example from our datetime_linter:

  tree = ast.parse(file_path.read_text())
  for node in ast.walk(tree):
      if isinstance(node, ast.Import):
          for alias in node.names:
              if alias.name == "datetime":
                  # Violation: should use pendulum
                  ...

  This catches `import datetime` syntactically. A test would need to actually execute code and observe wrong datetime behavior.

  2. Feedback Loop Speed
  - Linters: Run in pre-commit hooks. Agent writes code → instant feedback → fix → iterate in seconds
  - Tests: Run in CI. Commit → push → wait minutes/hours → fix in next session

  For AI agents, this is critical. A linter that blocks the commit keeps them on track immediately rather than discovering violations after a test run.

  3. Structural Violations
  For example, our `fastapi_security_linter` catches things like "route missing TenantRouter decorator". These are structural violations - "you forgot to add X" - not "X doesn't work correctly." Tests verify the behavior of X when it exists.

  4. Coverage Exhaustiveness
  Linters scan all code paths structurally. Tests only cover scenarios you explicitly write. Our org_scope_linter catches every unscoped platform query across the entire codebase in one pass. Testing that would require writing a test for each query.

  5. The Hybrid Value
  We actually have both. The linter catches "you forgot the security decorator" instantly. The test (test_fastapi_authorization.py) verifies "the security decorator actually blocks unauthorized users at runtime." Different failure modes, complementary protections.

  Think of it like: linters are compile-time checks, tests are runtime checks. TypeScript catches string + number at compile time; you don't write a test for that.


Not sure you’ve actually tried using it, but beads has been an absolute game changer for my projects. “Game changer” is even underselling it.


Beads was phenomenal back in October when it was released. Unfortunately it has somehow grown like a cancer. Now 275k lines of Go for task tracking? And no human fully knows what it is all doing. Steve Yegge is quite proud to say he's never looked at any of its code. It installs magic hooks and daemons all over your system and refuses to let go. Most user hostile software I've used in a long time.

Lot of folks rolling their own tools as replacements now. I shared mine [0] a couple weeks ago and quite a few folks have been happy with the change.

Regardless of what you do, I highly recommend to everyone that they get off the Beads bandwagon before it crashes them into a brick wall.

[0] https://github.com/wedow/ticket


If your task tracking app is 275k lines you fucked up.


The LLM providers got paid.

Reminds me of an offshore project I was involved with at one point. It had something like 7 managers, ran for 4 years, and over 30 developers had worked on it. The billing had reached into the millions. It was full of never-ending bugs. The amount of "extra" code, abstractions, and interfaces was the stuff of legends.

It was actually a one-to-three-month simple CRUD project for a two-man development team.


yeah, I generally review the install script (for both this and almost everything else now, since it's trivial with claude code) and then ensure I have a sane install for my system needs. But I'm on the latest beads 0.47.1, and what I did to tame it is just walk through creating SKILLS with claude and codex, and frankly I've found a lot of value added in the features so far. I especially love --claim, which keeps the agents from checking out beads that are already checked out. And after I added SKILLS, the agents do an awesome job networking the dependencies together, which helps keep multi-agent workflows on track. Overall, I'm not feeling any reason to switch from beads right now, but I will also be upgrading more thoughtfully, so I don't break my current workflow.


How do you handle the dogs ignoring the deacons and going after the polecats though? Seems like the mayor should get involved to me.


Without context I would have thought this post came from a video game forum or a mentally ill person. I'm not dissing you personally.


I haven't tried gas town yet. I have a pretty good multi-agent workflow just by using beads directly along with thoughtfully produced prompts.


I'm not entitled to your time of course, but would you mind describing how?

All I know is beads is supposed to help me retain memory from one session to the next. But I'm finding myself having to curate it like a git repo (and I already have a git repo). Also it's quite tied to github, which I cannot use at work. I want to use it but I feel I need to see how others use it to understand how to tailor it for my workflow.


Probably the wrong attitude here - beads is infra for your coding agents, not you. The most I directly interact with it is by invoking `bd prime` at the start of some sessions if the LLM hasn’t gotten the message; maybe very occasionally running `bd ready` — but really it’s a planning tool and work scheduler for the agents, not the human.

What agent do you use it with, out of curiosity?

At any rate, to directly answer your question, I used it this weekend like this:

“Make a tool that lets me ink on a remarkable tablet and capture the inking output on a remote server; I want that to send off the inking to a VLM of some sort, and parse the writing into a request; send that request and any information we get to nanobanana pro, and then inject the image back onto the remarkable. Use beads to plan this.”

We had a few more conversations, but got a workable v1 out of this five hours later.


To use it effectively, I spend a long time producing FSDs (functional specification documents) to exhaustively plan out new features or architecture changes. I'll pass those docs back and forth between gemini, codex/chatgpt-pro, and claude. I'll ask each one something similar to the following (credit to https://github.com/Dicklesworthstone for clearly laying out the utility of this workflow; these next few quoted prompts are verbatim from his posts on x):

"Carefully review this entire plan for me and come up with your best revisions in terms of better architecture, new features, changed features, etc. to make it better, more robust/reliable, more performant, more compelling/useful, etc.

For each proposed change, give me your detailed analysis and rationale/justification for why it would make the project better along with the git-diff style changes relative to the original markdown plan".

Then the plan generally improves iteratively. Sometimes it can get overly complex, so I may ask them to take it down a notch from google scale. Anyway, when the FSD doc is good enough, the next step is to prepare to create the beads.

At this point, I'll prompt something like:

"OK so please take ALL of that and elaborate on it more and then create a comprehensive and granular set of beads for all this with tasks, subtasks, and dependency structure overlaid, with detailed comments so that the whole thing is totally self-contained and self-documenting (including relevant background, reasoning/justification, considerations, etc.-- anything we'd want our "future self" to know about the goals and intentions and thought process and how it serves the over-arching goals of the project.) Use only the `bd` tool to create and modify the beads and add the dependencies. Use ultrathink."

After that, I usually even have another round of bead checking with a prompt like:

"Check over each bead super carefully-- are you sure it makes sense? Is it optimal? Could we change anything to make the system work better for users? If so, revise the beads. It's a lot easier and faster to operate in "plan space" before we start implementing these things! Use ultrathink."

Finally, you'll end up with a solid implementation roadmap all laid out in the beads system. Now, I'll also clarify: the agents got much better at using beads in this way when I took the time to have them create SKILLS for beads to refer to. Also important is ensuring AGENTS.md, CLAUDE.md, and GEMINI.md have some info referring to its use.

But once the beads are laid out, it's just a matter of figuring out: do you want to do sequential implementation with a single agent, or use parallel agents? Effectively using parallel agents with beads would require another chapter to this post, but essentially, you just need a decent prompt clearly instructing them not to run over each other. Also, if you are building something complex, you need test guides and standardization guides written, for the agents to refer to, in order to keep the code quality at a reasonable level.

Here is a prompt I've been using as a multi-agent workflow base if I want them to keep working; I've had them work for 8 hours without stopping with this prompt:

EXECUTION MODE: HEADLESS / NON-INTERACTIVE (MULTI-AGENT)

CRITICAL CONTEXT: You are running in a headless batch environment. There is NO HUMAN OPERATOR monitoring this session to provide feedback or confirmation. Other agents may be running in parallel.

FAILURE CONDITION: If you stop working to provide a status update, ask a question, or wait for confirmation, the batch job will time out and fail.

  YOUR PRIMARY OBJECTIVE: Maximize the number of completed beads in this single session. Do not yield control back to the user until the entire queue is empty or a hard blocker (missing credential) is hit.

  TEST GUIDES: please ingest @docs/testing/README.md, @docs/testing/golden_path_testing_guide.md, @docs/testing/llm_agent_testing_guide.md, @docs/testing/asset_inventory.md, @docs/testing/advanced_testing_patterns.md, @docs/testing/security_architecture_testing.md
  STANDARDIZATION: please ingest @docs/api/response_standards.md @docs/event_layers/event_system_standardization.md
─────────────────────────────────────────────────────────────────────────────── MULTI-AGENT COORDINATION (MANDATORY) ───────────────────────────────────────────────────────────────────────────────

  Before starting work, you MUST register with Agent Mail:

  1. REGISTER: Use macro_start_session or register_agent to create your identity:
     - project_key: "/home/bob/Projects/honey_inventory"
     - program: "claude-code" (or your program name)
     - model: your model name
     - Let the system auto-generate your agent name (adjective+noun format)

  2. CHECK INBOX: Use fetch_inbox to check for messages from other agents.
     Respond to any urgent messages or coordination requests.

  3. ANNOUNCE WORK: When claiming a bead, send a message to announce what you're working on:
     - thread_id: the bead ID (e.g., "HONEY-2vns")
     - subject: "[HONEY-xxxx] Starting work"
─────────────────────────────────────────────────────────────────────────────── FILE RESERVATIONS (CRITICAL FOR MULTI-AGENT) ───────────────────────────────────────────────────────────────────────────────

  Before editing ANY files, you MUST:

  1. CHECK FOR EXISTING RESERVATIONS:
     Use file_reservation_paths with your paths to check for conflicts.
     If another agent holds an exclusive reservation, DO NOT EDIT those files.

  2. RESERVE YOUR FILES:
     Before editing, reserve the files you plan to touch:
     ```
     file_reservation_paths(
       project_key="/home/bob/Projects/honey_inventory",
       agent_name="<your-agent-name>",
       paths=["honey/services/your_file.py", "tests/services/test_your_file.py"],
       ttl_seconds=3600,
       exclusive=true,
       reason="HONEY-xxxx"
     )
     ```

  3. RELEASE RESERVATIONS:
     After completing work on a bead, release your reservations:
     ```
     release_file_reservations(
       project_key="/home/bob/Projects/honey_inventory",
       agent_name="<your-agent-name>"
     )
     ```

  4. CONFLICT RESOLUTION:
     If you encounter a FILE_RESERVATION_CONFLICT:
     - DO NOT force edit the file
     - Skip to a different bead that doesn't conflict
     - Or wait for the reservation to expire
     - Send a message to the holding agent if urgent
─────────────────────────────────────────────────────────────────────────────── THE WORK LOOP (Strict Adherence Required) ───────────────────────────────────────────────────────────────────────────────

* ACTION: Immediately continue to the next bead in the queue and claim it

  For every bead you work on, you must perform this exact cycle autonomously:

   1. CLAIM (ATOMIC): Use the --claim flag to atomically claim the bead:
      ```
      bd update <id> --claim
      ```
      This sets BOTH assignee AND status=in_progress atomically.
      If another agent already claimed it, this will FAIL - pick a different bead.

        WRONG: bd update <id> --status in_progress  (doesn't set assignee!)
        RIGHT: bd update <id> --claim                (atomic claim with assignee)

   2. READ: Get bead details (bd show <id>).

   3. RESERVE FILES: Reserve all files you plan to edit (see FILE RESERVATIONS above).
      If conflicts exist, release claim and pick a different bead.

   4. PLAN: Briefly analyze files. Self-approve your own plan immediately.

   5. EXECUTE: Implement code changes (only to files you have reserved).

   6. VERIFY: Activate conda honey_inventory, run pre-commit run --files <files you touched>, then run scoped tests for the code you changed using ~/run_tests (test URLs only; no prod secrets).
       * IF FAIL: Fix immediately and re-run. Do not ask for help as this is HEADLESS MODE.
       * Note: you can use --no-verify if you must if you find some WIP files are breaking app import in security linter, the goal is to help catch issues to improve the codebase, not stop progress completely.

   7. MIGRATE (if needed): Apply migrations to ALL 4 targets (platform prod/test, tenant prod/test).

   8. GIT/PUSH: git status → git add only the files you created or changed for this bead → git commit --no-verify -m "<bead-id> <short summary>" → git push. Do this immediately after closing the bead. Do not leave untracked/unpushed files; do not add unrelated files.

   9. RELEASE & CLOSE: Release file reservations, then run bd close <id>.

  10. COMMUNICATE: Send completion message via Agent Mail:
      - thread_id: the bead ID
      - subject: "[HONEY-xxxx] Completed"
      - body: brief summary of changes

  11. RESTART: Check inbox for messages, then select the next bead FOR EPIC HONEY-khnx, claim it, and jump to step 1.
─────────────────────────────────────────────────────────────────────────────── CONSTRAINTS & OVERRIDES ───────────────────────────────────────────────────────────────────────────────

   * Migrations: You are pre-authorized to apply all migrations. Do not stop for safety checks unless data deletion is explicit.
   * Progress Reporting: DISABLE interim reporting. Do not summarize after one bead. Summarize only when the entire list is empty.
   * Tracking: Maintain a running_work_log.md file. Append your completed items there. This file is your only allowed form of status reporting until the end.
   * Blockers: If a specific bead is strictly blocked (e.g., missing API key), mark it as blocked in bd, log it in running_work_log.md, and IMMEDIATELY SKIP to the next bead. Do not stop the session.
   * File Conflicts: If you cannot reserve needed files, skip to a different bead. Do not edit files reserved by other agents.

  START NOW. DO NOT REPLY WITH A PLAN. REGISTER WITH AGENT MAIL, THEN START THE NEXT BEAD IN THE QUEUE IMMEDIATELY. HEADLESS MODE IS ON.


Thanks for these notes. Very interesting! Looking forward to experimenting.


I do similar, except I log into my office workstation and avoid the extra fees. I detailed my setup in an X post here https://x.com/bobjordanjr/status/1999967260887421130?s=20 and the TL;DR is:

1. Install Tailscale on WSL2 and your iPhone
2. Install openssh-server on WSL2
3. Get an SSH terminal app (Blink, Termius, etc.). I use blink ($20/yr).
4. SSH from Blink to your WSL2’s Tailscale IP
5. Run claude code inside tmux on your phone.

Tailscale handles the networking from anywhere. tmux keeps your session alive if you hit dead spots. Full agentic coding from your phone.

Step 2: SSH server. In WSL2:

sudo apt install openssh-server
sudo service ssh start

Run tailscale ip to get your WSL2’s IP (100.x.x.x). That’s what you’ll connect to from your phone.

Step 3: Passwordless login. In Blink, type config → Keys → + → create an Ed25519 key. Copy the public key. On WSL2:

echo "your-public-key" >> ~/.ssh/authorized_keys

Then in Blink: Hosts → + → add your Tailscale IP, username, and select your key. Now it’s one tap to connect.

Step 4: tmux keeps you alive. iOS kills background SSH connections; tmux solves this.

sudo apt install tmux
tmux
claude

Switch apps, connection dies, no problem. To reconnect, just type `ssh dev` in blink and you're in your workstation, then `tmux attach` and you're right back in your session.

Pro tip: multiple Claude sessions. Inside tmux:
•Ctrl+b c — new window
•Ctrl+b 0/1/2 — switch windows
I run different repos, or multiple agents in the same repo, in different windows and jump between them. Full multi-project workflow from my phone.


I’ve been a fan of this philosophy since the Intercooler.js days. In fact, our legacy customer portal at bomquote.com still runs on Intercooler. I spent the last year building a new version using the "modern" version of that stack: Flask, HTMX, Alpine, and Tailwind.

However, I’ve recently made the difficult decision to rewrite the frontend in React (specifically React/TS, TanStack Query, Orval, and Shadcn). In a perfect world, I'd rewrite the python backend in go, but I have to table that idea for now.

The reason? The "LLM tax." While HTMX is a joy for manual development, my experience the last year is that LLMs struggle with the "glue" required for complex UI items in HTMX/Alpine. Conversely, the training data for React is so massive and the patterns so standardized that the AI productivity gains are impossible to ignore.

Recently, I used Go/React for a microservice that has actually grown to a similar scale of complexity as the python/htmx app I focused on for most of the year, and it was so much more productive than python/htmx. In a month of work I got done what took me about 4-5 months in python/htmx. I assume that's because of the typing with go, and also because the LLM could generate perfectly typed hooks from my OpenAPI spec via Orval and build out Shadcn components without hallucinating.

I still love the HTMX philosophy for its simplicity, but in 2024/2025, I've found that I'm more productive choosing the stack that the AI "understands" best. For new projects, Go/React will now be my default. If I have to write something myself again (God, I hope not) I may use htmx.


This got me thinking: I am not about to tilt at windmills and the future will unfold as it will, but I think the idea of "LLM as a compiler of ideas to high-level languages" can turn out to be quite dangerous. It is one thing to rely on, and not be able to understand, the assembly output of a deterministic compiler for a C++ program. It is quite another to rely on but not fully understand (whether due to laziness or complexity) what is in the C++ code that a giant nondeterministic, intractable neural network generated. What is guaranteed is that the future will be interesting...


The way I'm keeping up with it (or deluding myself into believing I am keeping up with it) is by maintaining rigorous testing and test standards. I have used LLMs to assist me in building C firmware for some hardware projects, but the scale of that has been such that it can also be well tested. Anyway, part of the reason I was so much slower with python is that I'm an expert at all the tech I used, having spent literal years of my life in the docs and reading books, etc., and I've read everything the LLM wrote to double-check it. I'm not so literate with go, but it's not very complex, and given the static nature, I just trusted the LLM more than I did with python. The react stack I am learning as I go, but the tooling is so good, and since I understand the testing aspects, same thing: I trusted the LLM more and have been more productive. Anyway, times are changing fast!


I went through a similar song and dance using a paid Gemini code assist “standard” level subscription. I finally got Gemini 3 working in my terminal in my repository. I assigned it a task that Claude Code Opus 4.5 would quickly knock out, and Gemini 3 did a reasonably similar job. I had Opus 4.5 evaluate the work and it was complimentary of Gemini 3's work. Then I checked the usage, and I'd used 10% of the daily token limit, about 1.5M tokens, on that one task. So I can only get about 10 tasks before I'm rate limited. Meanwhile, with the Claude Code $200 max plan, I can run 10 tasks of that same caliber in parallel, even with the Opus 4.5 model, and barely register on the usage meter. The only thing the Gemini code assist “standard” plan will be good for with these limits is double-checking the plans that Opus 4.5 makes. Until the usage limits are increased, it's pretty useless compared to the Claude Code max plan. But there doesn't seem to be any similar plan offering from Google.


Man, I definitely feel this, being in the international trade business operating an export contract manufacturing company from China, with USA based customers. I can’t think of many shittier businesses to be in this year, lol. Actually it’s been pretty difficult for about 8 years now, given trade war stuff actually started in 2017, then we had to survive covid, now trade war two. It’s a tough time for a lot of SMEs. AI has to be a handful for classic web/design shops to handle, on top of the SMEs that usually make up their customer base, suffering with trade wars and tariff pains. Cash is just hard to come by this year. We’ve pivoted to focus more on design engineering services these past eight years, and that’s been enough to keep the lights on, but it’s hard to scale, it is just a bandwidth constrained business, can only take a few projects at a time. Good luck to OP navigating it.


I tried to use go in a project 6-7 years ago and was kind of shocked by needing to fetch packages directly from source control with a real absence of built in versioning. That turned me off and I went back to python. I gather that now there’s a new system with go modules. I should probably revisit it.

