
As an experiment, I set it up with a z.ai $3/month subscription and told it to do a tedious technical task. I said to stay busy and that I expect no more than 30 minutes of inactivity, ever.

The task is to decompile Wave Race 64, integrate it with libultraship, and eventually produce a runnable native port of the game. (Same approach as Ship of Harkinian, the Zelda: Ocarina of Time port.)

It set up a timer every 30 minutes to check in on itself and see if it gave up. It reviews progress every 4 hours and revisits prioritization. I hadn't checked on it in days, and when I looked today it was still going, a few functions at a time.

It set up those timers itself and creates new ones as needed.
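A self check-in like that can be sketched as a cron-style heartbeat. Everything below is illustrative, not taken from the actual setup: the paths, the script name, and the exact mechanism are assumptions.

```shell
# Hypothetical sketch of a 30-minute self check-in.
# The agent touches a heartbeat file whenever it makes progress;
# a cron entry like this runs the check every 30 minutes:
#   */30 * * * * /opt/agent/check_in.sh
HEARTBEAT=/tmp/agent.heartbeat
MAX_IDLE=1800   # 30 minutes, per the stated inactivity budget

# Age of the heartbeat file in seconds (GNU stat; macOS uses `stat -f %m`).
age=$(( $(date +%s) - $(stat -c %Y "$HEARTBEAT") ))
if [ "$age" -gt "$MAX_IDLE" ]; then
    echo "stalled for ${age}s -- nudging the agent"
fi
```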

It's not any one particular thing that's novel; it's just more independent because of all the little bits.




So, you don't know if it has produced anything valuable yet?

It's the same story with these people running 12 parallel agents that automatically implement issues managed in Linear by an AI product team that has conducted automated market and user research.

Instead of making things, people are making things that appear busy making things. And as you point out, "but to what end?" is a really important question, often unanswered.

"It's the future, you're going to be left behind", is a common cry. The trouble is, I'm not sure I've seen anything compelling come back from that direction yet, so I'm not sure I've really been left behind at all. I'm quite happy standing where I am.

And the moment I do see something compelling come from that direction, I'll be sure to catch up, using the energy I haven't spent beating down the brush. In the meantime, I'll keep an eye on the other directions too.


> Instead of making things, people are making things that appear busy making things.

Sounds like a regular office job.


Yeah, I'm not sure I understand what the goal here is. Ship of Harkinian is a rewrite, not just a decompilation. As a human reverse engineer, I've gotten a lot of false positives. This seems like one of those areas where hallucinations could be really insidious and hard to identify, especially for a non-expert. I've found MCP to be helpful with a lot of drudgery, but I think you would have to review the LLM output, do extensive debugging/dynamic analysis, and triage all potential false positives before attempting a rewrite based on decompiled assembly... I think OoT took a team of experts collectively thousands of person-hours to fully document; it seems a bit too hopeful to expect that plus a rewrite just from being pushy with an agent...

Step 1: Decompile into C that can be recompiled into a working ROM. In theory, it could be compiled into the same ROM we started with; a matching ROM hash is the main success criterion in the OoT decompilation project. Have it grind until it succeeds.
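The byte-for-byte check behind step 1 can be done with standard tools. A minimal sketch, where the filenames are placeholders and not from the actual project:

```shell
# Hedged sketch: a "matching" decompilation recompiles into a ROM whose
# bytes (and hence hash) equal the original's. Filenames are placeholders.
sha1sum original.z64 build/waverace.z64

# Byte-for-byte comparison; success means the decompilation matches.
if cmp -s original.z64 build/waverace.z64; then
    echo "OK: original ROM reproduced"
else
    echo "mismatch: keep grinding"
fi
```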

Step 2: Integrate libultraship. Launching the game natively is the next criterion. Then ideally we could do differential testing on a frame-by-frame basis, comparing emulated vs. native.
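The frame-by-frame comparison could look something like this, assuming both the emulated and native builds can be made to dump per-frame framebuffer hashes (the dump files and their format here are hypothetical):

```shell
# Hedged sketch of differential testing: each build writes one
# "frame_no hash" line per rendered frame, and we report the first
# frames where the two runs diverge.
diff frames_emulated.txt frames_native.txt | head -n 3
# The first differing line localizes the earliest visual divergence.
```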

Step 3: Semantic documentation of source. If it gets this far, I will be very impressed.

This is absolutely an experiment. It's a hard problem with low stakes. There's a lot to learn from it.


Not yet. But what's the actual goal here? It's not to have a native Wave Race 64. It's to improve my intuition around what sort of tasks can be worked on 24/7 without supervision.

I have a hypothesis that I can verify the result against the original ROM. With that as the goal, I believe the agent can continue to grind on the problem until it passes that verification. I've seen this work in other areas, but this is something larger and more tedious, and I wanted to see how far it could go.


That sounds like being a manager IRL.

A $3 z.ai subscription? Sounds like it already burned $3k.

I find these toys in perfect alignment with what LLM providers strive for: a widespread explosion in token consumption to demonstrate to investors: see, we told you we were right to invest, let's open more giga-factories.


It's using about 100M input tokens a day on glm-4.7 (glm-5 isn't available on my plan). It's sticking pretty close to the throttling limits that reset every 5 hours.

100M input tokens would be about $40 at API rates, and anywhere from 2-6 kWh.

Certainly excessive for my $3/month.


How's it burned $3k on a $3/month subscription running for a few days?

I simply don't get how it could have run for quite a while and only cost $3. Z.ai offers some of the best models out there. At several dollars per million tokens, this sort of code-generating bot would burn through millions of tokens in less than 30 minutes.

> Several dollars per million tokens

The flagship, glm-5, is $1/M input tokens. glm-4.7 is $0.60/M input tokens.


They have a coding plan

And the $3 plan also has significantly higher latency compared with their higher-tier plans.

What a great use of humanity's and the earth's resources.

Keep us posted, this sounds great!


