Hacker News | akiselev's comments

> Curious to hear if people have use cases where they find 1M works much better!

Reverse engineering [1]. When decompiling a bunch of code and tracing functionality, it's really easy to fill up the context window with irrelevant noise and compaction generally causes it to lose the plot entirely and have to start almost from scratch.

(Side note, are there any OpenAI programs to get free tokens/Max to test this kind of stuff?)

[1] https://github.com/akiselev/ghidra-cli


OpenAI has a program for trusted cybersecurity researchers: https://openai.com/index/trusted-access-for-cyber/

Totally agreed. I’ve been reverse engineering Altium’s file format to enable agents to vibe-engineer electronics, and though I’m on my third from-scratch rewrite in as many weeks, each iteration improves significantly in quality: the previous version helps me explore the problem space and instruct the agent on how to do red/green development [1]. Each iteration is tens of thousands of lines of code, which would have been impossible to write this fast before, so it’s been quite a change in perspective, treating so much code as throwaway experimentation.

I’m using a combination of hundreds of megabytes of Ghidra-decompiled Delphi DLLs and millions of lines of decompiled C# code to do this reverse engineering. I can’t imagine even attempting such a large project before LLMs, so while a good implementation still takes a lot of time, it’s definitely a lot cheaper than before.

[1] I saw your red/green TDD article/book chapter and I don’t think you go far enough. Since we have agents, you can generalize red/green development to a lot of things that would be impractical to implement as tests. For example, I have agents analyze binary diffs of the file format to figure out where my implementation is incorrect without getting bogged down by irrelevant details like the order or encoding of parameters. This guides the agent loop instead of tests.
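(For the curious, the kind of diff harness I mean is nothing fancy. A minimal sketch, with illustrative names and granularity that are not taken from my actual project: group differing bytes into regions with a little surrounding context, so the agent reasons about structure rather than raw offsets.)

```python
# Sketch of a byte-level diff harness an agent could run against a
# reference file and a re-serialized one. Region grouping and the
# `context` parameter are illustrative choices, not from a real tool.

def diff_regions(ref: bytes, out: bytes, context: int = 4):
    """Return (offset, ref_slice, out_slice) for each run of differing bytes."""
    regions = []
    i, n = 0, min(len(ref), len(out))
    while i < n:
        if ref[i] != out[i]:
            start = i
            while i < n and ref[i] != out[i]:
                i += 1
            lo = max(0, start - context)
            regions.append((start, ref[lo:i + context], out[lo:i + context]))
        else:
            i += 1
    if len(ref) != len(out):
        # Length mismatch: report the tail as one final region.
        regions.append((n, ref[n:], out[n:]))
    return regions
```

The agent gets back a short list of hot spots instead of a full hex dump, which keeps the loop focused.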


When I was developing my ghidra-cli tool for LLMs to use, I was using crackmes as tests and it had no problem getting through obfuscation as long as it was prompted about it. In practice when reverse engineering real software it can sometimes spin in circles for a while until it finally notices that it's dealing with obfuscated code, but as long as you update your CLAUDE.md/whatever with its findings, it generally moves smoothly from then on.


Is it also possible that crackme solutions were already in the training data?


I used the latest submissions from sites like crackmes.one, which were days or weeks old, to guard against that.


The book Brett uses as his main source, Waging A Good War, is an incredible book that I strongly recommend. It treats the Civil Rights movement as a military campaign and analyzes it from the perspective of a military historian.

Not in the sense that it was viewed as a war by the protestors, but in the sense that the logistics, training, and operations of the Civil Rights movement were a well-oiled machine: a well-organized, but nonviolent, army (the book also covers counterexamples where there was no organization).

One of the most memorable details is how James Lawson trained in nonviolence under Gandhi and came over to train protestors in nonviolent tactics. They gathered in church basements to scream insults and spit on each other to prepare for the restaurant sit-ins and other ops.


[flagged]


What since-released internal memos or journals from mid-century civil rights leaders have revealed that destroying the constitution was their objective? Seems like a stretch.


I believe the civil rights leaders themselves were mostly genuine. I think they were used as useful idiots on a couple of occasions to support the two most destructive policies of the US.

(1) Secession. This was used for evil in the form of slavery, but it was the most powerful check on federal power by the states that we had. The fact that it could be used for evil did not mean it was better to get rid of it.

(2) Expansion of the interstate commerce clause to mean basically anything. A main argument for why this can't be reversed is that it would destroy the civil rights acts, which act upon even intrastate business. Rather, what should have happened is that the 15th Amendment should have been written to apply to private entities as well, instead of blasting away the interstate commerce clause.


I'm certainly sympathetic to #2 being one of the greatest unconstitutional practices of the modern US government, but is its genesis really the civil rights movement? There were many settled cases about interstate commerce before the Civil Rights Act, like Gibbons v. Ogden.

https://www.britannica.com/money/commerce-clause/Interpretat...


You're absolutely right -- it's not really the genesis per se on #2, just one of the modern weapons used. The Civil Rights Act is one of the main arguments used today for why we can't wind back the interstate commerce clause, creating a sort of legal suicide pact where the interstate commerce clause interpretation is held hostage if you want to keep your civil rights. That is, the CRA was arguably one of the most important things double-sealing the deal on the Progressive Era expansion of the ICC.

Many times here on HN I have debated people who were well versed in constitutional law, and when I mention rolling back the interstate commerce clause, one of their main go-tos is that they're afraid it would destroy the CRA and that's why they can't do it. And they're right -- a CRA nearly identical on many points to the one passed in 1964 was enacted in 1875. The 14th and 15th Amendments existed at both times, and the relevant parts of the constitution stayed the same. Yet the latter was found constitutional and the former was not, in large part due to the change in the meaning of the interstate commerce clause.


> when I mention rolling back the interstate commerce clause one of their main go-tos is that they're afraid it would destroy the CRA and that's why they can't do it

I'll be honest, I've literally never seen this argument in any hall of power. And I know quite a few folks who believe in overturning Wickard.

The CRA, as currently interpreted, is more than fine on equal-protection grounds.


The overturning of the CRA of 1875 ruled that equal protection under the 14th Amendment doesn't bind private actors; that's why the CRA of 1964/68 depended on the expanded ICC. The equal protection amendments of relevance (basically the 14th) haven't changed since the overturning of the 1875 CRA.

  The Reconstruction era ended with the resolution of the 1876 presidential election, and the Civil Rights Act of 1875 was the last federal civil rights law enacted until the passage of the Civil Rights Act of 1957. In 1883, the Supreme Court ruled in the Civil Rights Cases that the public accommodation sections of the act were unconstitutional, saying Congress was not afforded control over private persons or corporations under the Equal Protection Clause. Parts of the Civil Rights Act of 1875 were later re-adopted in the Civil Rights Act of 1964 and the Civil Rights Act of 1968, both of which cited the Commerce Clause as the source of Congress's power to regulate private actors.[]

Of particular note: "were later re-adopted in the Civil Rights Act of 1964 and the Civil Rights Act of 1968, both of which cited the Commerce Clause* as the source of Congress's power to regulate private actors."

* my note: now expanded

[] https://en.wikipedia.org/wiki/Civil_Rights_Act_of_1875


(2) is not a problem if you enact equivalent civil rights acts in every state. There would be plenty of political support for doing this today, including in the Sunbelt - which there wasn't in the 1950s.


I think “equivalent” would be the challenge here. When people need to know all the nuances of which bathrooms and restaurants they’re allowed to use, and which train cars when business or pleasure takes them across state lines, it becomes a pretty large tax both on the individual and on interstate commerce at large.


The bathroom issue is especially silly. Just mandate that public restrooms have to also include gender-neutral single-occupant bathrooms, that anyone can use as they desire.


Yes, but then achieving that mandate across the country becomes O(N) in the number of states, all within the low-bandwidth legislative process of state houses. Much simpler to just do it at the federal level, and still legitimately justifiable wrt interstate commerce imo.


With that framing, aren’t those two outcomes detrimental side effects of achieving the objective, rather than the objective itself per your original comment?


The commenter you're responding to has an enlightening perspective on many things, but can't resist the temptation of framing their arguments in a needlessly inflammatory manner that bites off just a little more than is actually defensible. I chalk it up to age.


Freedom means freedom to exclude and alienate at the government level? Is that your argument? I can see your hypothesis, but I don't see your evidence.


Pound for pound, Hacker News has the best bad takes anywhere. This is an absolutely terrible take, but at least it's very interesting.


I'd recommend Slashdot...


The difference is there's a chance that they're trolling on Slashdot. HN's are genuine bad takes by intelligent people, I believe.


Fair points.


Shameless plug: https://github.com/akiselev/ghidra-cli

I’ve been using Ghidra to reverse engineer Altium’s file format (at least the Delphi parts) and it’s insane how effective it is. Models are not quite good enough to write an entire parser from scratch, but before LLMs I would never have even attempted the reverse engineering.

I definitely would not depend on it for security audits but the latest models are more than good enough to reverse engineer file formats.


I can tell you how I am seeing agents be used with reasonable results. I will keep this high level. I don't rely on the agents solely. You build agents that augment your capabilities.

They can make diagrams for you, give you an attack surface mapping, and dig for you while you do more manual work. As you work on an audit you will often find things of interest in a binary or code base that you want to investigate further. LLMs can often blast through a code base or binary finding similar things.

I like to think of it as a Swiss Army knife of agentic tools to deploy as you work through a problem. They won't balk at some insanely boring task, and that can give you a real speedup. The trap is trying to get too much out of an LLM: you end up pouring time into your LLM setup and not getting good results. That is the LLM productivity trap. But if you have a reasonable set of "skills"/"agents" you can deploy for various auditing tasks, it can absolutely speed you up some.

Also, when you have scale problems, just throw an LLM at it. Even low-quality results are a good sniff test. Sometimes I just throw an LLM at a code review for a codebase I came across and let it work. I also love asking it to make me architecture diagrams.


> But if you have a reasonable subset of "skills" / "agents" you can deploy for various auditing tasks it can absolutely speed you up some.

Are people sharing these somewhere?


I think overall you're better off creating these yourself. The more you add to the overall context, the more chances the model has to screw up somewhere, so you want to give it as little as possible while still including everything that is important at that moment.

Using the agent and seeing where it gets stuck, then creating a workflow/skill/whatever for how to overcome that issue, will also help you understand which scenarios the agents and models are currently having a hard time with.

You'll also end up with fewer workflows/skills, ones that you understand, so you can help steer things and rewrite things when inevitably you have to change something.


I put the terms in quotes because it can be as simple as a set of prompts you develop for various contexts. It really doesn't have to be too heavy of an idea.


Oh, nice find... We ended up using PyGhidra, but the models waste some cycles because of bad ergonomics. Perhaps your CLI would be easier.

Still, Ghidra's most painful limitation was its extremely slow analysis times with Go binaries. We had to exclude that example from the benchmark.


> Models are not quite good enough to write an entire parser from scratch

In my experience models are really good at this? Not one shot, but writing decoders/encoders is entirely possible.


They can one-shot relatively simple parsers/encoders/decoders given a proper spec, but it’s a completely different ballgame when you’re trying to parse a very domain-knowledge-heavy file format (like an electronics CAD format) with decades of backwards-compatible cruft spread among hundreds of megabytes of decompiled Delphi and C# DLLs (millions of lines).

The low level parts (OLE container, streams and blocks) are easy but the domain specific stuff like deserializing to typed structs is much harder.
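(To illustrate how tractable the container layer is compared to the domain layer: a minimal, stdlib-only sketch of reading the start of an OLE/CFB header, with field offsets per the public MS-CFB spec. This is a toy header check, not any part of my actual parser.)

```python
import struct

# MS-CFB compound file signature (the container format Altium and many
# other Windows-era tools use).
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def read_cfb_header(header: bytes):
    """Parse a few leading fields of an OLE/CFB header (per MS-CFB)."""
    if header[:8] != OLE_MAGIC:
        raise ValueError("not an OLE compound file")
    # Bytes 24-25: minor version; bytes 26-27: major version (little-endian).
    minor, major = struct.unpack_from("<HH", header, 24)
    # Bytes 30-31: sector shift; sector size is 2**shift (512 or 4096).
    (sector_shift,) = struct.unpack_from("<H", header, 30)
    return {"major_version": major, "sector_size": 1 << sector_shift}
```

Everything above this layer -- which stream means what, and how its bytes map to typed structs -- is where the decompiled DLLs come in.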


This is really cool! Thanks for sharing. It's a lot more sophisticated than what I did w/ Ghidra + LLMs.


How does this approach compare to the various Ghidra MCP servers?


There’s not much difference, really. I stupidly didn’t bother looking at prior art when I started reverse engineering, and so ghidra-cli was born (along with several others like ilspy-cli and debugger-cli).

That said, it should be easier for a human to follow along with the agent, and Claude Code seems to have an easier time with discovery rather than stuffing all the tool definitions into the context.


That is pretty funny. But you probably learned something in implementing it! This is such a new field, I think small projects like this are really worthwhile :)


I also did this approach (scripts + home-brew cli)...because I didn't know Ghidra MCP servers existed when I got started.

So I don't have a clear idea of what the comparison would be but it worked pretty well for me!


Thanks for sharing! It seems to be an active space -- see a recent MCP server (https://news.ycombinator.com/item?id=46882389). If you haven't already, I highly recommend posting it as a Show HN.

I tried a few approaches: https://github.com/jtang613/GhidrAssistMCP (the hardest to set up), Ghidra analyzeHeadless (GPT-5.2-Codex worked with it well!), and PyGhidra (my go-to). Did you test which works best?

I mean, very likely (especially with an explicit README for AI, https://github.com/akiselev/ghidra-cli/blob/master/.claude/s...) your approach might be more convenient to use with AI agents.


> We can finally get rid of all that middle work. That adapting layer of garbage we blindly accepted during these years. A huge amount of frameworks and libraries and tooling that has completely polluted software engineering, especially in web, mobile and desktop development. Layers upon layers of abstractions that abstract nothing meaningful, that solve problems we shouldn’t have had in the first place, that create ten new problems for every one they claim to fix.

I disagree. At least for a little while, until models improve to truly superhuman reasoning*, frameworks and libraries providing abstractions are more valuable than ever. The risk/reward of custom work vs. a library has just changed in unforeseen ways that are orthogonal to time and effort spent.

Not only do LLMs make customization of forks and the resulting maintenance a lot easier, but the abstractions are now the most valuable place for humans to work, because they create a solid foundation for LLMs to build on. By building abstractions that we validate as engineers, we’re encoding human-in-the-loop input without the end developer having to constantly hand-hold the agent.

What we need now is better abstractions for building verification/test suites and linting so that agents can start to automatically self improve their harness. Skills/MCP/tools in general have had the highest impact short of model improvements and there’s so much more work to be done there.

* whether this requires full AGI or not, I don’t know.


I think it’s an Adobe Flex app now.


The real question is whether “debugging” the LLM is going to be as effective as debugging the code.

IME it pays dividends, but it can be really painful. I’ve run into a situation multiple times where I’m using Claude Code to write something, and then a week later it’ll come up with something like “Oh wait! Half the binaries are in .NET and not Delphi, I can just decompile them with ilspy,” effectively showing the way to a rewrite that works better, with fewer bugs, and gets done in a few hours because I’ve got more experience from the v1. Either way, it’s tens of thousands of lines of code that I could never have completed myself in that amount of time (which, given problems of motivation, means “at all”).


LLMs are where you need the most tests.

You want them writing tests, especially in critical sections, where I'll push to 100% coverage. (Not all code goes there, but things that MUST work or everything crumbles? Yeah, I do it.)

There was one time I was doing the classic: pull a bug, find 2 more. And I just told the LLM, "100% test coverage on the thing giving me problems." It found 4 bugs, fixed them, and that functionality has been rock solid since.

100% coverage is not a normal tool. But when you need it, man does it help.


> You want them writing tests especially in critical sections, I'll push to 100% coverage.

But how do you know if you got it?

I've seen no LLM that can even verify execution pathway coverage.


> On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well defined and easy to process verification hook.

It's scary how good it's become with Opus 4.5. I've been experimenting with giving it access to Ghidra and a debugger [1] for reverse engineering and it's just been plowing through crackmes (from sites like crackmes.one where new ones are released constantly). I haven't bothered trying to have it crack any software but I wouldn't be surprised if it was effective at that too.

I'm also working through reverse engineering several file formats by just having it write CLI scripts to export them to JSON then recreate the input file byte by byte with an import command, using either CLI hex editors or custom diff scripts (vibe coded by the agent).
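The core of that loop is trivially simple, which is exactly why it works as a verification hook. A sketch, where the export/import callables stand in for whatever vibe-coded CLI the agent is driving (the names here are hypothetical):

```python
# Hypothetical round-trip check: export a file to some intermediate
# representation, re-import it, and fail loudly at the first differing
# byte so the agent knows exactly where to look.

def first_mismatch(original: bytes, rebuilt: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for off, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return off
    if len(original) != len(rebuilt):
        return min(len(original), len(rebuilt))
    return None

def verify_roundtrip(data: bytes, export, imp):
    """export: bytes -> intermediate repr; imp: repr -> bytes."""
    rebuilt = imp(export(data))
    off = first_mismatch(data, rebuilt)
    if off is not None:
        raise AssertionError(f"round-trip diverges at offset {off:#x}")
    return True
```

Point the agent at the AssertionError and it iterates on the exporter until the round trip is byte-perfect.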

I still get routinely frustrated trying to use it for anything complicated but whole classes of software development problems have been reduced to vibe coding that feedback loop and then blowing through Claude Max rate limits.

[1] Shameless plug: https://github.com/akiselev/ghidra-cli https://github.com/akiselev/debugger-cli


I'm in the same loop where I find the more access I give it to systems and feedback mechanisms the more powerful it is. There's a lot of leverage in building those feedback systems. With the obvious caveat about footguns :P

Gave one of the repos a star as it's a cool example of what people are building with AI. Most common question on HN seems to be "what are people building". Well, stuff like this.


> Most common question on HN seems to be "what are people building". Well, stuff like this.

Hear, hear! I’ve got my altium-cli repo open source on GitHub as well: a vibe-coded CLI for editing vibe-reverse-engineered Altium PCB projects. It’s not yet ready for primetime (I’m finishing up the file format reverse engineering this weekend), and the code quality is probably something twelve-year-old me would have been embarrassed by, but I can already use it with Claude/Gemini to automate a lot of the tedious parts of PCB design like part selection and footprints. I’m almost to the point where Claude Code can use it for the entire EE workflow from part selection to firmware, minus the PCB routing, which I still do by hand.

I just ain’t wasting time blogging about it so unless someone stumbles onto it randomly by lurking on HN, they won’t know that Claude Code can now work on PCBs.


How much of an ECE background do you have? I've also thought about dabbling in LLM-assisted PCB stuff but felt like I was lacking too much of a foundation to get started (no ECE background at all)


I'm self taught but used to work as an EE professionally doing high speed digital and RF mixed signal work.

It honestly might have been easier without that experience because KiCad is open source and their S-expr file format is easy to use. I'm stuck with Altium since that's what I learned on and am used to.


(Shameless plug) I've been using my debugger-cli [1] to enable agents to debug code using debuggers that support the Debug Adapter Protocol. It looks like cuda-gdb supports DAP, so I'd love to add support. I just need help from someone who can test it adequately (kernels/warps/etc. don't quite translate to a generic DAP client implementation).

[1] https://github.com/akiselev/debugger-cli


This is great. I hate LLMs fiddling around with logging calls to get some debugging capability.

Now they can be promoted from junior coders into mid-level coders :)

