
Agents are basically tool-using LLMs running in a loop: they come up with a plan, which includes running tools; the tool output is added to the context; and they iterate until they are done fulfilling some goal. It's basically exactly like a regular LLM chat, except the model is chatting with itself and giving itself instructions to run particular tools.

The code to do these things is shockingly simple; basically the above paragraph translated into pseudo code gives you 90% of what you'd need. Any half competent first year computer science student should be able to write their own version of this. Except of course they should be letting LLMs do the heavy lifting here.
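A minimal sketch of that loop in Python, for a sense of scale. `call_llm` here is a hypothetical stand-in for whatever chat-completion API you use, and the message format is made up for illustration:

    import subprocess

    def run_command(args):
        # run a shell command and hand its combined output back to the model
        out = subprocess.run(args["cmd"], shell=True, capture_output=True, text=True)
        return out.stdout + out.stderr

    TOOLS = {
        "run_command": run_command,
        "read_file": lambda args: open(args["path"]).read(),
    }

    def agent(goal, call_llm):
        # call_llm: hypothetical stand-in for any chat-completion API
        context = [{"role": "user", "content": goal}]
        while True:
            msg = call_llm(context)            # model plans, may request a tool
            context.append(msg)
            if "tool_call" not in msg:         # no tool requested: goal fulfilled
                return msg["content"]
            call = msg["tool_call"]
            result = TOOLS[call["name"]](call["args"])
            context.append({"role": "tool", "content": result})  # output -> context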

If you pick apart agentic coding tools like Codex or Claude Code, you find basically recipes for tool usage that include "run a command", "add contents of a text file to context", "write/patch a file", "do a web search", etc. The "run a command" one basically enables it to run whatever it needs, without pre-programming the tool with any knowledge whatsoever.

That all comes from training and web searches. So the "fix my thingy" prompt turns into a loop where the model inspects your directory of code by listing files, reading them, and adjusting its plan. Maybe it figures out it's a Kotlin project (in my case) and that it could try running Gradle commands to build it; maybe there's an AGENTS.md file with some helpful information, or a README.md. It will start opening files to find your thingy, iterate on the plan, then write a patch, try to build the patched code, and if the tool says thumbs up, it can create a little commit by figuring out how to run the git command.

It's like magic when you see this in action. But all the magic is in the LLM; not the tool. Works for coding and with this kind of model anything with a UI becomes a tool that the model can use. UIs become APIs basically.

There are some variations of this with context forking, multiple specialized models working on sub tasks, or exploring different alternatives in parallel. But the core principle is very simple.

In the broader discussion about AGI we're focused on our own intelligence, but what really empowers us is our ability to use tools. The only difference between us and a pre-historic cave man is our tools, which include everything from systems for writing things down to particle accelerators. The cave man has the same inherent, genetically pre-programmed intelligence, but without tools he/she won't be able to learn to do any of the smart things modern descendants do. If you've ever seen a toddler use an iPad, you know how right I am. Most of them play games before they figure out how to walk.

The LLM way of writing things down is "adding them to a context". Most of the tool progress right now is about making that scale better. You get buzzwords like context forking, context compression, and context caching. All of that is low-level hacks to get the LLM to track more stuff. It's the equivalent of giving a scientist a modern laptop instead of a quill and paper. Same intelligence, better tools.
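The crudest form of context compression, for instance, is just summarizing the older part of the transcript. A sketch, reusing the hypothetical `call_llm` from the loop above:

    def compact(context, call_llm, budget=100_000, keep_recent=20):
        # naive context compression: once the transcript outgrows the budget,
        # replace all but the most recent messages with an LLM-written summary
        if sum(len(m["content"]) for m in context) < budget:
            return context
        old, recent = context[:-keep_recent], context[-keep_recent:]
        transcript = "\n".join(m["content"] for m in old)
        summary = call_llm([{"role": "user", "content": "Summarize:\n" + transcript}])
        return [{"role": "system",
                 "content": "Earlier context (summarized): " + summary["content"]}] + recent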


https://news.ycombinator.com/item?id=14238786 ("Sent to Prison by a Software Program’s Secret Algorithms (nytimes.com)")

https://news.ycombinator.com/item?id=14285116 ("Justice.exe: Bias in Algorithmic sentencing (justiceexe.com)")

https://news.ycombinator.com/item?id=43649811 ("Louisiana prison board uses algorithms to determine eligibility for parole (propublica.org)")

https://news.ycombinator.com/item?id=11753805 ("Machine Bias (propublica.org)")


"We kill people based on metadata" Michael Haydon

Had to look that up.


Not to be a downer, but even as someone very optimistic about technology and AI generally, "TikTok but AI" sounds like a societally terrible thing to try and create.

What's the benefit of this? Curious if anyone has a solid viewpoint steelmanning any positives they can think of.


Thanks! HN was part of the origin story of the book in question.

In 2018 or 2019 I saw a comment here that said most people don't appreciate the distinction between domains with low irreducible error, which benefit from fancy models with complex decision boundaries (like computer vision), and domains with high irreducible error, where such models don't add much value over something simple like logistic regression.

It's an obvious-in-retrospect observation, but it made me realize that this is the source of a lot of confusion and hype about AI (such as the idea that we can use it to predict crime accurately). I gave a talk elaborating on this point, which went viral, and then led to the book with my coauthor Sayash Kapoor. More surprisingly, despite being seemingly obvious it led to a productive research agenda.

While writing the book I spent a lot of time searching for that comment so that I could credit/thank the author, but never found it.



I think it's important to recognize here that fanfiction.net has 850 thousand distinct pieces of Harry Potter fanfiction on it, fifty thousand of which are more than 40k words in length. Many of them (there's no easy way to measure) directly reproduce parts of the original books.

archiveofourown.org has 500 thousand, some, but probably not the majority, of which are duplicates from fanfiction.net. 37 thousand of these are over 40 thousand words.

I.e., Harry Potter and its derivatives presumably appear a million times in the training set, and it's hard to imagine a model that could discuss this cultural phenomenon well without knowing quite a bit about the source material.


Ignorance leading to assumptions. Their eureka moment: "The shell, not the Linux kernel, is responsible for searching for executables in PATH!" makes it obvious they haven't read up on operating systems. Shame because you should know how the machine works to understand what is happening in your computer. I always recommend reading Operating Systems: Three Easy Pieces. https://pages.cs.wisc.edu/~remzi/OSTEP/
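The point is easy to demonstrate: the kernel's execve() takes a concrete path, and the PATH search happens in userspace, in the shell or in libc's exec*p variants. A quick Python illustration:

    import os

    try:
        os.execv("ls", ["ls"])    # bare name: the kernel's execve() can't resolve it
    except FileNotFoundError:
        print("execv failed: the kernel does not search PATH")

    os.execvp("ls", ["ls"])       # the 'p' variant searches PATH in userspace,
                                  # just as a shell would, before calling execve()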

Better to simply not collect the data in the first place. It's like the hierarchy of controls used in risk management, from most to least effective:

- Elimination – physically remove the hazard

- Substitution – replace the hazard

- Engineering controls – isolate people from the hazard

- Administrative controls – change the way people work

- PPE – protect the worker with equipment

Only here it's applied to hazardous data, and to things like moral hazards rather than physical hazards.


Point number 2 is super important for non-hobby projects. Collect a bit of data, even if you have to do it manually at first, and do a "dry run" / first cut of whatever analysis you're thinking of doing, so you confirm you're actually collecting what you need and that what you're doing is even going to work. Seeing a pipeline get built, run for like two months, and then the data scientist come along and say "this isn't what we needed" was a complete goddamn shitshow. I'm just glad I was only a spectator to it.

Hey, author of the blog post here. Check out avante.nvim if you're already a vim/nvim user; I'm using it as an assistant plugin with llama-server and it works great.

Small models, like Llama 3.2, Qwen, and SmolLM, are really good right now compared to a few years ago.
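If you want to script against the same setup, llama-server exposes an OpenAI-compatible chat endpoint. A minimal sketch, assuming the server is already running on its default port 8080:

    import requests

    # send one chat turn to the local llama-server and print the reply
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": "Hello!"}]},
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])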


The experience of passive consumption (cable TV, tiktok, etc, pointed out in another comment here) is essentially the experience of psychological obliteration.

When you get sucked into reels, you go from "here" to "there," and in the process, while you are "there," your entire whole self is destroyed. The same psychological phenomenon happens to gambling addicts, alcoholics, or users of heroin. It has fewer physiological downsides and side effects than those things; the only material loss is the loss of time.

But far more remarkable than the waste of time, and rarely articulated, is this psychological loss: the destruction of the self. That echoes through a person's life, through their relationships, their self-construction, etc. It is those echoes that we are now dealing with on a mass sociological scale.

By the way, "there" has a lot of upsides too. People can be creative, productive, and expressive while they are "there": creating, being funny, being social, etc. That's why this is so hard.


"Expedient" is a common (or at least not rare) English word that means something like "practical and effective even if not directly attending to higher or deeper considerations."

For example, if two students in a class are having frequent confrontations that bring learning in the class to a halt, and attempts by teachers and counselors to address their conflict directly haven't been effective, the expedient solution might be to place them in separate classes. The "right thing" would be to address the problem on the social and emotional level, but if continued efforts to do so are likely to result in continued disruption to the students' education, it might be better to separate them. "Expedient" acknowledges the trade-off, while emphasizing the positive outcome.

Often a course of action is described as "expedient" when it seems to dodge an issue of morality or virtue. For example, if we solve climate change with geoengineering instead of by addressing thoughtless consumerism, corporate impunity, and lack of international accountability, many people would feel frustrated or let down by the solution because it would solve the problem without addressing the moral shortcomings that led to the problem. The word expedient stresses the positive side of this, the effectiveness and practicality of the solution, while acknowledging that it leaves other, perhaps deeper issues unaddressed.


Recently found a similar game map generator:

https://mode-vis.gumroad.com/l/IRzH


No, causation isn't distribution sampling. And there's a difference between, say, an extrinsic description of a system and its essential properties.

E.g., you can describe a coin flip as sampling from the space {H,T} -- but insofar as we're talking about an actual coin, there's a causal mechanism, and this description fails (e.g., one can design a coin flipper that deterministically flips to heads).

A transformer model, like all generative statistical models, is actually learning distributions. The model is essentially constituted by a fit to a prior distribution. And when computing a model output, it is sampling from this fit distribution.

I.e., the relevant state of the graphics card that computes an output token is fully described by an equation which is a sampling from an empirical distribution (of prior text tokens).
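Concretely, the last step of producing a token is literally a draw from a distribution; a toy sketch of temperature sampling:

    import numpy as np

    def sample_token(logits, temperature=1.0, rng=None):
        # softmax turns the model's scores into a probability distribution
        # over the vocabulary; the emitted token is a random draw from it
        rng = rng or np.random.default_rng()
        z = logits / temperature
        p = np.exp(z - np.max(z))
        p /= p.sum()
        return rng.choice(len(p), p=p)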

Your nervous system is a causal mechanism which is not fully described by sampling from this outcome space. There is nowhere in your body that stores all possible bodily states in an outcome space: this space would require more atoms than there are in the universe to store.

So this isn't the case for any causal mechanism. Reality itself comprises essential properties which interact with each other in ways that cannot be reduced to sampling. Statistical models are therefore never models of reality essentially; they are basically circumstantial approximations.

I'm not stretching definitions into meaninglessness; these are the ones given by AI researchers, of which I am one.


To disallow:

Amazonbot, anthropic-ai, AwarioRssBot, AwarioSmartBot, Bytespider, CCBot, ChatGPT-User, ClaudeBot, Claude-Web, cohere-ai, DataForSeoBot, Diffbot, Webzio-Extended, FacebookBot, FriendlyCrawler, Google-Extended, GPTBot, OAI-SearchBot, ImagesiftBot, Meta-ExternalAgent, Meta-ExternalFetcher, omgili, omgilibot, PerplexityBot, Quora-Bot, TurnitinBot

For all of these bots,

    User-agent: <Bot Name>
    Disallow: /
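For example, taking two names from the list above:

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /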

For more information, check https://darkvisitors.com/agents

If this takes off, I've made my own variant of llms.txt here: https://boehs.org/llms.txt . I hereby release this file to the public domain, if you wish to adapt and reuse it on your own site.

Hall of shame: https://www.404media.co/websites-are-blocking-the-wrong-ai-s...


The core value proposition of software is the ability to implement most system designs relatively quickly and efficiently, including very bad ones.

At the risk of being spammy, I wrote a simple Matrix bot, accessible from a normal Matrix client, that replicates the entirety of their currently announced featureset but also supports any other model.

Stop re-building chat clients, I already have one!

Ideally they would just run their own chatbot on existing chat platforms, letting you verify with your API key; with my project you can at least run that chatbot yourself.

[0] - https://github.com/arcuru/chaz

[1] - https://jackson.dev/post/chaz/ (Blog Post)


This seems like the perfect opportunity to introduce those unfamiliar to Robert Elder. He makes cool YouTube[0] and blog content[1] and has a series on regular expressions[2] and does some quite deep dives into the differing behaviour of the different tools that implement the various versions.

His latest on the topic is cool too: https://www.youtube.com/watch?v=ys7yUyyQA-Y

He has quite a lot of content that HN folks might be interested in, I think, like the reality and woes of consulting[3]

[0] https://www.youtube.com/@RobertElderSoftware

[1] https://blog.robertelder.org/

[2] https://blog.robertelder.org/regular-expressions/

[3] https://www.youtube.com/watch?v=cK87ktENPrI


> I had the same problem. Which camera should I buy if I want to use this?

Their docs have a general "we don't provide specifics because $OEM can change the internals on a whim" disclaimer. That is true. OEMs can/will change the internals out but leave the model number the same. You see this ALL THE TIME with the cheap SOHO routers.

However, those changes are _usually_ paired with a bump in version / revision on the product label.

In any case, a community sourced page of devices that were - at one point in time - known to contain a supported chip would be helpful. Even if the OEM had changed the internals, at least you'd know that the internals have changed and that scouring eBay for an older serial number of the same model might get you a supported one.

In any case, I opened up a few of the Amcrest cameras I have lying around and found a poorly supported Ambarella SoC and another SoC that is not supported at all.

--

EDIT: after talking with a friend that does a lot of hardware reverse engineering:

"Plug the chipset that's well supported into AliExpress and then just hope that sellers put it in the listing and that it's accurate. You'll get lots of hits on bare boards / dev kits where the SoC ID is the only unique thing about it, but you might get a few 'finished' devices that are listed as using the chip as well."

It's not perfect, but that might be _a_ way to find devices that should be supported.


Very useful is:

    PS4='+ ${BASH_SOURCE:-}:${FUNCNAME[0]:-}:L${LINENO:-}:   '
When using `set -x`, this makes the trace show the filename, function name, and line number, which can be quite handy for debugging larger Bash scripts.

Binary search (or bisecting) is also an incredibly valuable approach that I don’t see junior and intermediate engineers reach for nearly as often as they should.

When something is failing, find a midpoint between where things are known to work and where the bug is manifesting. Do you see evidence of the bug there? If so, search earlier in the pipeline; if not, search later. Repeat.

In my experience this process is the primary distinguisher between those who flail around looking for a root cause and the people who can rapidly come to an answer.
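The same idea, mechanized as a sketch: given an ordered list of pipeline checkpoints where the first is known good and the last is known bad, and a hypothetical `is_bad` probe:

    def first_bad(stages, is_bad):
        # stages[0] is known good, stages[-1] is known bad; returns the
        # index of the first checkpoint where the bug is visible
        lo, hi = 0, len(stages) - 1
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if is_bad(stages[mid]):
                hi = mid        # bug already visible: search earlier
            else:
                lo = mid        # still clean: search later
        return hi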


If you're interested, I have a small side project (https://imaginanki.com) for generating Anki decks with images + speech (via SDXL/Azure).

When I originally wrote the LGPL (v1, back around 1991) we could not imagine anything like an App Store or signed binaries. Dynamic linking provided an easy way for users to upgrade the library code.

Since the user doesn't have the freedom to update the libs on iOS etc., I don't see how you could deploy LGPL code on those platforms; since one of the points of using Unity is its cross-platform support, that suggests you'd have to find another library unless you were only deploying on real OSes.

But is that Unity’s problem?


Some more resources:

- A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge, https://arxiv.org/abs/2310.11703

- Survey of Vector Database Management Systems, https://arxiv.org/abs/2310.14021

- What are Embeddings, https://raw.githubusercontent.com/veekaybee/what_are_embeddi...

---

h/t: https://twitter.com/eatonphil/status/1745524630624862314 and https://twitter.com/ChristophMolnar/status/17457316026829826...


Nice! Playing around with the widget under the section "Experimenting with lines" I'm reminded of Bret Victor's Inventing on Principle talk [0] (an absolute must watch, if anyone hasn't yet). In particular, changing the smoothness reveals a sort of scaling effect that I'd probably never know about if not playing around with sliders and having it update in real time rather than setting individual values. Very interesting and beautiful!

[0] https://youtu.be/PUv66718DII?si=2urxGUwD_lWA8C4q


Here's a relevant video on the topic: https://www.youtube.com/watch?v=ZYj4NkeGPdM

I really love that video.


For sure! The book that gets the most amount of praise and the one I personally used and can highly recommend is `Lingua Latina Per Se Illustrata Pars I: Familia Romana` or LLPSI, for short.

It is a "natural method" book, which means it teaches you the language using the language itself. This may seem hard and counter-intuitive, but it starts off really easily, with sentences that just about anyone could understand, and there are images to help you visualize things. The advantage of this method is that it teaches you an intuitive understanding of the language, as if you were learning by immersion. That is how humans generally learn languages: we don't think of grammar when we read or speak, we just do it.

That isn't to say you won't learn grammar, but rather, grammar will be a complement, not your main focus. For grammar-related queries, Allen & Greenough's grammar is a really good reference. You can find it hosted online by Dickinson College.

As a dictionary, there are the Latinitium ones, which are really good and work from Latin to English as well as the reverse. For support and to see what other Latinistas are up to, there is the Latin & Ancient Greek Discord server (sorry, I don't have the link on me right now), and from there you can join the LLPSI one.

What I did was to read a bit every day of either LLPSI I & II or some more advanced books when I was able to for about a year and a half. Now, I can read a lot by Cicero and some other authors. It's well worth it :)

Happy learning!


i would not call myself a writer though my education was literature, and whether by intent of my instructors or just by how the writing process works, this is more or less how i write more formal or “professional” items.

it’s not nearly as conscious of a process as what the article describes but this really is the self-check process in my head as i write. each section i just try to understand what do i want to say, have i said it anywhere else, have i effectively made the point with supporting context for it, and have i expressed the source/type of information i am presenting (it’s my opinion, my thought, as best i understand a fact, etc).

it’s a discipline for sure and as such i don’t always do it well, but for any piece i want to write for work or blogs or creatively, this process helps me and after some practice it’s very easy to fire off unconsciously.

the best thing for me was having very good instructors who did constructive criticism well; they tried to understand my intent and purpose of the writing and noted places where i didn't quite meet my intent or there were better ways to express it.

i got into programming later on in life and i would equate the process to a good code review. it can be strict and intimidating given it's a critique of how you express yourself in the medium, but the strictest code reviews are immensely important for my growth as a programmer, and even more so as a writer, since writing in human languages is a lot more "open" in many ways, with less strictly defined methods of accomplishing a task.


This seems to be a very common type of software bug: retry on failure, without any limit on how frequently or how many times.
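The standard fix is bounding both; a minimal sketch of capped exponential backoff with jitter:

    import random
    import time

    def with_retries(op, max_attempts=5, base_delay=0.5, max_delay=30.0):
        # bounds both how many times and how frequently we retry
        for attempt in range(1, max_attempts + 1):
            try:
                return op()
            except Exception:
                if attempt == max_attempts:
                    raise                    # give up rather than retry forever
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                time.sleep(delay * random.uniform(0.5, 1.5))  # jitter spreads retries out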
