I literally cannot believe that people are wasting their time doing this either ...

sharkjacobs · 2026-04-16T20:44:02 1776372242

It feels like the results stopped being interesting a little while ago but the practice has become part of simonw's brand, and it gives him something to post even when there is nothing interesting to say about another incremental improvement to a model, and so I don't imagine he'll stop.

stephbook · 2026-04-16T21:26:43 1776374803

I, for one, expected progress. Uneven, sometimes delayed, but ever increasing progress.

But that Opus pelican?

cedws · 2026-04-16T22:22:29 1776378149

It’s not a waste of time. As the boundaries of AI are pushed we increasingly struggle to define what intelligence actually is. It becomes more useful to test what models cannot do instead of what they can. Random tasks like the pelican test can show how general the intelligence really is, putting aside the obvious flaw that the labs can optimise for such a simple public benchmark.

throwuxiytayq · 2026-04-17T07:25:53 1776410753

The whole point of this benchmark is that it asks the model to work in a modality it is not trained in and does not understand well. The result is largely meaningless. This is just like the people who are endlessly surprised by the fact that a raw LLM does not work with numbers well, or miscounts letters. In short, this test benchmarks the intelligence of the person running it, not of the model.

cedws · 2026-04-19T18:53:10 1776624790

The rasterised SVG is just a different representation of the same data. A sufficiently advanced LLM may not need to 'see' the rasterised image to be able to draw a good picture. A human could draw a very basic image through raw SVG just by mentally plotting points.
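
To make the "mentally plotting points" idea concrete, here is a minimal sketch, assuming a 100x100 canvas and hand-picked coordinates that I made up for illustration. The helper names `polygon` and `build_svg` are hypothetical too. Nothing is rasterised; the picture exists only as numbers written into markup.

```python
# Sketch only: every coordinate below is a hand-picked, hypothetical value
# on an assumed 100x100 canvas. No image is ever rendered or inspected.

def polygon(points, fill):
    """Return an SVG <polygon> element for a list of (x, y) tuples."""
    pts = " ".join(f"{x},{y}" for x, y in points)
    return f'<polygon points="{pts}" fill="{fill}"/>'


def build_svg():
    """Assemble a very rough bird shape from three primitives."""
    body = '<ellipse cx="45" cy="60" rx="25" ry="15" fill="white" stroke="black"/>'
    head = '<circle cx="70" cy="35" r="8" fill="white" stroke="black"/>'
    # The beak is a triangle starting near the right edge of the head.
    beak = polygon([(77, 33), (98, 40), (77, 40)], "orange")
    parts = [body, head, beak]
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        + "".join(parts)
        + "</svg>"
    )


svg = build_svg()
print(svg)
```

Saving the printed string to a `.svg` file and opening it in a browser is the only point at which the image is actually "seen".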

recursive · 2026-04-16T21:57:24 1776376644

Fun is so un-productive. Everyone doing things for "fun" is going to be sorry when they look back and realize they were wasting time having a "good time" rather than optimizing their KPIs.

throwuxiytayq · 2026-04-17T07:22:20 1776410540

Sarcasm aside, asking LLMs to draw pelicans is your idea of fun? I'm worried for you.

recursive · 2026-04-18T03:46:33 1776483993

No. I've never done it. However the stuff I do is even weirder. Thanks for your concern.

casey2 · 2026-04-17T19:15:05 1776453305

That's what happens in a monopoly environment. Literally everyone and every company becomes dancing monkeys for teracaps.

segmondy · 2026-04-16T21:29:13 1776374953

I can't believe you're such a party pooper. It's exciting times, the silly things do matter!

bschwindHN · 2026-04-17T01:55:06 1776390906

I do wonder how much energy collectively has been burned on this useless "benchmark".

Marciplan · 2026-04-16T23:44:24 1776383064

I also can't understand how this goes so viral every time on Hacker News lol