More

nickandbro · 2026-01-22T20:02:37 1769112157

Maybe show how it works instead of making the home page a login screen.

nickandbro · 2026-01-18T20:26:26 1768767986

I wonder if some of the docs from https://app.wafer.ai/docs could be used to make the model be better at writing GGML kernels. Interesting use case.

nickandbro · 2026-01-15T05:01:38 1768453298

Pretty cool, like your carpe diem post

nickandbro · 2026-01-06T20:51:41 1767732701

Love it

nickandbro · 2025-12-02T02:47:00 1764643620

For anyone that is interested

"create me a svg of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FhqYdli5

chronogram · 2025-12-02T11:02:16 1764673336

It created a whole webpage to showcase the SVG with animation for me: https://output.jsbin.com/qeyubehate

nickandbro · 2025-11-25T02:59:33 1764039573

I use the following models like so nowadays:

Gemini is great, when you have gitingested the code of pypi package and want to use it as context. This comes in handy for tasks and repos outside the model's training data.

5.1 Codex I use for a narrowly defined task where I can just fire and forget it. For example, codex will troubleshoot why a websocket is not working, by running its own curl requests within cursor or exec'ing into the docker container to debug at a level that would take me much longer.

Claude 4.5 Opus is a model that I feels trustworthy for heavy refactors of code bases or modularizing sections of code to become more manageable. Often it seems like the model doesn't leave any details out and the functionality is not lost or degraded.

nickandbro · 2025-11-25T00:09:29 1764029369

"Create me a SVG of a PS4 controller"

Gemini 3.0 Pro: https://www.svgviewer.dev/s/CxLSTx2X

Opus 4.5: https://www.svgviewer.dev/s/dOSPSHC5

I think Opus 4.5 did a bit better overall, but I do think eventually frontier models will eventually converge to a point where the quality will be so good it will be hard to tell the winner.

esperent · 2025-11-25T01:14:00 1764033240

I can only see the svg code there on mobile. I don't see any way to view the output.

Thev00d00 · 2025-11-25T10:01:30 1764064890

Click the export tab

nickandbro · 2025-11-18T15:40:38 1763480438

What we have all been waiting for:

"Create me a SVG of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FfhmhTK1

Thev00d00 · 2025-11-18T15:41:56 1763480516

That is pretty impressive.

So impressive it makes you wonder if someone has noticed it being used a benchmark prompt.

burkaman · 2025-11-18T15:48:49 1763480929

Simon says if he gets a suspiciously good result he'll just try a bunch of other absurd animal/vehicle combinations to see if they trained a special case: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

ddalex · 2025-11-18T16:06:39 1763481999

https://www.svgviewer.dev/s/TVk9pqGE giraffe in a ferrari

jmmcd · 2025-11-18T16:04:21 1763481861

"Pelican on bicycle" is one special case, but the problem (and the interesting point) is that with LLMs, they are always generalising. If a lab focussed specially on pelicans on bicycles, they would as a by-product improve performance on, say, tigers on rollercoasters. This is new and counter-intuitive to most ML/AI people.

BoorishBears · 2025-11-18T19:18:47 1763493527

The gold standard for cheating on a benchmark is SFT and ignoring memorization. That's why the standard for quickly testing for benchmark contamination has always been to switch out specifics of the task.

Like replacing named concepts with nonsense words in reasoning benchmarks.

jmmcd · 2025-11-19T09:06:49 1763543209

Yes. But "the gold standard" just means "the most natural, easy and dumb way".

rixed · 2025-11-18T16:28:11 1763483291

I have tried combinations of hard to draw vehicle and animals (crocodile, frog, pterodactly, riding a hand glider, tricycle, skydiving), and it did a rather good job in every cases (compared to previous tests). Whatever they have done to improve on that point, they did it in a way that generalise.

bitshiftfaced · 2025-11-18T15:49:35 1763480975

It hadn't occurred to me until now that the pelican could overcome the short legs issue by not sitting on the seat and instead put its legs inside the frame of the bike. That's probably closer to how a real pelican would ride a bike, even if it wasn't deliberate.

xnx · 2025-11-18T15:53:13 1763481193

Very aero

nickandbro · 2025-10-14T21:17:36 1760476656

You could! But just like others have mentioned, the performance would be negligible. If you really wanted to see more of a performance boost by pretraining you could try to create a bigger chunk of data to train off of. This would be done by either creating synthetic data off of your material, or finding adjacent information to your material. Here's a good paper about it: <https://arxiv.org/abs/2409.07431>

nickandbro · 2025-09-09T23:13:27 1757459607

She did man