Hacker News | nickandbro's comments

Maybe show how it works instead of making the home page a login screen.

I wonder if some of the docs from https://app.wafer.ai/docs could be used to make the model better at writing GGML kernels. Interesting use case.

Pretty cool, like your carpe diem post

Love it


For anyone who is interested:

"create me a svg of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FhqYdli5


It created a whole webpage to showcase the SVG with animation for me: https://output.jsbin.com/qeyubehate


Here's how I use the various models nowadays:

Gemini is great when you have gitingested the code of a PyPI package and want to use it as context. This comes in handy for tasks and repos outside the model's training data.
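A minimal sketch of that workflow, assuming the gitingest PyPI package and that it exposes an ingest() helper returning (summary, tree, content); the signature and the repo URL here are illustrative, not gospel:

    # Hedged sketch: turn a package's repo into one text blob for the context window.
    # Assumes `pip install gitingest` and that ingest() returns (summary, tree, content);
    # check the package docs if the signature has changed.
    from gitingest import ingest

    summary, tree, content = ingest("https://github.com/encode/httpx")  # example repo

    # Paste the digest ahead of the question so the model reasons over the real
    # source instead of whatever version it half-remembers from training.
    prompt = (
        "Here is the source of the library I'm using:\n\n"
        f"{tree}\n\n{content}\n\n"
        "Using only this code, show me how to stream a chunked response."
    )
    print(summary)  # rough size/file stats, useful as a context-budget sanity check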

5.1 Codex I use for narrowly defined tasks where I can just fire and forget. For example, Codex will troubleshoot why a websocket is not working by running its own curl requests within Cursor or exec'ing into the Docker container, debugging at a level of detail that would take me much longer to reach myself.
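The kind of probe it runs looks roughly like this; a sketch using the third-party websockets package, with the endpoint URL and ping message as placeholders:

    # Hedged sketch of the sort of check an agent might run to see whether a
    # websocket endpoint even completes its handshake. Uses the third-party
    # `websockets` package; the URL and message below are placeholders.
    import asyncio
    import websockets

    async def probe(url: str) -> None:
        try:
            async with websockets.connect(url, open_timeout=5) as ws:
                await ws.send('{"type": "ping"}')
                reply = await asyncio.wait_for(ws.recv(), timeout=5)
                print("handshake ok, got:", reply)
        except Exception as exc:  # report the failure mode instead of hiding it
            print("websocket failed:", type(exc).__name__, exc)

    asyncio.run(probe("ws://localhost:8000/ws"))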

Claude 4.5 Opus is the model I find trustworthy for heavy refactors of codebases or for modularizing sections of code to make them more manageable. Often it seems like the model doesn't leave any details out, and the functionality is not lost or degraded.


"Create me a SVG of a PS4 controller"

Gemini 3.0 Pro: https://www.svgviewer.dev/s/CxLSTx2X

Opus 4.5: https://www.svgviewer.dev/s/dOSPSHC5

I think Opus 4.5 did a bit better overall, but I do think frontier models will eventually converge to a point where the quality is so good it will be hard to tell the winner.


I can only see the svg code there on mobile. I don't see any way to view the output.


Click the export tab


What we have all been waiting for:

"Create me a SVG of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FfhmhTK1


That is pretty impressive.

So impressive it makes you wonder if someone has noticed it being used as a benchmark prompt.


Simon says if he gets a suspiciously good result he'll just try a bunch of other absurd animal/vehicle combinations to see if they trained a special case: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...



"Pelican on bicycle" is one special case, but the problem (and the interesting point) is that with LLMs, they are always generalising. If a lab focussed specially on pelicans on bicycles, they would as a by-product improve performance on, say, tigers on rollercoasters. This is new and counter-intuitive to most ML/AI people.


The gold standard for cheating on a benchmark is SFT on the benchmark itself, ignoring memorization. That's why the standard way to quickly test for benchmark contamination has always been to switch out the specifics of the task.

Like replacing named concepts with nonsense words in reasoning benchmarks.
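A toy version of that swap, assuming you already have the benchmark question as plain text plus a list of the named concepts it uses (both made up here):

    # Hedged sketch: rewrite a reasoning-benchmark item by swapping its named
    # concepts for nonsense words, so a memorized answer no longer matches.
    import random
    import string

    def nonsense_word(rng: random.Random, length: int = 7) -> str:
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

    def scramble_item(question: str, concepts: list[str], seed: int = 0) -> str:
        rng = random.Random(seed)
        for concept in concepts:
            question = question.replace(concept, nonsense_word(rng))
        return question

    original = "If every pelican is a bird and every bird can fly, can a pelican fly?"
    print(scramble_item(original, ["pelican", "bird"]))
    # A model that actually reasons should handle the scrambled version just as well.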


Yes. But "the gold standard" just means "the most natural, easy and dumb way".


I have tried combinations of hard-to-draw vehicles and animals (crocodile, frog, pterodactyl; riding a hang glider, a tricycle, skydiving), and it did a rather good job in every case (compared to previous tests). Whatever they have done to improve on that point, they did it in a way that generalises.


It hadn't occurred to me until now that the pelican could overcome the short-legs issue by not sitting on the seat and instead putting its legs inside the frame of the bike. That's probably closer to how a real pelican would ride a bike, even if it wasn't deliberate.


Very aero


You could! But as others have mentioned, the performance gain would be negligible. If you really wanted to see more of a boost from pretraining, you could try to create a bigger chunk of data to train on, either by generating synthetic data from your material or by finding information adjacent to it. Here's a good paper about it: https://arxiv.org/abs/2409.07431
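A rough sketch of the synthetic-data half of that, using the OpenAI client purely as a stand-in for whatever model is handy; the prompts, chunk size, file name, and model name are illustrative, not something the linked paper prescribes:

    # Hedged sketch: expand a small corpus by asking a model to restate each chunk
    # in several forms (paraphrase, Q&A pairs, summary) before continued pretraining.
    # Prompts, chunking, and model name are placeholders; tune them for your material.
    from openai import OpenAI

    client = OpenAI()
    STYLES = ["a faithful paraphrase", "three question-answer pairs", "a short summary"]

    def expand_chunk(chunk: str) -> list[str]:
        variants = []
        for style in STYLES:
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{
                    "role": "user",
                    "content": f"Rewrite the following text as {style}. "
                               f"Do not add facts that are not in the text.\n\n{chunk}",
                }],
            )
            variants.append(resp.choices[0].message.content)
        return variants

    corpus = open("my_material.txt").read()  # placeholder path to your material
    chunks = [corpus[i:i + 2000] for i in range(0, len(corpus), 2000)]
    synthetic = [v for chunk in chunks for v in expand_chunk(chunk)]
    print(f"{len(chunks)} chunks -> {len(synthetic)} synthetic documents")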


She did man

