Hacker News | markdog12's comments

This software has been terrible for me. Burns tokens like crazy, and fails. Most times I try to use the browser plugin, it just says it can't use the plugin. When it does work, it takes minutes to click a button. Unusable workflow.

I ask it to generate a PNG with an alpha channel. It can't. Instead, it outputs a chroma-keyed image, then generates a Python script to remove the chroma key (which fails), then a JS script (which also fails). Then my 5h allotment is up.

It's frustrating because if it worked as they advertise, it'd be an amazing tool.


Although they can technically do it, I wouldn't be asking LLMs to generate binary files like PNG with alpha channels, no matter how simple that may seem. If it's easy enough to manually create one yourself, I would do that.

The best way for an LLM to do this is likely to write a scratch program (which is what it seems to have reached for in the second half): write code (which they are good at) and have a library create the image.
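To make the "scratch program" idea concrete, here is a minimal sketch (not anything posted in the thread). It writes an RGBA PNG using only Python's standard library (zlib and struct). The helper names (png_chunk, encode_rgba_png, checkerboard) and the output filename are made up for illustration; an imaging library such as Pillow would likely be the more convenient choice in practice.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, tag, data, CRC-32 over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_rgba_png(width: int, height: int, pixels: list) -> bytes:
    """Encode a flat, row-major list of (r, g, b, a) tuples as an 8-bit RGBA PNG."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match dimensions")
    raw = bytearray()
    for y in range(height):
        raw.append(0)  # filter type 0 (None) at the start of each scanline
        for r, g, b, a in pixels[y * width:(y + 1) * width]:
            raw.extend((r, g, b, a))
    # IHDR fields: width, height, bit depth 8, color type 6 (RGBA),
    # compression 0, filter 0, interlace 0
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(bytes(raw)))
        + png_chunk(b"IEND", b"")
    )


def checkerboard(width: int, height: int, cell: int = 8) -> list:
    """Hypothetical demo image: opaque red cells alternating with fully transparent ones."""
    return [
        (255, 0, 0, 255) if ((x // cell) + (y // cell)) % 2 == 0 else (0, 0, 0, 0)
        for y in range(height)
        for x in range(width)
    ]


if __name__ == "__main__":
    # Writes a 32x32 test image with real transparency (no chroma key involved).
    with open("alpha_demo.png", "wb") as f:
        f.write(encode_rgba_png(32, 32, checkerboard(32, 32)))
```

The point is that the model only has to produce text (code), and the encoder produces the bytes, so the alpha channel is set directly instead of being recovered from a chroma key afterwards.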

At some point it is just easier to handle such things yourself, and keep LLMs for text-based formats.


100% agree. I've spent many hours testing out local models/harnesses. So far, they're very much not worth the tradeoff. Obviously, I hope that changes.

Came here to say the same thing. Why add this fake image?


Website Obesity mentioned! [0]

  This project led me to propose the Taft Test:

  Does your page design improve when you replace every image with William Howard Taft?

  If so, then, maybe all those images aren’t adding a lot to your article. At the very least, leave Taft there! You just admitted it looks better.

[0] (idlewords.com/talks/website_obesity): https://web.archive.org/web/20260421022440/https://idlewords...


Two reasons: The people in the pic look more or less like our real selves. The synthesized photo shows the process of discussing highly conceptual approaches, which was our everyday for 10 years or so.


"AI overly affirms users, and that's bad" - everyone nods. "Modern society overly affirms people, and that's bad" - ....


Why did so many people swallow this crap in the first place?


Ah, I think I searched for "jpegxl", that's why there was no match.


"Yes, re-opening."

> Given these positive signals, we would welcome contributions to integrate a performant and memory-safe JPEG XL decoder in Chromium. In order to enable it by default in Chromium we would need a commitment to long-term maintenance. With those and our usual launch criteria met, we would ship it in Chrome.

https://groups.google.com/a/chromium.org/g/blink-dev/c/WjCKc...


Context: Mozilla has had the same stance and many devs (including Googlers) are already working on a Rust decoder which has made good progress.


LOL. Google, the "yeah that thing we bought six months ago, we're killing it off in 30 days, as of 4 weeks ago" company demanding "long-term" anything.


That conversation doesn't apply to their core products: Search, Mail, Maps, Chrome, Android. Their commitment to maintaining these services over decades has been amazing. It's everything else that sucks.


Mail is dropping features left and right, like gmailify. I'm pretty sure they're trying to limit the maintenance costs as much as possible.


I could almost imagine the normal search going away to be replaced by a chatbot.


long term support is actually being provided by google...

just a different team in a different country :D

most jxl devs are at google research in zurich, and already pledged to handle long term support


Just like google pledges long term support for everything until the next new and shiny comes along.


I think Chrome can safely be said to have a track record of long term investment.


It is, after all, their primary ad delivery vector.


Very good track record there, native clients, floc, manifest v2, ...


I asked Gemini "dynamic view" how SynthID works: https://gemini.google.com/share/62fb0eb38e6b


I asked it to analyze my tennis serve. It was just dead wrong. For example, it said my elbow was bent. I had to show it a still image of full extension on contact, then it admitted, after reviewing again, it was wrong. Several more issues like this. It blamed it on video being difficult. Not very useful, despite the advertisements: https://x.com/sundarpichai/status/1990865172152660047


The default FPS it's analyzing video at is 1, and I'm not sure the max is anywhere near enough to catch a full speed tennis serve.


Ah, I should have mentioned it was a slow motion video.

> The default FPS it's analyzing video at is 1

Source?


https://ai.google.dev/gemini-api/docs/video-understanding#cu...

"By default 1 frame per second (FPS) is sampled from the video."
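A rough back-of-the-envelope sketch of why the sampling rate matters here. It assumes frames are sampled at evenly spaced instants with a random offset relative to the event, so a short event is caught with probability (event duration x sample rate), capped at 1. The function name, the ~5 ms contact time, and the 8x slowdown (e.g. 240 fps footage played back at 30 fps) are illustrative assumptions, not figures from the docs.

```python
def capture_probability(event_seconds: float, sample_fps: float, slowdown: float = 1.0) -> float:
    """Chance that at least one sampled frame lands inside a brief event.

    Assumes evenly spaced samples with a uniformly random phase. `slowdown`
    stretches the event on the playback timeline (8.0 means 8x slow motion).
    """
    if event_seconds < 0 or sample_fps <= 0 or slowdown <= 0:
        raise ValueError("durations and rates must be positive")
    stretched = event_seconds * slowdown  # event length as seen in the played-back video
    return min(1.0, stretched * sample_fps)


if __name__ == "__main__":
    contact = 0.005  # assumed ball-contact duration in seconds (illustrative)
    print("real time, 1 FPS:     ", capture_probability(contact, 1.0))
    print("8x slow motion, 1 FPS:", capture_probability(contact, 1.0, slowdown=8.0))
    print("8x slow motion, 24 FPS:", capture_probability(contact, 24.0, slowdown=8.0))
```

Under these assumptions, slow motion helps only in proportion to the slowdown factor, so at 1 FPS the moment of contact would still usually fall between sampled frames.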


OK, I just used https://gemini.google.com/app, I wonder if it's the same there.


I’ve never seen such a huge delta between advertised capabilities and real world experience. I’ve had a lot of very similar experiences to yours with these models where I will literally try verbatim something shown in an ad and get absolutely garbage results. Do these execs not use their own products? I don’t understand how they are even releasing this stuff.


