Their models might be impressive, but their products absolutely suck donkey balls. I gave Gemini web/CLI two months and then ran back to ChatGPT. Seriously, it would just COMPLETELY forget context mid-dialog. When asked about improving air quality, it just gave me a list of (mediocre) air purifiers without asking for any context whatsoever, and I can list thousands of conversations like that. Shopping or comparing options is just nonexistent.
It uses Russian propaganda sources for answers and switches to Chinese mid-sentence (!) while explaining some generic Python functionality.
It's an embarrassment, and I don't know how they justify the 20-euro price tag on it.
I agree. On top of that, in true Google style, basic things just don't work.
Any time I upload an attachment, it just fails with something vague like "couldn't process file", whether it's a simple .md or .txt with fewer than 100 lines or a PDF. I tried making a Gem today; it just wouldn't let me save it, with some vague error too.
I also tried having it read and write stuff to "my stuff" and Google Drive. It would consistently write files but then fail to read them back, or it would read one file from Google Drive and ignore everything else.
Their models are seriously impressive. But as usual Google sucks at making them work well in real products.
I don't find that at all. At work we have no access to the API, so we have to force-feed a dozen (or more) documents, code files, and instruction prompts through the web interface's upload feature. The only failures I've ever had in well over 300 sessions were due to connectivity issues, not interface failures.
Context window blowouts? All the time, but never document upload failures.
I'm talking about Gemini in the app and on the web. As well as AI studio. At work we go through Copilot, but there the agentic mode with Gemini isn't the best either.
What I love about Gemini mobile is that, if you look at the app wrong, it completely loses the response. It still generates it (and uses up your quota), but it never displays it!
This is the company that made Android, and it can't make an Android app that fetches a response from a server. Astonishing.
It's so capable at some things, and garbage at others.
I uploaded a photo of some words for a spelling bee and asked it to quiz my kid on the words. The first word it asked wasn't on the list. After multiple attempts to get it to ask only the words in the uploaded pic, it finally did, and then it would get the spellings wrong in the Q&A. I gave up.
I had it process a photo of my D&D character sheet and help me debug it as I'm a n00b at the game. Also did a decent, although not perfect, job of adding up a handwritten bowling score sheet.
How can the models be impressive if they switch to Chinese mid-sentence? I've observed those bizarre bugs too. Even GPT-3 didn't have those. Maybe GPT-2 did. It's actually impressive that they managed to botch it so badly.
Google is great at some things, but this isn't it.
My experience with Antigravity is the opposite. It's the first time in over 10 years that an IDE has managed to pull me a bit out of the JetBrains suite. I didn't think that was possible, as I'm a hardcore JetBrains user/lover.
I disagree. At least in my brief test drive, when used with Claude, the performance was on par with Cursor except that the Agent could actually interact with the terminal properly (Cursor is comically bad at this for some reason).
When the (generous!) Claude credits dry up, however, functionality stops. Gemini is as useless in Antigravity as everywhere else.
I've used their Pro models very successfully in demanding API workloads (classification, extraction, synthesis). On benchmarks it crushed the GPT-5 family. Gemini is my default right now for all API work.
However, it took me only a week to ditch Gemini 3 as a user. The hallucinations were off the charts compared to GPT-5. I've never even bothered with their CLI offering.
It's all context/use case; I've had weird things happen too, but if you only use markdown inputs and specific prompts, Gemini 3 Pro is insane, not to mention the context window.
Also, because of the long context window (1M tokens on Thinking and Pro! Claude and OpenAI only have 128k), Deep Research is the best.
That being said, for coding I definitely still use Codex with GPT 5.3 XHigh lol
Agreed on the product. I can't make Gemini read my emails in Gmail. One day it says it doesn't have access; the next day it says "Query unsuccessful".
Claude Desktop has no problem reaching Gmail, on the other hand :)
I don't have any of these issues with Gemini. I use it heavily every day. A few glitches here and there, but it's been enormously productive for me, far more so than ChatGPT, which I find mostly useless.
And it gives incorrect answers about itself and Google's services all the time. It kept pointing me to nonexistent UI elements. At least it apologizes profusely! ffs
Not a single person is using it for coding (outside of Google itself).
Maybe some people on a very generous free plan.
Their model is a fine mid 2025 model, backed by enormous compute resources and an army of GDM engineers to help the “researchers” keep the model on task as it traverses the “tree of thoughts”.
But that isn’t “the model” that’s an old model backed by massive money.
Are there market counterpoints that aren't really just a repackaging of:
1. "Google has the world's best distribution" and/or
2. "Google has a firehose of money that allows them to sell their 'AI product' at an enormous discount"?
These benchmarks are super impressive. That said, Gemini 3 Pro benchmarked well on coding tasks, and yet I found it abysmal. A distant third behind Codex and Claude.
Tool calling failures, hallucinations, bad code output. It felt like using a coding model from a year ago.
Even just as a general use model, somehow ChatGPT has a smoother integration with web search (than google!!), knowing when to use it, and not needing me to prompt it directly multiple times to search.
Not sure what happened there. They have all the ingredients in theory but they've really fallen behind on actual usability.
Just not search. The search product has pretty much become useless over the past 3 years, and the AI answers often only get you back to the level of 5 years ago. This creates a sense that things are better, but really it's just become impossible to get reliable information from an avenue that used to work very well.
I don't think this is intentional, but I think they stopped fighting SEO entirely to focus on AI. Recipes are the best example: completely gutted, with almost all recipe sites (and therefore the entire search page) run by the same company. I didn't realize how utterly consolidated huge portions of information on the internet were until every recipe site, about 3 months ago, simultaneously implemented the same anti-adblock measures.
Competition always is. I think there was a real fear that their core product was going to be replaced. They're already cannibalizing it internally, so it was THE wake-up call.
Wartime Google gave us Google+. Wartime Google is still bumbling, and despite OpenAI's numerous missteps, I don't think it has to worry about Google hurting its business yet.
I do miss Google+. For my brain / use case, it was by far the best social network out there, and the Circles system for managing friends and interests is still unparalleled :)
Windows Phone was actually good. I would even say that my Lumia something was one of the best experiences I've ever had on mobile. G+ was also good. Efficient markets reward whoever can "extract" rent via selling data or attention, etc., not what is actually good.
But just wait two hours to see what OpenAI has! I love the competition, and how someone just a few days ago was claiming ARC-AGI-2 was proof that LLMs can't reason. The goalposts will shift again. I feel like most of human endeavor will soon be just about trying to continuously show that AIs don't have AGI.
"AGI" doesn't mean anything concrete, so it's all a bunch of non-sequiturs. Your goalposts don't exist.
Anyone with any sense is interested in how well these tools work and how they can be harnessed, not some imaginary milestone that is not defined and cannot be measured.
I agree. I think the emergence of LLMs has shown that AGI really has no teeth. For decades the Turing test was viewed as the gold standard, but it's clear that there doesn't appear to be any good metric.
The Turing test was passed in the 80s; somehow it has remained relevant in pop culture despite the fact that it's not a particularly difficult technical achievement.
> I feel like most of human endeavor will soon be just about trying to continuously show that AI's don't have AGI.
I think you overestimate how much your average person-on-the-street cares about LLM benchmarks. They already treat ChatGPT or whichever as generally intelligent (including to their own detriment), are frustrated about their social media feeds filling up with slop and, maybe, if they're white-collar, worry about their jobs disappearing due to AI. Apart from a tiny minority in some specific field, people already know themselves to be less intelligent along any measurable axis than someone somewhere.
It's very hard to tell the difference between bad models and stinginess with compute.
I subscribe to both Gemini ($20/mo) and ChatGPT Pro ($200/mo).
If I give the same question to "Gemini 3.0 Pro" and "ChatGPT 5.2 Thinking + Heavy thinking", the latter is 4x slower and it gives smarter answers.
I shouldn't have to enumerate all the different plausible explanations for this observation. Anything from Gemini deciding to nerf the reasoning effort to save compute, versus TPUs being faster, to Gemini being worse, to this being my idiosyncratic experience, all fit the same data, and are all plausible.
You nailed it. Gemini 3 Pro seems very "lazy" and seems to never reason for more than 30 seconds, which significantly impacts the quality of its outputs.
Have you used Gemini CLI and then Codex? Gemini is so trigger-happy: the moment you don't tell it "don't make any changes", it runs off and starts doing all kinds of unrelated refactorings. This is the opposite of what I want. I want considerate, surgical implementations. I need to have a discussion of the scope, and sequence diagrams, first. It should read a lot of files instead of hallucinating about my architecture.
Their chat feels similar. It just runs off like a wild dog.
Gemini's UX (and of course privacy cred as with anything Google) is the worst of all the AI apps. In the eyes of the Common Man, it's UI that will win out, and ChatGPT's is still the best.
They don't even let you have multiple chats if you disable their "App Activity" or whatever (wtf is with that ass naming? they didn't even have a "Privacy" section in their settings last time I checked).
And when I swap back into the Gemini app on my iPhone after a minute or so, the chat disappears. And other weird passive-aggressive take-my-toys-away behavior if you don't bare your body and soul to Googlezebub.
ChatGPT and Grok work so much better without accounts or with high privacy settings.
This exactly! "Oh that gang of thieves that also sells doors has never had their house broken into"
I hate how they insist on knowing everything I do all the time, but heavens forbid the minute I'm on a VPN or shared connection I have to do unpaid manual labor (100 CAPTCHAs) to train their AI
You mean AI Studio or something like that, right? Because I can't see a problem with Google's standard chat interface. All other AI offerings are confusing both regarding their intended use and their UX, though, I have to concur with that.
No projects, completely forgets context mid-dialog, mediocre responses even on Thinking, research got kneecapped somehow and is completely useless now, uses Russian propaganda videos as search material (what's wrong with you, Google?), janky on mobile, consumes GIGABYTES of RAM on web (seriously, what the fuck?). Left a couple of tabs open overnight, and my Mac was almost completely frozen because 10 tabs consumed 8 GB of RAM doing nothing. It's a complete joke.
Fair enough. I'm always astonished at how different experiences are, because mine is the complete opposite. I almost solely use it for help with Go and JavaScript programming and found Gemini Pro to be more useful than any other model. ChatGPT was the worst offender so far, completely useless, but Claude has also been suboptimal for my use cases.
I guess it depends a lot on what you use LLMs for and how they are prompted. For example, Gemini fails the simple "count from 1 to 200 in words" test whereas Claude does it without further questions.
Another possible explanation would be that processing time is distributed unevenly across the globe and companies stay silent about this. Maybe depending on time zones?
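(If you want to score that "count from 1 to 200 in words" test mechanically, the reference answer is easy to generate. A minimal Python sketch, with hypothetical helper names, assuming US-style English number words without "and":)

```python
# Hypothetical helper: English words for 1..200, just to build the
# reference answer for the "count in words" test.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def to_words(n: int) -> str:
    """Spell out 0 <= n <= 999 in English (US style, no 'and')."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    prefix = ONES[hundreds] + " hundred"
    return prefix if rest == 0 else prefix + " " + to_words(rest)

# The full expected answer the model should produce, one entry per number.
expected = [to_words(i) for i in range(1, 201)]
```

You can then diff `expected` against the model's output line by line; UK-style answers ("one hundred and fifteen") would need a trivial tweak.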
Been using Gemini + OpenCode for the past couple weeks.
Suddenly, I get a "you need a Gemini Access Code license" error but when you go to the project page there is no mention of this or how to get the license.
You really feel the "We're the phone company and we don't care. Why? Because we don't have to." [0] when you use these Google products.
PS for those that don't get the reference: US phone companies in the 1970s had a monopoly on local and long distance phone service. Similar to Google for search/ads (really a "near" monopoly but close enough).
I'm leery to use a Google product in light of their history of discontinuing services. It'd have to be significantly better than a similar product from a committed competitor.
Agree. Anyone with access to large proprietary data has an edge in their space (not necessarily for foundation models): Salesforce, Adobe, AutoCAD, Caterpillar.
Trick? Lol, not a chance. Alphabet is a pure-play tech firm that has to build products to make the tech accessible. They really lack in the latter, and it's visible when you see the interactions of their VPs. Luckily for them, if you build enough of a lead with the tech, you get many chances to sort out the product stuff.
Don't let the benchmarks fool you. Gemini models are completely useless no matter how smart they are. Google still hasn't figured out tool calling and making the model follow instructions. They seem to only care about benchmarking and being the most intelligent model on paper. This has been a problem with Gemini since 1.0, and they still haven't fixed it.