
That's a fair comment.

Just to give my point of view: I'm head of ML here, but I'm choosing to work here for the impact I believe I can have. I could work somewhere else.

As for the net positive effect, the point of my essay is that the trust relation you raise (not having to duplicate the research, etc.) to me is a product design issue.

LLMs are fundamentally capable of bullshit. So products that leverage them have to keep that in mind and build workflows that don't end up breaking user trust.

The way we're currently thinking of doing that is to keep the user in the loop and incentivize the user to check sources by making it as easy as possible to quickly fact check LLM claims.

I'm on the same page as you that a model you can only trust 95% of the time is not useful because it's untrustable. So the product has to build an interaction flow that assumes that lack of trust but still makes something that is useful, saves time, respects user preferences, etc.

You're welcome to still think they're not useful for you, but that's the way we currently think about it and our goal is to make useful tools, not lofty promises of replacing humans at tasks.



> That's a fair comment.

Equally, I do genuinely appreciate the reply, especially given my dissent.

> Just to give my point of view: I'm head of ML here, but I'm choosing to work here for the impact I believe I can have. I could work somewhere else.

> As for the net positive effect, the point of my essay is that the trust relation you raise [...] to me is a product design issue.

Product design issue, or material suitability issue? Why is the product you're trying to deliver built on top of a conversational model? I know why my rock climbing rope is a synthetic 10mm dynamic kernmantle rope, but why is a conversational AI the right product here?

> LLMs are fundamentally capable of bullshit.

Why though? I don't mean at a technical level; I do understand how next-token prediction works. But from a product-rationale standpoint? Why are you attempting to build this product on a system that makes predictions based on completely incorrect or inappropriate inputs?

Admittedly, I am not up to date on the state of the art, so please do correct me if my understanding is incomplete or wrong. But if I'm not mistaken, attention-based transformers themselves generally don't hallucinate when producing low-temperature language-to-language translations, right? So why are conversational models, the ones very much prone to hallucinating and emitting believable bullshit, the interface everything uses?
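To make the "low temperature" point concrete, here's a toy sketch (mine, not Kagi's or any vendor's actual decoder) of temperature-scaled next-token sampling; the vocabulary and logits are invented for illustration:

    import math
    import random

    def sample_next_token(logits, temperature=1.0):
        # temperature -> 0 approaches greedy decoding (always the top token),
        # which constrained tasks like translation tend to use; higher
        # temperatures flatten the distribution and make lower-probability
        # (and possibly wrong) tokens more likely to be emitted.
        if temperature <= 0:
            return max(range(len(logits)), key=lambda i: logits[i])
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        return random.choices(range(len(logits)), weights=probs, k=1)[0]

    # Hypothetical logits for the token after "The capital of France is"
    vocab = ["Paris", "Lyon", "London", "banana"]
    logits = [9.1, 4.2, 3.0, 0.5]
    print(vocab[sample_next_token(logits, temperature=0.0)])  # always "Paris"
    print(vocab[sample_next_token(logits, temperature=1.5)])  # occasionally not

The point of the sketch is only that how conservatively or loosely you sample is a knob, which is roughly what I mean by "low temperature" above.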

How much of that reason is because the ability to emit believable bullshit is actually the product you are trying to sell? (The rhetorical you; I'm specifically thinking of LLM-as-a-service providers eager to oversell the capabilities of their models. I still have a positive opinion of Kagi, so I could be convinced you're one of the ones who are different.) The artificial confidence is the product. Bullshitting something believable but wrong has better results, in bulk, for the metrics you're tracking. When soliciting feedback, the vast majority of the answers are based on vibes, right?

Say you had two models. One is rote and very reliable, very predictable, and rarely produces inaccurate output, but it isn't impressive when generating conversational-feeling text and, critically, can't phrase things in a trivial-to-understand way while exuding an abundance of confidence. The other only very rarely produces total bullshit, and all the feedback shows everyone loves using it; it makes them feel good about the answer, yet there's still that nagging hallucination issue bubbling under the surface.

Which would you ship?

Again, I'm asking which you would ship with the rhetorical you... perhaps there is someone in charge of AI somewhere who would only ship the safe version, even if few users ranked it higher than normal organic search. Unfortunately, I'm way too much of a cynic to believe that's possible. The "AI is good" crowd doesn't have a strong reputation for always making the ethical choice.

> So products that leverage them have to keep that in mind and build workflows that don't end up breaking user trust.

> The way we're currently thinking of doing that is to keep the user in the loop and incentivize the user to check sources by making it as easy as possible to quickly fact check LLM claims.

Do you feel that's a reasonable expectation of users when you've already given them the perfect answer they were looking for, delivered with plenty of subjective confidence?

> I'm on the same page as you that a model you can only trust 95% of the time is not useful because it's untrustable. So the product has to build an interaction flow that assumes that lack of trust but still makes something that is useful, saves time, respects user preferences, etc.

> You're welcome to still think they're not useful for you, but that's the way we currently think about it and our goal is to make useful tools, not lofty promises of replacing humans at tasks.

I don't think I'm the ideal person to be offering advice, because I would never phrase the problem statement as "we have to give users the tools to verify whether the confident-sounding thing lied this time." I know far too much about both human nature and alarm fatigue. So I can only reject your hypothetical, and ask what if you didn't have to do something I worry will make the world worse.

I attribute a large portion of the vitriol, anger, and divisiveness that has become pervasive, and that is actively harming people and communities, directly to modern algorithmic recommendation systems. These systems prioritize speed, and being first, above the truth. Or they rank personalized results that selectively offer only the content that feels good and confirms preexisting ideas, to the detriment of reality.

They all tell you what you want to hear over what is true. It will take a mountain of evidence to assure me that conversational LLMs won't do exactly the same thing, just better or faster, especially when I could uncharitably summarize your solution to these defects as merely "encouraging people to do their own research."



