
The model has multiple layers of mechanisms to prevent carbon copy output of the training data.

Do you have a source for this?

Carbon copy would mean overfitting.


I saw weird results with Gemini 2.5 Pro when I asked it to provide concrete source code examples matching certain criteria, and to quote the source code it found verbatim. In its response it claimed to have quoted the sources verbatim, but that wasn't true at all: they had been rewritten, still in the style of the project it was quoting from, but otherwise quite different, and without a match in the Git history.

It looked a bit like someone at Google subscribed to a legal theory under which you can avoid copyright infringement if you take a derivative work and apply a mechanical obfuscation to it.


Source: just read the definition of what "temperature" is.

But honestly source = "a knuckle sandwich" would be appropriate here.
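
For what it's worth, here's a minimal sketch of what temperature does during sampling (hand-rolled Python, made-up logits; not how any particular product implements it):

    import numpy as np

    def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng()):
        """Sample a token index from logits divided by temperature (softmax sampling)."""
        scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
        probs = np.exp(scaled - scaled.max())   # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # Toy logits where the model strongly prefers token 0.
    logits = [4.0, 1.0, 0.5, 0.1]
    print(sample_with_temperature(logits, temperature=0.1))   # almost always 0
    print(sample_with_temperature(logits, temperature=1.5))   # other tokens show up more often

Low temperature sharpens the distribution toward the most likely token; high temperature flattens it. Either way, if the model puts nearly all its probability mass on a memorized continuation, sampling noise alone won't stop it from coming out verbatim.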


forgive the skepticism, but this translates directly to "we asked the model pretty please not to do it in the system prompt"

The model doesn't know what its training data is, nor does it know what sequences of tokens appeared verbatim in there, so this kind of thing doesn't work.

It's mind-boggling if you think about the fact that they're essentially "just" statistical models.

It really contextualizes the old wisdom of Pythagoras that everything can be represented as numbers / math is the ultimate truth


They are not just statistical models

They create concepts in latent space, which is basically compression, and that compression forces this.


You’re describing a complex statistical model.

What is "latent space"? I'm wary of metamagical descriptions of technology that's in a hype cycle.

How so? Truth is naturally an a priori concept; you don't need a chatbot to reach this conclusion.

That might be somewhat ungenerous unless you have more detail to provide.

I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.
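
I don't know how any particular vendor implements it, but a toy sketch of the general idea (a hypothetical n-gram overlap check in Python; the window size, threshold, and corpus are all made up) might look like:

    def ngrams(tokens, n=8):
        """Every contiguous n-token window, as a set of tuples."""
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def looks_copied(output_tokens, training_ngram_index, n=8, threshold=0.5):
        """Flag output when too many of its n-grams appear verbatim in the training index."""
        windows = ngrams(output_tokens, n)
        if not windows:
            return False
        hits = sum(1 for w in windows if w in training_ngram_index)
        return hits / len(windows) >= threshold

    # Toy example with word-level "tokens" standing in for a real tokenizer.
    corpus = "the quick brown fox jumps over the lazy dog".split()
    index = ngrams(corpus, n=4)          # in reality this would be a huge precomputed index
    candidate = "the quick brown fox jumps over the fence".split()
    print(looks_copied(candidate, index, n=4, threshold=0.5))   # True: 4 of 5 windows match

A product could then suppress or regenerate anything that scores above the threshold.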


Should they though? If the answer to a question^Wprompt happens to be in the training set, wouldn't it be disingenuous to not provide that?

Maybe it's intended to avoid legal liability resulting from reproducing copyright material not licensed for training?

Would it really be infeasible to take a sample and do a search over an indexed training set? Maybe a Bloom filter could be adapted.
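
If I understand the suggestion, a toy version of that Bloom filter (hand-rolled, word-level 8-grams standing in for real tokenizer output, sizes picked arbitrarily) would look something like this:

    import hashlib

    class BloomFilter:
        """Tiny Bloom filter: fast set membership with false positives but no false negatives."""

        def __init__(self, num_bits=1 << 20, num_hashes=5):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8)

        def _positions(self, item):
            # Derive num_hashes independent bit positions from SHA-256 of a seeded item.
            for seed in range(self.num_hashes):
                digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.num_bits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    # Index every 8-word window of a toy "training corpus", then probe with sampled output.
    corpus = "we hold these truths to be self evident that all men are created equal".split()
    bf = BloomFilter()
    for i in range(len(corpus) - 7):
        bf.add(" ".join(corpus[i:i + 8]))

    print("these truths to be self evident that all" in bf)          # True: verbatim window
    print("a completely novel eight word long sentence here" in bf)  # almost certainly False

Membership tests can give false positives but never false negatives, so it errs on the side of flagging. The catch is that it only catches verbatim windows.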

It's not the searching that's infeasible. Efficient algorithms for massive scale full text search are available.

The infeasibility is searching for the (unknown) set of translations that the LLM would put that data through. Even if you posit only basic symbolic LUT mappings in the weights (they're not), there's no good way to enumerate them anyway. The model might as well be a learned hash function that maintains semantic identity while utterly eradicating literal symbolic equivalence.
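
To make that concrete: even a trivial paraphrase shares no literal n-grams with its source, so exact-match indexing has nothing to latch onto (toy example, word-level trigrams, sentences invented for illustration):

    def ngrams(tokens, n=3):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    original   = "the cat sat on the mat".split()
    paraphrase = "a feline was seated upon the rug".split()

    print(ngrams(original) & ngrams(paraphrase))   # set(): zero literal overlap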


Unfortunately.

does it?

This is a verbatim quote from Gemini 3 Pro, from a chat a couple of days ago:

"Because I have done this exact project on a hot water tank, I can tell you exactly [...]"

I somehow doubt an LLM did that exact project, what with it not having any ability to do plumbing in real life...


Isn't that easily explicable as hallucination, rather than regurgitation?

Those are not mutually exclusive in this instance, it seems.

Someone presented a hypothetical scenario: what if a hacker wrote a virus that breached a totally unprotected database after the hacker had passed away? It's clear that the therapy provider would be at least partially responsible.

Posthumous crime is the ultimate because the legal system is all about punishing the living until they are dead.


If only human beings were good at learning from past mistakes. It requires multiple tries before we realize: fire bad, unless controlled, then good.

I believe the industry has largely accepted that prompt injection is an inherent part of LLM tech.

There are many open-source toy browser implementations available, so this seems quite likely.

I'm hopeful. The open source AI ecosystem could benefit from large players like Mozilla making moves.

In what world is Mozilla large?

Maybe influential would be a better word choice. Firefox has 100M+ users.

Has anyone analyzed the proportion of AI-related submissions on HN over time?

Here is something like that: https://beuke.org/hn-ai-coverage/

This proposal seems solid. I personally also like how many scientific journals have added a mandatory AI disclosure in publication. In practice it's one or two sentences on how (or whether) Gen AI was used.

"ChatGPT model GPT-5.2 was used to identify spelling errors"

"Google Gemini 3 was used to generate the abstract of the paper".


"Whatever Overleaf has was used to identify spelling errors"

"Google Docs AI (whatever the name is, Gemini) has was used to identify spelling, grammar and idioms errors"

"Gemini in Google Search has been used to understand how to use obscure Fortan 77 instruction"

...


Along those lines, yes. Often journals ask specifically about generative AI, so other types of AI tools don't require disclosure.

It doesn't happen in non-diabetic people. It's different in type 2 diabetics, who will see large swings in blood fat and glucose after meals.

I'm skeptical that a paleo diet would be healthy long term. There are studies that find atherosclerosis in pre-industrial hunter-gatherer remains; one is called the HORUS study.

From what I've managed to find in the newest research, diet does not appear to have any impact on atherosclerosis itself. But, as they say, more data is needed.

> In 2013, meanwhile, researchers in the Netherlands subjected 17 healthy adults to temperatures of 15-16C (59-60.8F) for six hours a day.

It seems that these articles often discuss cold plunges, cold showers, etc., but the rigorous research is often conducted simply via rooms with reduced temperature combined with light clothing.

