Hacker Leaks 3.3B Emails and Yes Every Single One Is Unique

umanwizard · on Sept 22, 2024

Email addresses, not emails. Completely misleading title.

zero-sharp · on Sept 22, 2024

Sure it's sloppy. But why would the title mention uniqueness if we were talking about emails?

umanwizard · on Sept 22, 2024

> why would the title mention uniqueness if we were talking about emails?

Why wouldn’t it? Doesn’t the concept of uniqueness apply to emails just like it does to email addresses? I don’t understand your point.

zero-sharp · on Sept 22, 2024

Email contents and email addresses are both text. Yes, the concept still applies. The point is that email contents are almost never the same to begin with. If we're including the timestamp, then almost every email is unique by default. Mentioning that the email addresses are unique is making the point that we've identified just as many people [1], which is interesting. The statement "these 3.3 billion emails are unique" is much less interesting, because we've identified messages and not people. Also, people are usually more concerned with the information in the email rather than the count. Most/a lot of the value of an email comes from the information in it.

If I were to release 3.3 billion emails between random low-profile office workers (let's say) which contain nothing interesting, I'm not so sure that would make a headline.

[1] Just as many, or THEREABOUTS*

rational_indian · on Sept 22, 2024

Why would anyone assume Hacker News titles are maximally interesting? In practice they often aren't. I am with the OP on this one.

Also, 3.3 billion unique emails are strictly more interesting than just the addresses, since an email includes addresses and a subject line by definition.

umanwizard · on Sept 22, 2024

> The point is that email contents are almost never the same to begin with

Obviously, if you have emails that were generated in different events, thus having different Message-ID and timestamp fields, they will be unique.

But non-uniqueness could crop up in a dataset for various reasons. As the simplest example, imagine this guy aggregated datasets A, B, and C, but it turns out C was itself already an aggregate of A and B. Then all the emails in A and B would be duplicated in the final dataset.

So of course when publishing some huge collection of data from many different sources, it's useful to make sure each piece of data is unique, and the title is just pointing out that for this data set, that has indeed been done. This logic applies whether the data is messages or addresses.
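A minimal Python sketch of the A/B/C scenario, assuming each dataset is just a list of strings (the datasets and addresses are made up for illustration). Naive concatenation double-counts everything in A and B; keeping only the first occurrence of each record removes the overlap, and the same logic works whether the records are addresses or whole messages.

```python
# Hypothetical datasets: C is itself an aggregate of A and B plus one new record.
A = ["alice@example.com", "bob@example.com"]
B = ["carol@example.com"]
C = A + B + ["dave@example.com"]


def aggregate_naive(*datasets: list[str]) -> list[str]:
    """Concatenate every dataset, keeping duplicates."""
    return [record for ds in datasets for record in ds]


def aggregate_unique(*datasets: list[str]) -> list[str]:
    """Keep only the first occurrence of each record, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for ds in datasets:
        for record in ds:
            if record not in seen:
                seen.add(record)
                unique.append(record)
    return unique


print(len(aggregate_naive(A, B, C)))   # 7: everything in A and B counted twice
print(len(aggregate_unique(A, B, C)))  # 4
```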

If you just look at the body text, and not the headers, it is even less likely for emails to be unique due to mass spam.

> Mentioning that the email addresses are unique is making the point that we've identified just as many people, which is interesting.

No it isn't. He didn't say that the addresses correspond to unique _people_, just that they are unique addresses, textually. The mapping of email addresses to people is not even close to one-to-one.

> Also, people are usually more concerned with the information in the email rather than the count. Most/a lot of the value of an email comes from the information in it.

But the article/headline isn't just saying a count was published, it's saying the emails themselves were leaked. If this meant email messages rather than addresses, then it would indeed mean the valuable information in the emails had been compromised. Why are you saying that wouldn't be interesting?

> If I were to release 3.3 billion emails between random low-profile office workers (let's say) which contain nothing interesting, I'm not so sure that would make a headline.

I think it would, assuming they were between humans and not just spam. A leak of 3.3 billion ostensibly private messages, on any platform (email, twitter DMs, whatever), would be by far the most serious data breach in the history of the internet.

eesmith · on Sept 22, 2024

I assumed it was email messages, not email addresses, and the de-duplication was for things like mass mailings, spams, and the like.

(As one example, strip tracking URLs and web bugs to identify that two messages with different bytes were auto-generated from the same template.)
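A rough Python sketch of that kind of template de-duplication. The normalization rules are assumptions for illustration: it treats any `<img>` declared 1 pixel wide as a web bug, drops query strings and fragments from URLs, and hashes what is left, so two messages that differ only in per-recipient tracking tokens get the same key. Real tracking schemes vary a lot, so this is a starting point rather than a complete filter.

```python
import hashlib
import re

# Assumed heuristics, for illustration only; real tracking schemes vary widely.
# A URL captured up to (but not including) its query string or fragment.
URL_QUERY = re.compile(r"(https?://[^\s\"'<>?#]+)[?#][^\s\"'<>]*")
# An <img> tag declared exactly 1 pixel wide, a common web-bug pattern.
WEB_BUG = re.compile(r"<img\b[^>]*\bwidth=[\"']?1\b[^>]*>", re.IGNORECASE)


def normalize(body: str) -> str:
    """Remove per-recipient noise so template-generated messages compare equal."""
    body = WEB_BUG.sub("", body)       # drop tracking pixels
    body = URL_QUERY.sub(r"\1", body)  # drop ?uid=... style tokens and #fragments
    return body


def template_key(body: str) -> str:
    """SHA-256 of the normalized body; equal keys suggest the same template."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()


msg1 = ('Big sale! <a href="https://shop.example/sale?uid=111">Shop</a>'
        '<img src="https://t.example/p.gif?u=111" width="1" height="1">')
msg2 = ('Big sale! <a href="https://shop.example/sale?uid=222">Shop</a>'
        '<img src="https://t.example/p.gif?u=222" width="1" height="1">')

print(msg1 == msg2)                              # False: the raw bytes differ
print(template_key(msg1) == template_key(msg2))  # True: same underlying template
```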

With email messages it would be possible to train based on interests, for example to get a generative AI to create more targeted phishing messages.

MattGaiser · on Sept 22, 2024

Also, the data was already leaked. The “hacker” just cleaned it.

newhotelowner · on Sept 22, 2024

Well, he didn't leak it. It was already leaked. He just collected all the emails from previous breaches and is showing how bad the leaks are.

pussygrabber · on Sept 22, 2024

Who really cares? Email, like SMS and phone calls, is 99% auto-generated garbage.

With my phone, if you're not in my directory you have to leave a voicemail. If you leave a voicemail, then depending on what it says, I might or might not respond, or I might just block the number.

With email, everything is automatically reported as spam unless it's in the whitelist. No exceptions.

SMS is harder to deal with but I can and do report SMS spam.

NKosmatos · on Sept 22, 2024

He didn’t leak them; he just collected whatever is circulating around, cleaned it (with a regex used by Troy Hunt of HaveIBeenPwned), and then distributed it. The post is not clear, but from the screenshot of BreachForums it seems to be email AND password, not just emails.
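For a rough idea of what a regex-based cleaning pass over such dumps might look like, here is a Python sketch. The pattern is a deliberately loose, hypothetical one, not the regex Troy Hunt actually uses, and the sample lines are invented. It pulls anything address-shaped out of each raw line (discarding whatever follows, such as a password), lowercases it, and keeps only the first occurrence.

```python
import re

# Hypothetical, deliberately loose pattern; NOT the regex used by HaveIBeenPwned.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def clean(lines: list[str]) -> list[str]:
    """Extract address-shaped strings from raw dump lines and de-duplicate them."""
    seen: set[str] = set()
    out: list[str] = []
    for line in lines:
        for match in EMAIL.findall(line):
            # Lowercasing is a simplification: domains are case-insensitive,
            # but local parts are technically case-sensitive.
            address = match.lower()
            if address not in seen:
                seen.add(address)
                out.append(address)
    return out


raw = [
    "alice@example.com:hunter2",
    "ALICE@Example.com:letmein",  # same address, different case
    "not an address at all",
    "bob@mail.example.org;qwerty",
]
print(clean(raw))  # ['alice@example.com', 'bob@mail.example.org']
```

A pattern like this only checks that a string looks like an address; it says nothing about whether the address actually exists.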

MattGaiser · on Sept 22, 2024

1. Email addresses, not emails.

2. It’s just cleaning data from prior leaks. It doesn’t seem like anything new has been leaked?

3. Do people consider their email address all that private? I list most of mine publicly.

londons_explore · on Sept 22, 2024

They are email address/password pairs.

47282847 · on Sept 22, 2024

Emails, or email addresses?

…

Aardwolf · on Sept 22, 2024

Agreed, "email contents" is exactly how I read the title too.

3.3B email addresses isn't as impressive unless they come with other info like passwords or accounts.

> Oh, and in case you’re wondering, this represents about one out of every four individuals on Earth.

One in four would imply about 13 billion people; I thought there were about 8 billion.

magicalhippo · on Sept 22, 2024

> A staggering 3.3 billion unique email addresses were gathered from compromised websites.

Yet the author switches between "emails" and "email addresses" seemingly every other sentence, demonstrating they have no idea what an email actually is.

Workaccount2 · on Sept 22, 2024

Seems to just be email addresses

jasongill · on Sept 22, 2024

Email addresses, obviously.

input_sh · on Sept 22, 2024

I mean it's accurate, but I definitely wouldn't call it obvious due to poor writing.

> Hacker Leaks 3.3 Billion Emails and Yes Every Single One Is Unique

Okay, so emails?

> A staggering 3.3 billion unique emails were leaked in an underground forum.

Still points to emails?

> Imagine waking up to find your email address among the 3.3 billion unique addresses floating around the dark reaches of the internet.

Ah, email addresses actually!

xyst · on Sept 22, 2024

I used “Hide My Email” many times in the past and recently moved to my own mail server, which lets me catch all emails sent to my domain.

Might download the set later and see if any of my aliased accounts are “pwned”

poincaredisk · on Sept 22, 2024

Is there any reason to suspect every single one is valid? I have some experience with breached password collections, and at least 80% of entries are fake (even more for larger collections).

Wasserpuncher · on Sept 22, 2024

A hacker leaked 3.3 billion emails from multiple public breaches, because who needs privacy anymore?

MattGaiser · on Sept 22, 2024

It is email addresses and they arguably were already leaked. He just cleaned them.

forgot_user1234 · on Sept 22, 2024

Seems like an AI-written article