[flagged] Hacker Leaks 3.3B Emails and Yes Every Single One Is Unique (hackerdose.com)
20 points by Wasserpuncher on Sept 22, 2024 | 24 comments


Email addresses, not emails. Completely misleading title.


Sure it's sloppy. But why would the title mention uniqueness if we were talking about emails?


> why would the title mention uniqueness if we were talking about emails?

Why wouldn’t it? Doesn’t the concept of uniqueness apply to emails just like it does to email addresses? I don’t understand your point.


Email contents and email addresses are both text. Yes, the concept still applies. The point is that email contents are almost never the same to begin with. If we're including the timestamp, then almost every email is unique by default. Mentioning that the email addresses are unique is making the point that we've identified just as many people [1], which is interesting. The statement "these 3.3 billion emails are unique" is much less interesting, because we've identified messages and not people. Also, people are usually more concerned with the information in the email rather than the count. Most/a lot of the value of an email comes from the information in it.

If I were to release 3.3 billion emails between random low-profile office workers (let's say) which contain nothing interesting, I'm not so sure that would make a headline.

[1] just as many or THEREABOUT*


Why would anyone assume Hacker News titles are maximally interesting? In practice they often aren't. I am with the OP on this one.

Also, 3.3 billion unique emails are strictly more interesting than just the addresses, since an email includes addresses and a subject line by definition.


> The point is that email contents are almost never the same to begin with

Obviously, if you have emails that were generated in different events, thus having different Message-ID and timestamp fields, they will be unique.

But non-uniqueness could crop up in a dataset for various reasons. As the simplest example, imagine this guy aggregated datasets A, B, and C, but it turns out C was itself already an aggregate of A and B. Then all the emails in A and B would be duplicated in the final dataset.

So of course when publishing some huge collection of data from many different sources, it's useful to make sure each piece of data is unique, and the title is just pointing out that for this data set, that has indeed been done. This logic applies whether the data is messages or addresses.
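A minimal sketch of that A, B, C scenario, using made-up toy records (the addresses and the `merge_unique` helper are hypothetical, not anything from the actual leak):

```python
def merge_unique(*datasets):
    """Merge datasets, keeping only the first occurrence of each record."""
    seen = set()
    merged = []
    for dataset in datasets:
        for record in dataset:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


# Hypothetical toy data: C is itself already an aggregate of A and B.
a = ["alice@example.com", "bob@example.com"]
b = ["carol@example.com"]
c = a + b

naive = a + b + c                 # plain concatenation duplicates all of A and B
deduped = merge_unique(a, b, c)   # each record survives exactly once
```

The same function works whether the records are addresses or whole messages, as long as they are hashable (for example, strings).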

If you just look at the body text, and not the headers, it is even less likely for emails to be unique due to mass spam.

> Mentioning that the email addresses are unique is making the point that we've identified just as many people, which is interesting.

No it isn't. He didn't say that the addresses correspond to unique _people_, just that they are unique addresses, textually. The mapping of email addresses to people is not even close to one-to-one.

> Also, people are usually more concerned with the information in the email rather than the count. Most/a lot of the value of an email comes from the information in it.

But the article/headline isn't just saying a count was published, it's saying the emails themselves were leaked. If this meant email messages rather than addresses, then it would indeed mean the valuable information in the emails had been compromised. Why are you saying that wouldn't be interesting?

> If I were to release 3.3 billion emails between random low-profile office workers (let's say) which contain nothing interesting, I'm not so sure that would make a headline.

I think it would, assuming they were between humans and not just spam. A leak of 3.3 billion ostensibly private messages, on any platform (email, twitter DMs, whatever), would be by far the most serious data breach in the history of the internet.


I assumed it was email messages, not email addresses, and the de-duplication was for things like mass mailings, spams, and the like.

(As one example, strip tracking URLs and web bugs to identify that two messages with different bytes are auto-generated from the same template.)
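Roughly what I have in mind, as a sketch only. The two regexes are my own simplistic assumptions about what a tracking URL or web bug looks like; real mail would need proper MIME and HTML parsing:

```python
import hashlib
import re

# Assumed patterns, for illustration only.
IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)  # web bugs / tracking pixels
URL_RE = re.compile(r"https?://\S+")                 # per-recipient tracking links


def template_fingerprint(body: str) -> str:
    """Hash the body after replacing <img> tags and URLs with placeholders."""
    text = IMG_RE.sub("<IMG>", body)   # strip images first, since they contain URLs
    text = URL_RE.sub("<URL>", text)
    text = " ".join(text.split())      # collapse whitespace differences
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical messages that differ only in their tracking identifiers.
msg1 = 'Big sale! Visit https://shop.example/?uid=111 <img src="https://t.example/p.gif?id=111">'
msg2 = 'Big sale! Visit https://shop.example/?uid=222 <img src="https://t.example/p.gif?id=222">'
msg3 = 'Lunch on Friday? See https://cal.example/e/42'

same_template = template_fingerprint(msg1) == template_fingerprint(msg2)
```

Messages with the same fingerprint can then be treated as one template and collapsed.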

With email messages it would be possible to train based on interests, for example to get a generative AI to create more targeted phishing messages.


Also, the data was already leaked. The “hacker” just cleaned it.


Well, he didn't leak it. It was already leaked. He just collected all the emails from the previous breaches and is showing how bad the leaks are.


Who really cares? E-mail, like SMS and phone calls, is 99% e-generated garbage.

With my phone, if you're not in the directory you have to leave voicemail. If you leave voicemail then based on what the voicemail is, I might or might not respond or I might just block the number.

With email, everything is automatically reported as spam unless it's in the whitelist. No exceptions.

SMS is harder to deal with but I can and do report SMS spam.


He didn’t leak them, he just collected whatever is circulating around, cleaned it (with a regex used by Troy Hunt of HaveIBeenPwned) and then distributed it. The post is not clear, but from the screenshot of BreachForums it seems to be email AND password, not just emails.


1. Email addresses, not emails.

2. It’s just cleaning data from prior leaks. It doesn’t seem like anything new has been leaked?

3. Do people consider their email address all that private? I list most of mine publicly.


they are email address/password pairs.


Emails, or email addresses?


Agree, "email contents" is exactly what I read the title as, too.

3.3B email addresses isn't as impressive if it isn't with other info like passwords or accounts

> Oh, and in case you’re wondering, this represents about one out of every four individuals on Earth.

I thought there were 8 billion people, not 13B
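Spelling out the arithmetic, assuming one address per person (which the article seems to do) and my round figure of 8 billion people:

```python
leaked_addresses = 3.3e9   # from the article
claimed_share = 1 / 4      # "one out of every four individuals"
world_population = 8e9     # my round figure, an assumption

# Population the article's claim would require.
implied_population = leaked_addresses / claimed_share

# Share you actually get with 8 billion people.
actual_share = leaked_addresses / world_population

print(f"implied population: {implied_population / 1e9:.1f}B")
print(f"actual share: {actual_share:.0%}")
```

So with 8 billion people it would be closer to two out of every five, not one in four.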


A staggering 3.3 billion unique email addresses were gathered from compromised websites.

Yet the author switches like every other time, demonstrating they have no idea what an email actually is.


Seems to just be email addresses


Email addresses, obviously.


I mean it's accurate, but I definitely wouldn't call it obvious due to poor writing.

> Hacker Leaks 3.3 Billion Emails and Yes Every Single One Is Unique

Okay, so emails?

> A staggering 3.3 billion unique emails were leaked in an underground forum.

Still points to emails?

> Imagine waking up to find your email address among the 3.3 billion unique addresses floating around the dark reaches of the internet.

Ah, email addresses actually!


I used “hide my email” many times in the past and recently transitioned to my own mail server which allows me to catch all emails.

Might download the set later and see if any of my aliased accounts are “pwned”


Is there any reason to suspect every single one is valid? I have some experience with breached password collections, and at least 80% of entries are fake (even more for larger collections).


A hacker leaked 3.3 billion emails from multiple public breaches, because who needs privacy anymore?


It is email addresses and they arguably were already leaked. He just cleaned them.


Seems like an AI-written article



