
This is wrong:

> Email addresses are case-insensitive.

From https://thecopenhagenbook.com/email-verification

The email standard (RFC 5321) says the local part, everything before the @, is case sensitive; only the domain part is case-insensitive.

If you lowercase emails during send operations, the wrong person may get the email. That's bad for auth.

Some (many) popular email providers choose to offer only case-insensitive emails. But a website about general auth should recommend the general case.
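A minimal sketch of the "general case" normalization: RFC 5321 treats the local part as case-sensitive and the domain as case-insensitive, so the only safe transformation is lowercasing the domain. The function name `normalize_domain_only` is hypothetical, and the sketch assumes plain addresses: it does no IDNA processing of internationalized domains and no syntax validation.

```python
def normalize_domain_only(address: str) -> str:
    """Lowercase the domain but keep the local part exactly as entered."""
    # Split on the LAST '@': a quoted local part may itself contain '@',
    # but the domain never does.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"


print(normalize_domain_only("John.Doe@Example.COM"))  # John.Doe@example.com
```

Because the local part is untouched, mail still reaches the exact mailbox the user typed, even on a server that does distinguish case.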

https://stackoverflow.com/questions/9807909/are-email-addres...

Side remark: It is not always obvious what the other case of a given character is, and the answer may change over time. For example, the capital form of the German ß (ẞ, U+1E9E) was only added to Unicode in 2008. So it's best to avoid case sensitivity where you can, in general programming.
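To illustrate how unstable case mapping can be, here is a short Python check using only the standard library. The sharp s uppercases to two characters, the round trip does not return the original, and the capital ẞ (U+1E9E) maps back to ß. The exact results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

print(unicodedata.unidata_version)  # Unicode version this Python ships with
print("ß".upper())                  # SS  (full case mapping, two characters)
print("ß".upper().lower())          # ss  (round trip does not give back 'ß')
print("\u1E9E".lower())             # ß   (capital sharp s lowercases to 'ß')
print("ß".casefold())               # ss  (casefold() is meant for caseless matching)
```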



The top Stack Overflow answer you link to disagrees with you: "In practice though, no widely used mail systems distinguish different addresses based on case."

The standard says one thing, yet implementers do another. In this case following the letter of the standard gets you in trouble in the real world.


Yeah, case-sensitive email addressing seems like a horrid idea for a standard. For exactly the reason pointed out, that using only lowercase could result in the wrong person receiving emails. Expecting users who type in email addresses to respect case-sensitivity is wishful thinking at best.


> Expecting users who type in email addresses to respect case-sensitivity is wishful thinking at best.

I agree. First, you have tons of websites using the wrong input field (“text” instead of “email”) which often results in capitalized inputs without user intent. Then you have the non-techies who would absolutely not remember this little gotcha, and put in randomly interchangeable casing depending on who knows what. Some people still think capitalization looks more formal and correct, for instance.

So what’s the benefit of adhering to the standard strictly? Nothing that solves real-world issues afaik. There is only downside: very simple impersonation attacks.

That said, there is a middle ground. Someone put it like this: store and send user input the way they entered it, but use the canonical address for testing equality, e.g. in the database.
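A minimal in-memory sketch of that middle ground: the address is stored exactly as typed, and a canonical key is used only for equality checks. `EmailRegistry` is a hypothetical name, and using `str.casefold()` on the whole address is an assumption. It deliberately treats the local part case-insensitively, which is the policy being proposed here rather than what RFC 5321 requires.

```python
from __future__ import annotations


class EmailRegistry:
    """Hypothetical store: keep the address as entered, index it canonically."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    @staticmethod
    def canonical(address: str) -> str:
        # Assumption: casefold() is an acceptable canonical form for matching.
        return address.strip().casefold()

    def register(self, address: str) -> None:
        key = self.canonical(address)
        if key in self._by_key:
            raise ValueError(f"{address!r} is already registered")
        self._by_key[key] = address.strip()  # original spelling is kept

    def lookup(self, address: str) -> str | None:
        # Returns the originally entered address, e.g. for the To: line.
        return self._by_key.get(self.canonical(address))


reg = EmailRegistry()
reg.register("Sandy.MacAdam@example.com")
print(reg.lookup("sandy.macadam@EXAMPLE.com"))  # Sandy.MacAdam@example.com
```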


The other side of that is handling case insensitivity in Unicode bug for bug compatible with email providers.


> handling case insensitivity in Unicode bug for bug compatible with email providers.

The official email standards basically say to treat email addresses as a binary format. You aren't even allowed to do NFC / NFD / NFKC etc normalization.

https://github.com/whatwg/html/issues/4562#issuecomment-2096...
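To see why "binary format, no normalization" matters in practice: the same visible address can be spelled with different code points. Under a byte-for-byte comparison these are two different addresses, and only an explicit Unicode normalization (which, per the comment above, the email standards don't sanction) makes them equal. The address `josé@example.com` is just an illustration.

```python
import unicodedata

composed = "jos\u00e9@example.com"     # 'é' as a single code point (NFC form)
decomposed = "jose\u0301@example.com"  # 'e' + COMBINING ACUTE ACCENT (NFD form)

print(composed == decomposed)                                # False
print(composed.encode() == decomposed.encode())              # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```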

Unicode has some standards which are slightly better, but they're only for email providers to restrict registering new email addresses, and it still doesn't suggest case-insensitivity.

https://www.unicode.org/reports/tr39/#Email_Security_Profile...

I'm tempted to write an email standard called "Sane Email" that allows providers to opt into Unicode normalization, case insensitivity (in a well-defined way), and sane character restrictions (like Unicode's UTS #39).

Currently the standards allow for pretty much _any_ Unicode characters, including unbalanced right-to-left control characters, and possibly even surrogates.

Websites are supposed to store email addresses as opaque binary strings.

I think the overly permissive standards are what are holding back Unicode email addresses.
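A rough sketch of the kind of character restriction such an opt-in profile might impose: reject bidirectional control characters, lone surrogates, and C0/C1 control characters. The rule set and the name `unsafe_chars` are hypothetical and are NOT the actual UTS #39 email security profile, which is considerably more detailed.

```python
from __future__ import annotations

import unicodedata

# Hypothetical rule set; not taken from UTS #39.
BIDI_CONTROLS = {
    "\u061c",                                          # ARABIC LETTER MARK
    "\u200e", "\u200f",                                # LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def unsafe_chars(address: str) -> list[str]:
    """Describe every character this hypothetical policy would reject."""
    problems = []
    for ch in address:
        category = unicodedata.category(ch)
        if ch in BIDI_CONTROLS:
            problems.append(f"bidi control U+{ord(ch):04X}")
        elif category == "Cs":  # a Python str can hold an unpaired surrogate
            problems.append(f"lone surrogate U+{ord(ch):04X}")
        elif category == "Cc":
            problems.append(f"control character U+{ord(ch):04X}")
    return problems


print(unsafe_chars("admin\u202egpj.exe@example.com"))  # ['bidi control U+202E']
print(unsafe_chars("alice@example.com"))               # []
```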


My practice on this is to store the user-provided case, but do case insensitive lookups.

This means that you send emails to the case-sensitive address originally entered, but the user is free to login case insensitively.

The downside is that you cannot have two distinct users with emails that only differ in their case. But I feel rather OK about that.
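The same "store as entered, look up case-insensitively" pattern can be sketched with Python's built-in `sqlite3` module. The table layout is hypothetical. One important caveat: SQLite's built-in `NOCASE` collation only folds the ASCII letters A to Z, so non-ASCII addresses are still compared case-sensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE makes both the UNIQUE constraint and "=" comparisons
# case-insensitive (ASCII only), while the stored text keeps its casing.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE COLLATE NOCASE)"
)
conn.execute("INSERT INTO users (email) VALUES (?)", ("JohnDoe@example.com",))

# Login lookup ignores case but returns the spelling the user registered with.
row = conn.execute(
    "SELECT email FROM users WHERE email = ?", ("johndoe@EXAMPLE.com",)
).fetchone()
print(row[0])  # JohnDoe@example.com

# The trade-off: a second account differing only in case is rejected.
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("johndoe@example.com",))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```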


For applications storing user data on Postgres, the citext (case-insensitive text) type does just that.

https://www.postgresql.org/docs/current/citext.html


Seems like the optimal solution to me


Yes, this is also what we do.

It also allows adding exceptions, in case a customer shows up where you do need to support two users that differ only in casing.
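One way such an exception could behave at login, sketched under the assumption that the uniqueness rule has been waived for a few specific accounts: an exact match always wins, and the case-insensitive fallback is used only when it is unambiguous. `find_account` is a hypothetical helper, not taken from any particular system.

```python
from __future__ import annotations


def find_account(stored: list[str], typed: str) -> str | None:
    """Exact match first; otherwise a case-insensitive match if it is unique."""
    if typed in stored:
        return typed
    matches = [s for s in stored if s.casefold() == typed.casefold()]
    # With two or more candidates we refuse to guess which user is meant.
    return matches[0] if len(matches) == 1 else None


accounts = ["alex@example.com", "Alex@example.com", "Sam@example.com"]
print(find_account(accounts, "Alex@example.com"))  # Alex@example.com
print(find_account(accounts, "ALEX@example.com"))  # None (ambiguous)
print(find_account(accounts, "sam@example.com"))   # Sam@example.com
```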


I had a case where you could register with your email, and the email then became the username used to log in. The email was case-insensitive, but the username derived from it was case-sensitive: if you created the account with an email containing uppercase letters, that exact casing became your username, and logging in with a differently cased email didn't work.
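A hypothetical reconstruction of that bug and one way to avoid it: derive the username from a single canonical form and apply the same canonicalization on every lookup path. Both function names are illustrative, and `casefold()` as the canonical form is an assumption.

```python
def buggy_username(email: str) -> str:
    return email  # keeps whatever casing was typed at sign-up


def fixed_username(email: str) -> str:
    return email.strip().casefold()  # one canonical form for every code path


signup, later_login = "Maria@Example.com", "maria@example.com"
print(buggy_username(signup) == buggy_username(later_login))  # False
print(fixed_username(signup) == fixed_username(later_login))  # True
```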


Let’s abolish capitalization altogether. Such a waste. And time zones next. UTC FTW! Then we can do away with languages, and cultural differences. Gazillions of LOC down the drain. I can hear the repositories shrinking already


While that's true in theory, in practice I've seen multiple systems that, due to early implementation bugs, ended up with differently cased copies of the same email that obviously belonged to the same person: think JohnDoe@example.com on one user entry and johndoe@example.com on another. These entries also came from different systems, which made it all even more troublesome.

I would argue the risk matrix of having case-insensitive emails looks much better than the risk matrix of having case-sensitive emails (meaning, you should lowercase all the emails and thus The Copenhagen Book is right, once again).
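Before switching an existing user table to lowercased or otherwise case-insensitive emails, it helps to find the entries that would collide. A minimal audit sketch, assuming `casefold()` is the grouping key. It only reports groups and does not decide how to merge them.

```python
from __future__ import annotations

from collections import defaultdict


def case_only_duplicates(emails: list[str]) -> dict[str, list[str]]:
    """Group addresses that differ only in case; keep groups with >1 spelling."""
    groups: dict[str, list[str]] = defaultdict(list)
    for email in emails:
        groups[email.casefold()].append(email)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


users = ["JohnDoe@example.com", "jane@example.com", "johndoe@example.com"]
print(case_only_duplicates(users))
# {'johndoe@example.com': ['JohnDoe@example.com', 'johndoe@example.com']}
```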


It's worth noting that most reputable transactional email services only accept ASCII characters in email addresses, so it's at least worth notifying the user that non-ASCII emails are not allowed.

In most cases an accented character is a typo. If you have a non-ASCII email, I guess you are used to pain on the internet.
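A small sketch of that notification, assuming your sending provider is ASCII-only as described above. `ascii_warning` and the message wording are hypothetical; `str.isascii()` requires Python 3.7 or later.

```python
from __future__ import annotations


def ascii_warning(address: str) -> str | None:
    """Return a user-facing warning if the address contains non-ASCII characters."""
    if address.isascii():
        return None
    offending = sorted({ch for ch in address if not ch.isascii()})
    return (
        "Only ASCII email addresses are supported; please check "
        f"these characters: {' '.join(offending)}"
    )


print(ascii_warning("rene@example.com"))  # None
print(ascii_warning("rené@example.com"))
```

Showing the offending characters makes an accidental accent easy to spot, which covers the common typo case.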


That still doesn't make it a good idea to normalize to lowercase. Some people are very particular about capitalization.

MacAdam is a surname, like the Scottish engineer John Loudon McAdam who invented the road construction known as "macadam". "Sandy.MacAdam@example.com" comes across rather different than "sandy.macadam@example.com".

A hypothetical DrAbby@example.com probably would prefer keeping that capitalization over "drabby@example.com".

I'm sure there are real-world examples.

On a related note, I knew someone with an Irish O'Surname who was very particular that the computer systems support his name. (As https://stackoverflow.com/questions/8527180/can-there-be-an-... puts it, "People do have email addresses with apostrophes. I see them not infrequently, and have had to fix bugs submitted by angry Hibernians.") No doubt some of them also want to see the correct capitalization be used.

A possibly better alternative is to recommend that the normalization be used only for internal use, while using the user-specified address for actual email messages, and to at least note some of the well-known issues with normalizing to lower-case.
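A sketch of the "normalize internally, send as entered" recommendation using Python's standard `email` package: the lookup key can be lowercased, but the `To:` header is built from the address exactly as the user typed it. `build_message` and the message text are hypothetical.

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_message(stored_address: str, display_name: str) -> EmailMessage:
    # stored_address is the address exactly as the user entered it;
    # any lowercased form is used only for internal lookups.
    msg = EmailMessage()
    msg["To"] = formataddr((display_name, stored_address))
    msg["Subject"] = "Verify your email address"
    msg.set_content("Click the link to verify your address.")
    return msg


msg = build_message("Sandy.MacAdam@example.com", "Sandy MacAdam")
print(msg["To"])  # Sandy MacAdam <Sandy.MacAdam@example.com>
```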


> A hypothetical DrAbby@example.com probably would prefer keeping that capitalization over "drabby@example.com".

And they can keep that capitalization when they type in their login or otherwise share their email address with the world. Are you suggesting that this Dr. Abby user would be offended that the website’s authentication infrastructure ends up working with it as lowercase?


I am suggesting that showing the normalized name, perhaps using it as the "To:" in an email, or presented in the UI, may annoy some users.



