Side remark: It is not always obvious what the other case of a given character is, and the answer may change over time. For example, the German capital ß (ẞ) was added to Unicode in 2008. So it's best to avoid case sensitivity where you can, in general programming.
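To make the ß point concrete, here is a small Python sketch of how the default Unicode case mappings behave for that letter. The variable names are only illustrative; the mappings are the ones Python's built-in `str` methods apply.

```python
# Case mapping in Unicode is not a simple one-to-one toggle.
lower_sharp_s = "\u00df"  # ß, LATIN SMALL LETTER SHARP S
upper_sharp_s = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S (added to Unicode in 2008)

# The default uppercase mapping of ß is still the two-letter "SS".
upper_of_lower = lower_sharp_s.upper()

# The capital ẞ lowercases to ß, so upper() and lower() are not inverses here.
lower_of_upper = upper_sharp_s.lower()

# A round trip through upper() and lower() does not return the original string.
round_trip = lower_sharp_s.upper().lower()

# casefold() is the mapping intended for caseless comparison; it turns ß into "ss".
folded = lower_sharp_s.casefold()
```

So "the other case" of a string can have a different length, and converting back and forth can lose information.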
The top stackoverflow answer you link to disagrees with you: "In practice though, no widely used mail systems distinguish different addresses based on case."
The standard says one thing, yet implementers do another. In this case following the letter of the standard gets you in trouble in the real world.
Yeah, case-sensitive email addressing seems like a horrid idea for a standard, for exactly the reason pointed out: using only lowercase could result in the wrong person receiving emails. Expecting users who type in email addresses to respect case-sensitivity is wishful thinking at best.
> Expecting users who type in email addresses to respect case-sensitivity is wishful thinking at best.
I agree. First, you have tons of websites using the wrong input field (“text” instead of “email”), which often results in capitalized inputs without user intent. Then you have the non-techies who would absolutely not remember this little gotcha, and who vary their casing from one entry to the next depending on who knows what. Some people still think capitalization looks more formal and correct, for instance.
So what’s the benefit of adhering to the standard strictly? Nothing that solves real-world issues afaik. There is only a downside: very simple impersonation attacks.
That said, there is a middle ground. Someone put it like this: store and send user input the way they entered it, but use the canonical address for testing equality, e.g. in the database.
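A minimal sketch of that middle ground, in Python. Everything here is hypothetical: `canonical_key`, `UserStore`, and the choice of "strip and lowercase" as the canonical form are assumptions for illustration, not something any standard prescribes.

```python
def canonical_key(address: str) -> str:
    """Hypothetical comparison key: trimmed and lowercased. Used only for lookups."""
    return address.strip().lower()


class UserStore:
    """Toy in-memory store: keyed by canonical form, remembers the address as typed."""

    def __init__(self) -> None:
        self._as_entered: dict[str, str] = {}

    def register(self, address: str) -> None:
        key = canonical_key(address)
        if key in self._as_entered:
            # Same address modulo case: refuse instead of creating a second account.
            raise ValueError("address already registered")
        self._as_entered[key] = address.strip()

    def delivery_address(self, address: str) -> str:
        """Return the address exactly as the user first entered it."""
        return self._as_entered[canonical_key(address)]


store = UserStore()
store.register("DrAbby@example.com")
to_header = store.delivery_address("drabby@EXAMPLE.com")
```

Mail goes out to the spelling the user chose, while logins and uniqueness checks ignore case.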
> handling case insensitivity in Unicode bug for bug compatible with email providers.
The official email standards basically say to treat email addresses as a binary format. You aren't even allowed to do NFC / NFD / NFKC etc normalization.
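To show why the normalization ban matters, here is a short Python sketch using the standard `unicodedata` module. The address is made up; the point is only that two strings that render identically compare as different when treated as binary data.

```python
import unicodedata

# The same visible address in two Unicode normalization forms (made-up example).
composed = "jos\u00e9@example.com"                   # é as one code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # e followed by a combining acute accent

# Treated as binary data, the two addresses are different.
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

# Only after normalizing both sides do they compare equal.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

A site that compares raw bytes would treat these as two distinct users, even though nobody could tell them apart on screen.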
Unicode has some standards which are slightly better, but they're only for email providers to restrict registering new email addresses, and they still don't suggest case-insensitivity.
I'm tempted to write an email standard called "Sane Email" that allows providers to opt into unicode normalization, case insensitivity (in a well-defined way), and sane character restrictions (like Unicode's UTS #39).
Currently the standards allow for pretty much _any_ unicode characters, including unbalanced right-to-left control characters, and possibly even surrogates.
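As a rough idea of what a character restriction could look like, here is a Python sketch that flags explicit bidi control characters and lone surrogates. The function name and the policy are my own assumptions; this is stricter than what the standards require, far less thorough than UTS #39, and it only detects the characters rather than checking whether they are balanced.

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters
# (embeddings, overrides, isolates, and their terminators).
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def suspicious_code_points(address: str) -> list[str]:
    """Hypothetical check: describe each bidi control or surrogate in the address."""
    found = []
    for ch in address:
        if unicodedata.category(ch) == "Cs":
            found.append(f"U+{ord(ch):04X} lone surrogate")
        elif unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES:
            bidi_class = unicodedata.bidirectional(ch)
            found.append(f"U+{ord(ch):04X} bidi control ({bidi_class})")
    return found


clean = suspicious_code_points("user@example.com")
with_override = suspicious_code_points("user\u202e@example.com")
```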
Websites are supposed to store email addresses as opaque binary strings.
I think the overly permissive standards are what are holding back unicode email addresses.
I had a case where you could register with your email, and the email then became your username, which was used to log in. The email was case-insensitive, but the username created from it was case-sensitive. If you created the account with an email containing capital letters, that became your username, and logging in with a differently cased version of the email didn't work.
Let’s abolish capitalization altogether. Such a waste. And time zones next. UTC FTW! Then we can do away with languages, and cultural differences. Gazillions of LOC down the drain. I can hear the repositories shrinking already
While in theory it's true, in practice I've seen multiple systems that, due to early implementation bugs, held several differently cased entries for the same email address, obviously from the same person: think JohnDoe@example.com in one user entry and johndoe@example.com in another. These entries also came from different systems, which made it all even more troublesome.
I would argue the risk matrix of having case-insensitive emails looks much better than the risk matrix of having case-sensitive emails (meaning, you should lowercase all the emails and thus The Copenhagen Book is right, once again).
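For anyone cleaning up that kind of mess, a quick Python sketch for finding entries that differ only by case. `case_collisions` is a hypothetical helper, and plain `lower()` is an assumption that is only adequate for ASCII addresses.

```python
from collections import defaultdict


def case_collisions(addresses: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: group addresses that are equal once lowercased."""
    groups: dict[str, set[str]] = defaultdict(set)
    for address in addresses:
        groups[address.lower()].add(address)
    # Keep only the keys that have more than one distinct spelling.
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


collisions = case_collisions([
    "JohnDoe@example.com",
    "johndoe@example.com",
    "jane@example.com",
])
```

Each key in the result is a candidate for merging, though someone still has to confirm the entries really belong to the same person.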
It's worth noting that most reputable transactional email services only accept ASCII characters in email addresses, so it's at least worth notifying the user that non-ASCII emails are not allowed.
In most cases an accented character is a typo. If you have a non-ASCII email I guess you are used to pain on the internet.
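A sketch of such a notification in Python, assuming you have decided to reject non-ASCII addresses outright. The function name and message wording are made up; `str.isascii()` needs Python 3.7 or later.

```python
from typing import Optional


def non_ascii_warning(address: str) -> Optional[str]:
    """Hypothetical form check: return a message for the user, or None if the address is ASCII."""
    if address.isascii():
        return None
    # List the offending characters so an accidental accent is easy to spot.
    offending = sorted({ch for ch in address if not ch.isascii()})
    return "Non-ASCII characters are not supported in email addresses: " + " ".join(offending)


ok = non_ascii_warning("sandy.macadam@example.com")
warned = non_ascii_warning("jos\u00e9@example.com")
```

Showing the exact character helps in the common case where it really was a typo.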
That still doesn't make it a good idea to normalize to lowercase. Some people are very particular about capitalization.
MacAdam is a surname, like that of the Scottish engineer John Loudon McAdam, who invented the road construction known as "macadam". "Sandy.MacAdam@example.com" comes across rather differently from "sandy.macadam@example.com".
A hypothetical DrAbby@example.com probably would prefer keeping that capitalization over "drabby@example.com".
I'm sure there are real-world examples.
On a related note, I knew someone with an Irish O'Surname who was very particular that the computer systems support his name. (As https://stackoverflow.com/questions/8527180/can-there-be-an-... puts it, "People do have email addresses with apostrophes. I see them not infrequently, and have had to fix bugs submitted by angry Hibernians.") No doubt some of them also want to see the correct capitalization be used.
A possibly better alternative is to recommend that the normalization be used only for internal use, while using the user-specified address for actual email messages, and to at least note some of the well-known issues with normalizing to lower-case.
> A hypothetical DrAbby@example.com probably would prefer keeping that capitalization over "drabby@example.com".
And they can keep that capitalization when they type in their login or otherwise share their email address with the world. Are you suggesting that this Dr. Abby user would be offended that the website’s authentication infrastructure ends up working with it as lowercase?
> Email addresses are case-insensitive.
From https://thecopenhagenbook.com/email-verification
The email standard says they are case sensitive.
If you lowercase emails during send operations, the wrong person may get the email. That's bad for auth.
Some (many) popular email providers choose to offer only case-insensitive emails. But a website about general auth should recommend the general case.
https://stackoverflow.com/questions/9807909/are-email-addres...
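If you do want to follow the standard on this, one conservative option is to lowercase only the domain, since domain names are case-insensitive, and leave the local part exactly as typed. A minimal Python sketch, with a made-up function name and deliberately naive parsing (it just splits on the last "@"):

```python
def lowercase_domain_only(address: str) -> str:
    """Hypothetical normalizer: lowercase the domain, keep the local part untouched."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not a usable address: {address!r}")
    return f"{local}@{domain.lower()}"


normalized = lowercase_domain_only("Sandy.MacAdam@Example.COM")
```

This never changes which mailbox the standard says the address refers to, at the cost of still treating differently cased local parts as different users.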