Standard 2.1.1 - Possibly overly broad treatment of @ character
The spec currently says "2. Any Identifier that contains the character '@' in any other position other than the first position must be treated as an e-mail address." About this, Yaron Goland wrote:
This seems dangerous to me since @ is a perfectly legal character in a URL including the HTTP URL. I wonder if we should say that we know how to specifically handle e-mail URLs and provide the parsing rules there. This would come straight out of RFC 6068. In other words we can call out mailto URLs for special treatment.
Is he right here?
Comments (4)
reporter - marked as major
He wrote a lot more later in the Discovery draft:
RFC 5322 defines the addr-spec production. Now the BNF format is a bit ancient but I’m pretty sure that just looking for an @ won’t work. For example, the local-part can contain CFWS (white space) which can contain a comment which can contain ccontent which can contain ctext which can contain an @ character.
The local-part can also contain a quoted-string which contains qtext which contains the @ character as well.
I’m sure I can come up with more places that @ can show up but you get the idea.
Oh and btw, yes, @ can also appear in the domain as well. Check out the domain-literal production which contains dtext which contains @.
So there are @s all over the damn place. So you can’t just do a search for the first @ or the last @, neither is conclusive evidence that you have properly split an e-mail address.
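A minimal Python sketch of the point above. The two sample addresses are made up for illustration: one has a quoted-string local-part, the other a domain-literal whose dtext includes @. The helper names split_first and split_last are hypothetical, not from any spec or library.

```python
def split_first(addr):
    """Naive split at the first '@'."""
    local, _, domain = addr.partition("@")
    return local, domain


def split_last(addr):
    """Naive split at the last '@'."""
    local, _, domain = addr.rpartition("@")
    return local, domain


# Illustrative inputs (assumptions, not taken from the draft).
quoted_local = '"a@b"@example.com'   # '@' inside a quoted-string local-part
domain_literal = 'joe@[foo@bar]'     # '@' inside a domain-literal

# First-'@' split breaks the quoted local-part.
print(split_first(quoted_local))    # ('"a', 'b"@example.com')
# Last-'@' split breaks the domain-literal.
print(split_last(domain_literal))   # ('joe@[foo', 'bar]')
```

Each naive rule happens to get one of the two inputs right and the other wrong, so neither is reliable on its own.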
So this brings up a couple of really fun issues:
Normalizing e-mail addresses – So if a user types in JOE@Foo.COM does that match a principal joe@foo.com? You could argue that the domains match since domains are case insensitive (although how that applies to the domain-literal production I have no damn clue) but the local-part is case sensitive. Except NOBODY on earth actually treats the local-part as case sensitive. But remember, RPs have to care about this crap. I think you had the right idea here when we said it’s up to the server to figure it out. So we should say something to the effect that the e-mail address entered by a user IS NOT normalized and that the authorization server will have to handle normalization. Note however that we currently haven’t defined that SWD has to return the normalized value. Do we care? I’m not sure.
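A rough Python sketch of what server-side normalization might look like, assuming the address has already been split into local-part and domain. The function name normalize_for_comparison and the fold_local flag are hypothetical; whether to fold the local-part is a server policy choice, and domain-literals are not treated specially here.

```python
def normalize_for_comparison(local, domain, fold_local=False):
    """Return a (local, domain) pair suitable for equality checks.

    The domain is always lowercased, since domain names are case
    insensitive. The local-part is lowercased only if the server's
    policy (fold_local) says so.
    """
    if fold_local:
        local = local.lower()
    return local, domain.lower()


# Strict reading: local-part stays case sensitive, so these differ.
strict = normalize_for_comparison("JOE", "Foo.COM")
# Lenient policy: fold both halves, so JOE@Foo.COM matches joe@foo.com.
lenient = normalize_for_comparison("JOE", "Foo.COM", fold_local=True)

print(strict)    # ('JOE', 'foo.com')
print(lenient)   # ('joe', 'foo.com')
```

The client would send the address exactly as typed and leave this decision to the authorization server.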
Finding the domain – This one is a real pain in the general case. Now my strong guess is that most clients don’t care. The reason is that they are going to get this value from their UX and their UX won’t allow for any of the weird behaviors allowed by the addr-spec production. So my suggestion is that we cheat and just say that the domain as defined in the addr-spec production of RFC 5322 is the host and call it a day.
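For clients that do want to handle the odd cases, one possible approach is to scan for the first @ that sits outside quoted-strings and comments. That works because an unquoted local-part atom cannot contain @. The Python below is only a sketch under those assumptions: split_addr_spec is a hypothetical helper, it does no validation, and it does not strip comments or whitespace from the result.

```python
def split_addr_spec(addr):
    """Split at the first '@' outside quoted-strings and comments."""
    in_quotes = False
    comment_depth = 0
    i = 0
    while i < len(addr):
        ch = addr[i]
        # A backslash inside quotes or comments escapes the next character.
        if ch == "\\" and (in_quotes or comment_depth):
            i += 2
            continue
        if in_quotes:
            if ch == '"':
                in_quotes = False
        elif ch == '"' and comment_depth == 0:
            in_quotes = True
        elif ch == "(":
            comment_depth += 1          # comments may nest
        elif ch == ")" and comment_depth > 0:
            comment_depth -= 1
        elif ch == "@" and comment_depth == 0:
            return addr[:i], addr[i + 1:]
        i += 1
    raise ValueError("no '@' found outside quotes and comments")


# Illustrative inputs (made up for this sketch).
print(split_addr_spec('"a@b"@example.com'))           # ('"a@b"', 'example.com')
print(split_addr_spec('joe@[foo@bar]'))               # ('joe', '[foo@bar]')
print(split_addr_spec('joe(note@desk)@example.com'))  # ('joe(note@desk)', 'example.com')
```

The second element of the result is what would be used as the host under the suggestion above.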
assigned issue to
- changed status to open
A URL identifier that contains @ MUST have the @ percent-encoded.
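A small Python sketch of how this resolution interacts with the "@ means e-mail" rule quoted from the spec. The function looks_like_email and the example URL are hypothetical. urllib.parse.quote is the standard-library call, used with safe="" so that @ is encoded.

```python
from urllib.parse import quote


def looks_like_email(identifier):
    """Spec rule: '@' in any position other than the first means e-mail."""
    return "@" in identifier[1:]


# Illustrative URL whose path segment contains '@'.
raw = "https://example.com/users/joe@home"
encoded = "https://example.com/users/" + quote("joe@home", safe="")

print(encoded)                    # https://example.com/users/joe%40home
print(looks_like_email(raw))      # True  (URL misread as an e-mail address)
print(looks_like_email(encoded))  # False (no literal '@' left)
```

With the @ percent-encoded, the URL no longer trips the e-mail rule, while a plain joe@example.com still does.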
assigned issue to
reporter - changed status to resolved
Oops - this one is about Discovery - not Standard.