Standard 2.1.1 - Possibly overly broad treatment of @ character

Issue #355 resolved
Michael Jones created an issue

The spec currently says "2.Any Identifier that contains the character '@' in any other position other than the first position must be treated as an e-mail address." About this, Yaron Goland wrote:

This seems dangerous to me since @ is a perfectly legal character in a URL including the HTTP URL. I wonder if we should say that we know how to specifically handle e-mail URLs and provide the parsing rules there. This would come straight out of RFC 6068. In other words we can call out mailto URLs for special treatment.

Is he right here?
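
The concern can be illustrated with a minimal Python sketch. It is not taken from the spec: `naive_is_email` mirrors the quoted rule, and `classify` is a hypothetical alternative along the lines Yaron suggests, where the URI scheme is checked first and only `mailto:` URIs or scheme-less identifiers are treated as e-mail addresses.

```python
from urllib.parse import urlsplit


def naive_is_email(identifier: str) -> bool:
    # The quoted rule: an '@' anywhere except the first position means e-mail.
    return "@" in identifier[1:]


def classify(identifier: str) -> str:
    # Hypothetical alternative (an assumption, not spec text):
    # look at the URI scheme before looking for '@'.
    parts = urlsplit(identifier)
    if parts.scheme == "mailto":
        return "email"
    if parts.scheme and parts.netloc:
        return "url"  # e.g. http/https URLs, even if they contain '@'
    if naive_is_email(identifier):
        return "email"
    return "other"


# A legal HTTP URL with '@' in its path is misclassified by the quoted rule.
print(naive_is_email("https://example.com/@joe"))
print(classify("https://example.com/@joe"))
print(classify("mailto:joe@example.com"))
```

With this ordering, an HTTP URL that carries an `@` in its path or userinfo is no longer mistaken for an e-mail address, while bare `joe@example.com` input is still handled as before.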

Comments (4)

  1. Michael Jones reporter

    He wrote a lot more later in the Discovery draft:

    RFC 5322 defines the addr-spec production. Now the BNF format is a bit ancient but I’m pretty sure that just looking for an @ won’t work. For example, the local-part can contain CFWS (white space) which can contain a comment which can contain ccontent which can contain ctext which can contain an @ character.

    The local-part can also contain a quoted-string which contains qtext which contains the @ character as well.

    I’m sure I can come up with more places that @ can show up but you get the idea.

    Oh and btw, yes, @ can also appear in the domain as well. Check out the domain-literal production which contains dtext which contains @.

    So there are @s all over the damn place. So you can’t just do a search for the first @ or the last @, neither is conclusive evidence that you have properly split an e-mail address.

    So this brings up a couple of really fun issues:

    Normalizing e-mail addresses – So if a user types in JOE@Foo.COM does that match a principal joe@foo.com? You could argue that the domains match since domains are case insensitive (although how that applies to the domain-literal production I have no damn clue) but the local-part is case sensitive. Except NOBODY on earth actually treats the local-part as case sensitive. But remember, RPs have to care about this crap. I think you had the right idea here when we said it’s up to the server to figure it out. So we should say something to the effect that the e-mail address entered by a user IS NOT normalized and that the authorization server will have to handle normalization. Note however that we currently haven’t defined that SWD has to return the normalized value. Do we care? I’m not sure.

    Finding the domain – This one is a real pain in the general case. Now my strong guess is that most clients don’t care. The reason is that they are going to get this value from their UX and their UX won’t allow for any of the weird behaviors allowed by the addr-spec production. So my suggestion is that we cheat and just say that the domain as defined in the addr-spec production of RFC 5322 is the host and call it a day.

  2. Michael Jones reporter

    Fix #354: Discovery 2.1 - Identifier normalization rules are not extensible
    Fix #355: Standard 2.1.1 - Possibly overly broad treatment of @ character
    Fix #356: Discovery 2.1.3 - Extracting hosts from URIs

    2981ff120fd7
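
The point made in the first comment, that neither the first nor the last @ reliably splits an address, can be checked with a short Python sketch. The function `split_addr_spec` is a hypothetical helper written for this illustration, not part of any spec or library. It assumes the input is already a syntactically valid RFC 5322 addr-spec and only tracks quoted strings, comments, and backslash escapes; it does not validate anything else.

```python
def split_addr_spec(addr: str) -> tuple[str, str]:
    """Split at the first '@' that is outside quoted strings and comments."""
    in_quote = False
    comment_depth = 0
    i = 0
    while i < len(addr):
        ch = addr[i]
        if ch == "\\" and (in_quote or comment_depth):
            i += 2  # skip a quoted-pair such as \" or \)
            continue
        if in_quote:
            if ch == '"':
                in_quote = False
        elif comment_depth:
            if ch == "(":
                comment_depth += 1  # comments may nest
            elif ch == ")":
                comment_depth -= 1
        elif ch == '"':
            in_quote = True
        elif ch == "(":
            comment_depth = 1
        elif ch == "@":
            # Everything after this point is the domain, including any
            # '@' inside a domain-literal such as [a@b].
            return addr[:i], addr[i + 1:]
        i += 1
    raise ValueError("no top-level '@' found")


examples = [
    '"user@home"@example.com',     # '@' inside a quoted-string
    "joe(work@home)@example.com",  # '@' inside a comment
    "joe@[a@b]",                   # '@' inside a domain-literal
]
for addr in examples:
    first = tuple(addr.split("@", 1))    # naive: split on the first '@'
    last = tuple(addr.rsplit("@", 1))    # naive: split on the last '@'
    print(addr, first, last, split_addr_spec(addr))
```

The first two examples defeat a split on the first @, and the third defeats a split on the last @. Under the suggestion in the thread, the second element returned by `split_addr_spec` is what would be taken as the host.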
