openid / connect / issues / #355 - Standard 2.1.1 - Possibly overly broad treatment of @ character

Michael Jones reporter

marked as major

He wrote a lot more later in the Discovery draft:

RFC 5322 defines the addr-spec production. Now the BNF format is a bit ancient, but I’m pretty sure that just looking for an @ won’t work. For example, the local-part can contain CFWS (comments and folding white space), which can contain a comment, which can contain ccontent, which can contain ctext, which can contain an @ character.

The local-part can also contain a quoted-string which contains qtext which contains the @ character as well.

I’m sure I can come up with more places that @ can show up but you get the idea.

Oh and btw, yes, @ can also appear in the domain as well. Check out the domain-literal production which contains dtext which contains @.

So there are @s all over the damn place. You can’t just search for the first @ or the last @; neither is conclusive evidence that you have properly split an e-mail address.
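
To make the point concrete, here is a minimal Python sketch (not from the draft; the function names and sample addresses are made up for illustration). It shows that splitting on the first @ and splitting on the last @ each get one of these cases wrong:

```python
from typing import Tuple


def split_first_at(addr: str) -> Tuple[str, str]:
    """Naive split on the FIRST @ in the string."""
    local, _, domain = addr.partition("@")
    return local, domain


def split_last_at(addr: str) -> Tuple[str, str]:
    """Naive split on the LAST @ in the string."""
    local, _, domain = addr.rpartition("@")
    return local, domain


# Illustrative inputs that are syntactically allowed by RFC 5322 addr-spec.
quoted_local = '"joe@home"@example.com'  # @ inside a quoted-string (qtext)
literal_domain = "joe@[x@y]"             # @ inside a domain-literal (dtext)

# First-@ breaks the quoted local-part; last-@ happens to get it right.
print(split_first_at(quoted_local))   # ('"joe', 'home"@example.com')
print(split_last_at(quoted_local))    # ('"joe@home"', 'example.com')

# Last-@ breaks the domain-literal; first-@ happens to get it right.
print(split_first_at(literal_domain))  # ('joe', '[x@y]')
print(split_last_at(literal_domain))   # ('joe@[x', 'y]')
```

Neither heuristic is right for both inputs, which is the whole problem.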

So this brings up a couple of really fun issues:

Normalizing e-mail addresses – If a user types in JOE@Foo.COM, does that match a principal joe@foo.com? You could argue that the domains match, since domains are case-insensitive (although how that applies to the domain-literal production I have no damn clue), but the local-part is case-sensitive. Except NOBODY on earth actually treats the local-part as case-sensitive. But remember, RPs have to care about this crap. I think you had the right idea here when we said it’s up to the server to figure it out. So we should say something to the effect that the e-mail address entered by a user IS NOT normalized and that the authorization server will have to handle normalization. Note, however, that we currently haven’t defined that SWD has to return the normalized value. Do we care? I’m not sure.
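
As a rough illustration of what server-side normalization could look like, the Python sketch below lowercases the domain and folds the local-part only when a policy flag says so. The `normalize` helper and its `fold_local` flag are hypothetical, not something the draft defines. It assumes the address has already been split correctly, and it ignores domain-literals and internationalized domains.

```python
from typing import Tuple


def normalize(local: str, domain: str, fold_local: bool = False) -> Tuple[str, str]:
    """Hypothetical server-side normalization of an already-split address.

    The domain is always lowercased (assumes a plain ASCII host name).
    The local-part is only lowercased if the server's own policy says
    it treats local-parts as case-insensitive.
    """
    domain = domain.lower()
    if fold_local:
        local = local.lower()
    return local, domain


# Strict reading: the local-part stays case-sensitive, so these differ.
print(normalize("JOE", "Foo.COM") == ("joe", "foo.com"))                   # False

# What nearly everybody does in practice: fold the local-part too.
print(normalize("JOE", "Foo.COM", fold_local=True) == ("joe", "foo.com"))  # True
```

Whether to fold the local-part is a decision only the server that owns the mailbox namespace can make, which is why the client should pass the value through untouched.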

Finding the domain – This one is a real pain in the general case. Now my strong guess is that most clients don’t care. The reason is that they are going to get this value from their UX and their UX won’t allow for any of the weird behaviors allowed by the addr-spec production. So my suggestion is that we cheat and just say that the domain as defined in the addr-spec production of RFC 5322 is the host and call it a day.
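
For clients that do want to handle the general case, one possible approach is a small scanner that skips quoted-strings and comments and splits at the first @ found outside them. The Python sketch below is only that, a sketch: `split_addr_spec` is a made-up name, it does not validate the address against the full RFC 5322 grammar, and it assumes the input is a bare addr-spec.

```python
from typing import Tuple


def split_addr_spec(addr: str) -> Tuple[str, str]:
    """Split at the first @ that is outside quoted-strings and comments.

    A domain-literal ("[...]") can only appear after the separating @,
    so it never needs to be tracked to find the split point.
    """
    in_quotes = False
    comment_depth = 0  # comments may nest: (a (b) c)
    i = 0
    while i < len(addr):
        ch = addr[i]
        if ch == "\\" and (in_quotes or comment_depth):
            i += 2  # quoted-pair: skip the backslash and the escaped char
            continue
        if in_quotes:
            if ch == '"':
                in_quotes = False
        elif comment_depth:
            if ch == "(":
                comment_depth += 1
            elif ch == ")":
                comment_depth -= 1
        elif ch == '"':
            in_quotes = True
        elif ch == "(":
            comment_depth = 1
        elif ch == "@":
            return addr[:i], addr[i + 1:]
        i += 1
    raise ValueError("no top-level @ found")


print(split_addr_spec('"joe@home"@example.com'))        # ('"joe@home"', 'example.com')
print(split_addr_spec("joe(work@office)@example.com"))  # ('joe(work@office)', 'example.com')
print(split_addr_spec("joe@[x@y]"))                     # ('joe', '[x@y]')
```

The second element of the result is the domain as it appears in the addr-spec, which is what the suggestion above would treat as the host.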

2011-11-21T21:08:08+00:00

Comments (4)

Michael Jones reporter
Oops - this one is about Discovery - not Standard.
- 2011-11-21T21:05:26+00:00
Nat Sakimura
- assigned issue to Michael Jones
- changed status to open
A URL containing @ MUST be percent-encoded.
- 2011-11-22T00:39:35+00:00
Michael Jones reporter
- changed status to resolved
Fix ~~#354~~: Discovery 2.1 - Identifier normalization rules are not extensible
Fix ~~#355~~: Standard 2.1.1 - Possibly overly broad treatment of @ character
Fix ~~#356~~: Discovery 2.1.3 - Extracting hosts from URIs

→ 2981ff120fd7
- 2011-12-08T01:51:04+00:00