-
assigned issue to
- changed status to open
Discovery 2.2.3 - "example.com:8080" has a scheme
According to RFC 3986, "example.com:8080" is parsed as follows.
* scheme = example.com
* path = 8080
I think "https" should not be prepended in this case, and the identifier should be rejected because its scheme is unknown. (That way, someone can later extend this spec to URIs like "acmetelecom.net:123...".)
Comments (8)
-
Account Deleted -
- changed component to Discovery
- edited description
According to RFC 3986, URI is defined as follows.
URI       = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
          / path-absolute
          / path-rootless
          / path-empty
Therefore, if example.com:8080 were parsed as a URI, then example.com will not be a scheme but either an authority section or a path.
Since, by 2.1.1, the identifier in this case is treated as a URL, the first segment is actually treated as the authority section.
Having said that, the normalization rule here probably needs to be tightened.
Note the following defs from the RFC.
authority  = [ userinfo "@" ] host [ ":" port ]
userinfo   = *( unreserved / pct-encoded / sub-delims / ":" )
host       = IP-literal / IPv4address / reg-name
IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
Clearly, userinfo may include ":". IP-literal may also include ":". Thus, naively treating the segment after the first ":" as a port would result in errors. We need to further constrain the host in the authority section to be reg-name.
BTW, looking at the above, perhaps we do not need to treat email-style identifiers specially. userinfo@host is still a valid authority section in a URI. On the other hand, the RFC 5322 addr-spec (http://tools.ietf.org/html/rfc5322#section-3.4.1) is not suitable for our "email-looking" identifier, as it may include CRLF etc.
So, I think we should go only with the authority section defined in RFC 3986 and allow only reg-name in host.
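To illustrate why splitting at the first ":" is not enough, here is a minimal Python sketch of an authority splitter that handles userinfo and IP-literal before looking for a port, and then applies the proposed reg-name restriction. The function names split_authority and require_reg_name_host are hypothetical and exist only for this illustration; they are not taken from the spec or from any library.

```python
import re

# reg-name = *( unreserved / pct-encoded / sub-delims )   (RFC 3986, section 3.2.2)
REG_NAME_RE = re.compile(r"^(?:[A-Za-z0-9._~!$&'()*+,;=-]|%[0-9A-Fa-f]{2})*$")


def split_authority(authority):
    """Split an authority into (userinfo, host, port); absent parts are None."""
    # userinfo may itself contain ":", so split it off at "@" first.
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        userinfo = None
    if hostport.startswith("["):
        # IP-literal: colons inside "[...]" are not port delimiters.
        end = hostport.find("]")
        if end == -1:
            raise ValueError("unterminated IP-literal")
        host, rest = hostport[: end + 1], hostport[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected text after IP-literal")
        port = rest[1:] if rest else None
    else:
        # reg-name and IPv4address cannot contain ":", so the first one
        # (if any) introduces the port.
        host, colon, port = hostport.partition(":")
        if not colon:
            port = None
    # port = *DIGIT (an empty port is syntactically allowed)
    if port is not None and any(c not in "0123456789" for c in port):
        raise ValueError("port must be digits only")
    return userinfo, host, port


def require_reg_name_host(authority):
    """Assumed policy from the comment above: accept only a reg-name host."""
    userinfo, host, port = split_authority(authority)
    # An IP-literal fails here because "[" and "]" are not reg-name characters.
    # A dotted-quad IPv4 address still matches the reg-name syntax.
    if not REG_NAME_RE.match(host):
        raise ValueError("host is not a reg-name: %r" % host)
    return userinfo, host, port


print(split_authority("example.com:8080"))
print(split_authority("joe:secret@example.com"))
print(split_authority("[2001:db8::1]:8080"))
```

By comparison, a naive "joe:secret@example.com".split(":", 1) would report "secret@example.com" as the port, which is the kind of error described above.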
-
Account Deleted
"Therefore, if example.com:8080 were parsed as a URI, then example.com will not be a scheme but either an authority section or a path."
If so, what is the scheme of "example.com:8080"? According to RFC 3986, a URI consists of a scheme and a hier-part. The hier-part can be path-rootless, and path-rootless consists of one or more segments separated by "/". Thus, parsing "example.com:8080" results in a URI with scheme "example.com" and path-rootless "8080", and URI parsers in several programming languages, such as Java, C# and Ruby, actually produce this result.
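The parse described in this comment can be reproduced with the regular expression given in RFC 3986, Appendix B. The following Python sketch is illustrative only; the helper name split_uri is hypothetical, and the extra scheme check is an addition based on the scheme ABNF, not part of the Appendix B expression itself.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*$")


def split_uri(text):
    """Return (scheme, authority, path); components that are absent are None."""
    m = URI_RE.match(text)
    scheme, authority, path = m.group(2), m.group(4), m.group(5)
    if scheme is not None and not SCHEME_RE.match(scheme):
        raise ValueError("not a valid scheme: %r" % scheme)
    return scheme, authority, path


print(split_uri("example.com:8080"))          # ('example.com', None, '8080')
print(split_uri("https://example.com:8080"))  # ('https', 'example.com:8080', '')
```

Without a leading "//", there is no authority component at all, so "example.com" can only be read as the scheme.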
-
Oh, are you just talking about the example in 2.2.3, and not the normalization rule stated in 2.1.3?
Then you are right. The first segment of a relative reference cannot contain a ":". So example.com:8080/joe cannot be a relative reference, and the example is wrong. Is that what you are getting at, rather than the normalization rule that prepends https: to the authority section?
At the same time, I do understand that people want to support a user input string such as example.com:8080/joe. This is not a URI, and it is not even a relative reference. It is something else if we are to normalize it to https://example.com:8080/joe. So, clause 2.1 needs to be fixed. It is closely related to issue #621.
-
- changed status to resolved
-
- changed status to open
Just saying that the user input is a relative reference did not solve it. For a relative reference to contain an authority section, it needs to be prefixed by "//".
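As a small illustration of the "//" point, the Python sketch below uses the standard library's urllib.parse to show that a reference starting with "//" is parsed with an authority, and that resolving it against an https base yields the expected URL. The base URL https://base.invalid/ is a made-up placeholder that only supplies the scheme; nothing here is prescribed by the spec.

```python
from urllib.parse import urljoin, urlsplit

# A network-path reference: "//" authority path-abempty
ref = urlsplit("//example.com:8080/joe")
print(repr(ref.scheme), ref.netloc, ref.path)  # '' example.com:8080 /joe
print(ref.hostname, ref.port)                  # example.com 8080

# Hypothetical base; only its scheme is inherited by the reference.
base = "https://base.invalid/"
resolved = urljoin(base, "//example.com:8080/joe")
print(resolved)                                # https://example.com:8080/joe
```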
-
- changed status to resolved
Fixed #625 by restating relative ref with authority path-abempty. → <<cset 250b24348c83>>
We should start from the principal, extract the authority section (which has a port), and extract the host from that.