Remove PAGE from internal link

Issue #192 resolved
dreadnaut created an issue

Originally reported on Google Code with ID 192

Currently our internal links are written using the constant PAGE as base for the href
attribute, and look more or less like this:

<a href='phpliteadmin.php?querystring=etcetc'>something</a>

As far as I know¹ the 'phpliteadmin.php' part is redundant, and ?querystring=etcetc
would be a valid relative url.

<a href='?querystring=etcetc'>something</a>

I also fed the modified code to the xhtml validator and it was accepted.

Although this might seem a small ~16B reduction, we render quite a lot of links in
each page and they sum up to a few KB. E.g., the table list contains 11 links per table,
which means more than 2KB for a 12-tables database.

Moreover, we would remove about 130 occurrences of ' ".PAGE." ' from the code, shaving
another kilobyte :)

[1] http://url.spec.whatwg.org/#relative-state

Reported by dreadnaut on 2013-03-15 17:35:39

Comments (10)

  1. Christopher Kramer
    Yes, I think that's correct. I am only not 100% sure there is not some stupid old browser
    that gets into trouble.
    
    From what I read in standards, I see one thing problematic: Is this what RTF 2396 /
    3986 call a "Same-document Reference"?
    https://tools.ietf.org/html/rfc2396#page-15
    http://www.rfc-editor.org/rfc/rfc3986.txt (chapter 4.4)
    
    "Traversal of such a reference should not result in an additional retrieval action."
    (Forms explicitly excluded.)
    Hmm. I know browsers do a new request (with new GET parameters), but where does the
    standard say they have to?
    
    I mean, it is fine as long as there is no known browser that misbehaves, which I believe
    is true in this case.
    Is there something to support or disprove this?
    
    I just worried whether it is currently like this for a good reason (which does not
    have to be the case ;-) ).
    

    Reported by crazy4chrissi on 2013-03-17 23:51:41

  2. dreadnaut reporter
    Quoting the RFC for easier reference:
    
    4.2. Same-document References
    
       A URI reference that does not contain a URI is a reference to the
       current document.  In other words, an empty URI reference within a
       document is interpreted as a reference to the start of that document,
       and a reference containing only a fragment identifier is a reference
       to the identified fragment of that document.  Traversal of such a
       reference should not result in an additional retrieval action."
    
    This seems to apply to *empty* URIs and an URI is made of many parts, ending in a path
    and an optional query string. The query string is part of the URI, so an empty URI
    has no query string. On the other hand, it can have a fragment id, since 4.1 states
    that fragment identifiers are not part of the URI:
    
    4.1. Fragment Identifier
    
       [...] As such, it is not part of a URI, but is often used in conjunction with a
    URI.
    
    Following the above, my interpretation is:
    
      href=""           ->  not a URI, no retrieval
      href="#fragment"  ->  not a URI, no retrieval
      href="?query=x"   ->  URI, send a request to the server
      <form action=""   ->  form, always request to serve
    
    This means we can remove PAGE from "action" attributes too! :)
    
    At least according to the RFC... how about in the real world? Testing with the few
    browsers I have available, I encountered no trouble with href="" and forms without
    an action attribute, even with older versions of IE.
    
    I found an (invalid) bug report related to this 2002 that confirms the interpretation
    above:
    https://bugzilla.mozilla.org/show_bug.cgi?id=177725
    
    and answers on Stack seem fairly relaxed:
    http://stackoverflow.com/questions/15291307/relative-url-containing-just-the-querystring
    
    I would think this is something that has been around for a long long time (old school
    CGI era) so browsers have been doing the right thing for more than ten, twelve years.
    I was thinking about adding a safety <base> tag, but that seems to create more problems
    that it would (maybe) solve.
    
    
    Slightly unrelated: didn't know about http://www.php.net/http_build_query we could
    investigate if it can simplify our 100+ calls to urlencode().
    

    Reported by dreadnaut on 2013-03-18 14:58:19

  3. dreadnaut reporter
    By the way, I am not too interested in this fix to reduce file size. However, it's one
    step toward simplifying the output code and improving readability.
    

    Reported by dreadnaut on 2013-03-18 15:33:31

  4. Christopher Kramer
    Thanks for your research.
    
    I also did some further research. Regarding an empty form action, I found this one
    interesting:
    https://www.w3.org/Bugs/Public/show_bug.cgi?id=14215#c1
    "browsers do weird things with an empty action="" attribute."
    Although no browser is given as an example, I guess w3c editors know what they are
    talking about.
    
    And whatwg:
    http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#attr-fs-action
    "The action and formaction content attributes, if specified, must have a value that
    is a valid non-empty URL potentially surrounded by spaces"
    (Although we are not using HTML5 yet, I guess whatwg has good reasons to require urls
    "non-empty")
    
    So I think we should _not_ use an empty action attribute, but rather have no action
    attribute at all (or specify it completely). In most cases, we specify GET parameters
    anyway.
    
    http_build_query looks useful, but only if we want to encode more than 1 parameter,
    and most of the time table is the only parameter we encode.
    

    Reported by crazy4chrissi on 2013-03-18 16:02:16

  5. dreadnaut reporter
    | So I think we should _not_ use an empty action attribute, but rather have no action
    | attribute at all (or specify it completely).
    
    Agreed!
    
    I'll leave this aside until we close Issue 190 then, and after that I'll prepare a
    patch where I remove PAGE from <a> tags only, leaving only the query string.
    

    Reported by dreadnaut on 2013-03-18 16:29:28 - Blocked on: #190

  6. dreadnaut reporter
    Here's a patch removing most occurrences of PAGE from urls, following the discussion
    above.
    

    Reported by dreadnaut on 2013-04-01 21:29:51 - Status changed: Started

    <hr> * Attachment: lesspage.diff

  7. Christopher Kramer
    Hmm. Leaving out the form action parameter seems to be invalid xhtml 1.1 trans (see
    screenshot).
    
    So as long as we stick to xhtml, I guess we should better not omit it.
    

    Reported by crazy4chrissi on 2013-04-02 21:37:18

    <hr> * Attachment: validator_formaction.png<br>validator_formaction.png

  8. dreadnaut reporter
    I see, I'll add them back and then commit then.
    
    When are we switching to HTML? :)
    

    Reported by dreadnaut on 2013-04-02 22:44:23

  9. dreadnaut reporter
    This issue was closed by revision r399.
    

    Reported by dreadnaut on 2013-04-04 01:12:15 - Status changed: Fixed

  10. Log in to comment