Issue #87 resolved

Dynamic handling of 'virtual file names', or URL paths with file extensions

Anonymous created an issue

= Motivation =

CherryPy maps URL components to Python objects using the path component itself as if it were the name of the object. However, the rules for valid Python identifiers are much stricter than those for path components in a URL. Most of the time this is not a big problem; however, there are two situations where it can become an issue:

  1. When the path component is a file name with a file extension;
  2. When the path component contains extended characters, including Unicode escapes.

This ticket covers '''only the first case''', because it is simple, limited in scope, and very useful for handling some relatively frequent cases (more on this later). ''The second case should be discussed later and eventually implemented as part of a more generic solution, but that is outside the scope of this ticket''.

== Using default to handle arbitrary path components ==

The {{{default}}} method can be used to handle arbitrary path components. It is usually called with any non-resolved path arguments as positional arguments, and it can receive arbitrary strings for each component, not only valid Python identifiers. For something as simple as file extensions, this solution involves unnecessary work; it's an idiom that has to be learned and repeated for every new CP2 application. For example:

{{{
def default(self, *args):
    # args holds the unresolved path components as plain strings
    if args[0] == 'favicon.ico':
        ...
    elif args[0] == 'robots.txt':
        ...
default.exposed = True
}}}

== A proposal to handle file extensions in path components ==

The proposed solution is to map the '.' (period) to the '_' (underscore) character. This mapping will be done natively, as part of the core mapping function.
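
The mapping can be illustrated with a minimal sketch. The names {{{mangle}}}, {{{lookup}}} and {{{Root}}} are hypothetical and exist only for this illustration; they are not part of CherryPy, and the real core mapping function is assumed to be more involved than this.

```python
def mangle(component):
    """Map a URL path component to a candidate Python attribute name."""
    return component.replace('.', '_')


class Root:
    # Hypothetical handler: reached through the path component 'robots.txt'
    def robots_txt(self):
        return "User-agent: *\nDisallow:\n"
    robots_txt.exposed = True


def lookup(root, component):
    """Return the exposed handler for a single path component, or None."""
    handler = getattr(root, mangle(component), None)
    if handler is not None and getattr(handler, 'exposed', False):
        return handler
    return None
```

One consequence of such a mapping is that the path components {{{robots.txt}}} and {{{robots_txt}}} both resolve to the same method.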

== Example ==

There are two simple files that are usually required for any functional web site: {{{robots.txt}}} and {{{favicon.ico}}}. Both files can benefit from automatic handling. In the former case, CherryPy itself can build the {{{robots.txt}}} file to indicate which parts of the site can be crawled by an external search spider. In the latter, the standard CherryPy icon can be provided by a library call, without the need to distribute an external icon file.

Using the mapping proposed above, these file names would be mapped to the following Python identifiers: {{{robots_txt}}} and {{{favicon_ico}}}. These names can be used as exposed methods:

{{{
def favicon_ico(self):
    return cherrypy.lib.cptools.standardCherryPyIcon()
favicon_ico.exposed = True

def robots_txt(self):
    return cherrypy.lib.cptools.makeRobotsFile()
robots_txt.exposed = True
}}}


In the example above, both {{{standardCherryPyIcon}}} and {{{makeRobotsFile}}} are arbitrary examples of functions that could be implemented in the library. However, their implementation is also beyond the scope of this ticket.

Reported by cribeiro

Comments (9)

  1. Anonymous

    This ticket introduced a problem. It broke some code that relied on getting the unmangled filename on default(); instead, the method was receiving the filename already in mangled format. After discussion on the IRC channel, it was decided that the change was to be reverted.

    Instead of reverting it, the ticket is being reopened with a new proposal. It's an attempt at a compromise.

    As of revision #133, the lookup only searches for objects using the mangled name; it no longer keeps the URL path mangled. This preserves the originally intended behavior and leaves the URL untouched when it is passed to the handling method.

    If this change is deemed unnecessary or harmful, please feel free to remove it. It was done quickly, mostly so that users affected by the original change could have their problem solved while the basic behavior was preserved.
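
The compromise described in the comment above can be sketched roughly as follows. This is an illustration only, not the code from revision #133: the {{{resolve}}} helper and the {{{Root}}} class are hypothetical, and only a single path level is handled. The mangled name is used solely to find a handler, while {{{default}}} still receives the path components exactly as they appeared in the URL.

```python
def resolve(root, path):
    """Return (handler, args) for a one-level path such as '/favicon.ico'."""
    components = [c for c in path.split('/') if c]
    if components:
        # Mangle only for the attribute lookup; the components stay untouched.
        candidate = getattr(root, components[0].replace('.', '_'), None)
        if getattr(candidate, 'exposed', False):
            return candidate, components[1:]
    # No exposed match: fall back to default() with the unmangled components.
    default = getattr(root, 'default', None)
    if getattr(default, 'exposed', False):
        return default, components
    return None, components


class Root:
    def favicon_ico(self):
        return 'icon'
    favicon_ico.exposed = True

    def default(self, *args):
        return args
    default.exposed = True
```

With this sketch, {{{/favicon.ico}}} reaches {{{favicon_ico}}}, while {{{/report.pdf}}} reaches {{{default}}} with the argument {{{'report.pdf'}}} rather than {{{'report_pdf'}}}.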

  2. Anonymous

    Maybe somehow pass the file extension to the function? It would allow a different "one content, many formats" solution ("file.pdf" instead of "file?format=pdf").
