Issue #2 resolved

Reserved characters / spaces are double-escaped

hjkelly
created an issue

Hey, I'm really loving your template tags. :)

I did just attempt using add_params with a datetime, though, and I hit a snag: it comes out as: /?my_date=2013-10-01%252000%3A00%3A00 What tipped me off was the %2520 where there should have been a space (separating the date from the time). Of course, this can't be parsed naturally on the receiving end.

I dug deeper, and it looks like the problem is that update_query_data stores the values using iri_to_uri (which encodes the space), and then get_full_path uses urlquote (which re-encodes the space's %20 and encodes the time's colons as well).

So the overall process looks like this:

>>> from url_tools.helper import UrlHelper
>>> url = UrlHelper('/')
>>> url.update_query_data(test='1111-11-11 22:22:22')
>>> print url.query_dict
<QueryDict: {u'test': [u'1111-11-11%2022:22:22']}>
>>> print url.get_full_path()
/?test=1111-11-11%252022%3A22%3A22
>>> print url.get_query_string()  # more specifically (get_full_path calls this)
test=1111-11-11%252022%3A22%3A22

I took a crack at addressing this myself (so as not to be an empty complainer). I chose to alter update_query_data since the other setters (i.e. overload_params) don't seem to do any escaping, rather than changing the outgoing get_query_string.

Specifically, my changes were to lines 37 and 39 of helpers.py, removing iri_to_uri and passing in the raw val / v instead:

    def update_query_data(self, **kwargs):
        for key in kwargs:
            val = kwargs[key]
            if hasattr(val, '__iter__'):
                self.query_dict.setlist(key, [v for v in val])
            else:
                self.query_dict[key] = val

This solved the problem totally for me, but I'm only using add_params and del_params via the template tags.

Thoughts? I can make a formal change if you agree!

Comments (17)

  1. Marc Kirkwood

    Watching this issue, hoping it is resolved soon so I can use this lovely module! I was expecting the querystring values themselves to be form encoded, e.g. 'Space+here' instead of 'Space%20here' -- looks like Django's urlquote_plus could be used for this, as it uses urllib.quote_plus. Am I missing something, or is that not preferable as a default case? Edit: Nevermind that, turns out that QueryDict.urlencode() does the job of course. As hjkelly said, it appears that iri_to_uri() is unnecessary in update_query_data().

  2. hjkelly reporter

    Branko, looks like you're taking care of the code tweaks. If that's wrong, lemme know and I can fumble through the contributory process and try to get a patch to you. :)

  3. Marc Kirkwood

    I am working on it now, got a test for it. Just need to learn enough git to see me through, as you're one of the funny people who use Bitbucket without hg ;)

  4. Marc Kirkwood

    See comment regarding lack of string coercion for non-string values: https://bitbucket.org/trojjer/django-url-tools/commits/bc0958749c032a4839a888775e3e09c09432d9e5

    Either smart_text, force_text or six.text_type(value if six.PY3 else bytes(value)) could be used for this in update_query_data -- the latter seems preferable, given comments in Django's bytes_to_text function regarding querystring data conversion. Unfortunately, that function can't be used in this case as it does nothing!

  5. Marc Kirkwood

    Of course, if this project isn't targeting Python 3 at all yet it can be simplified to unicode(bytes(value)). It's a bit annoying that bytes_to_text doesn't do this, given the seemingly undocumented expectation that values stored in QueryDicts must be Unicode strings.

  6. Marc Kirkwood

    Ah, so confusing! Seems that unicode(1) is essentially the same as unicode(bytes(1)) for this purpose after all, at least in Py 2.7.3 at work. It seems that errors='replace' is an important kwarg for querystring data though, and to preserve the QueryDict's internal encoding the encoding kwarg can be specified. Funnily enough, converting the value with bytes then becomes necessary in 2.x!

  7. Branko Vukelic

    Marc Kirkwood hjkelly When setting values for query params in the UrlHelper, I think it's best we leave values as they are (no coercion to unicode). Since the helpers is going to call iri_to_uri on the values when getting the full URL, I see no reason we we should do anything to the values until needed (doing the opposite created this bug to begin with). As far as no other test case breaks, I think it's fine to change the three failing cases to expect ints. I doubt anyone depends on the particular behavior we have right now. Thoughts?

  8. Log in to comment