Pull requests

#5 Declined
Repository
msabramo/six_msabramo_urllib_parse six_msabramo_urllib_parse
Branch
default
Repository
gutworth/six six
Branch
default

Create six.moves.urllib

Author
  1. Marc Abramowitz
Reviewers
Description

Six currently doesn't have abstractions for a bunch of things which are in urllib in Python 3 and spread out across urllib, urllib2, and urlparse in Python 2.

E.g.:

Stuff in Python 3's urlllib.parse:

  • urlparse, urlunparse, parse_qs, parse_qsl, urljoin, urldefrag, urlsplit, urlunsplit (from urlparse in Python 2)
  • quote, quote_plus, unquote, unquote_plus, urlencode (from urllib in Python 2)

Stuff in Python 3's urlllib.error:

  • URLError, HTTPError (from urllib2 in Python 2)
  • ContentTooShortError (from urllib in Python 2)

Stuff in Python 3's urlllib.request:

  • urlopen, urlretrieve, install_opener, Request, getproxies, etc...

Several times I have used six to port a module to Python 3 and I had to add my own code to handle these. So it would be useful to handle these in six so that everyone that uses six can use them and doesn't need to add their own code.

Stuff in Python 3's urllib.response:

  • addbase, addclosehook, addinfo, and addinfourl.

Stuff in Python 3's urllib.robotparser:

For consistency, I also created six.moves.urllib.robotparser which is the same as the existing six.moves.urllib_robotparser (note underscore) and Python 3's urllib.robotparser.

Comments (40)

  1. Marc Abramowitz author

    Unfortunately, my change to documentation/index.rst in bc5edb44aca9 looks big because I had to make the table wider to accomodate the new entries, so I had to change every line of the table.

    The only lines that changed materially are the ones that have urllib_parse in them.

  2. Benjamin Peterson repo owner

    I've always been wary about mapping the whole urllib2/urlparse mess. Since so many things moved around cross-modules, it's hard to create a sensical naming scheme. Rather than adding several functions at a time like this patch, I would prefer to see a whole compatibility layer for the entire urllib renaming, perhaps by faking the urllib modules on Python 2. This way we could get a nice consistent interface.

    One problem particular to this implementation is that it's not lazy.

  3. Aymeric Augustin

    For what it's worth, when porting Django, I've used explicit imports for urllib:

    try:
        from urllib.parse import urlparse, urlunparse
    except ImportError:     # Python 2
        from urlparse import urlparse, urlunparse
    

    and it wasn't too bad.

  4. Marc Abramowitz author

    In 8d699c7, I got rid of the extra file (six_urlparse.py) and got it back to being all in a single file (six.py).

    I'm on the fence as to whether these functions should stay in six.moves.urllib_parse or if I should try to move them to six.moves.urllib.parse (dot instead of an underscore; a little more like the Python 3 structure so perhaps a tad nicer?). On the other hand, six already has a precedent for doing the underscore style as it has mapped Python 3's urllib.robotparser to six.moves.urllib_robotparser. So maybe it's fine as it is?

  5. Jason R. Coombs

    As I've started migrating my projects to unified code bases, I find I'm only recently using six now, but I've still been putting conditional imports for urllib modules I use. I'd very much like this to be a part of the package.

  6. Marc Abramowitz author

    In fed8375, I added the ability to use six.moves.urllib.parse. So one can do stuff like:

    from six.moves import urllib
    urllib.parse.parse_qs('a=1&b=2')
    

    Notably:

    import six.moves.urllib
    

    works, but this does not:

    >>> import six.moves.urllib.parse
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named parse
    

    It's there; it's just not importable:

    >>> six.moves.urllib.parse
    <module 'six.moves.urllib_parse' (built-in)>
    

    I might take a stab at fixing that in a later commit.

  7. Marc Abramowitz author

    In 05e8771, I made it so that:

    import six.moves.urllib.parse
    

    works.

    >>> import six.moves.urllib.parse
    >>> six.moves.urllib.parse
    <module 'six.moves.urllib_parse' (built-in)>
    >>> six.moves.urllib.parse.quote
    <function quote at 0x1005e2668>
    >>> six.moves.urllib.parse.urlsplit
    <function urlsplit at 0x1005d0668>
    >>> six.moves.urllib.parse.urlencode
    <function urlencode at 0x1005e2758>
    
  8. Marc Abramowitz author

    In 230265e, I created the six.moves.urllib.response namespace. It contains 4 items that are in urllib in Python 2 and in urllib.response in Python 3: addbase, addclosehook, addinfo, and addinfourl.

  9. Marc Abramowitz author

    In b9805f7, I create the six.moves.urllib.robotparser namespace for consistency with the other stuff I added. Now this works:

    >>> from six.moves.urllib.robotparser import RobotFileParser
    >>> RobotFileParser
    <class robotparser.RobotFileParser at 0x100565a78>
    
  10. Jason R. Coombs

    I tried using this today, but found (on Python 2), urllib.parse doesn't have ParseResult. Shouldn't it?

    Python 2.7.4 (default, Apr  6 2013, 19:55:15) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import six
    >>> six.moves.urllib.parse.ParseResult
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'ParseResult'
    >>> ^Z
    
    C:\Users\jaraco\G\pan [default ~5 ?13 tip]> python
    Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import six
    >>> six.__version__
    '1.4.0dev-20130711'
    >>> import six.moves.urllib.parse
    >>> six.moves.urllib.parse.ParseResult
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'module' object has no attribute 'parse'
    
  11. Marc Abramowitz author

    Thanks, Jason R. Coombs for trying it out and finding 2 problems!

    I fixed the first problem (lack of ParseResult in Python 2) with 0d35042b -- give that a spin.

    marca@marca-mac:~/dev/hg-repos/six_msabramo_urllib_parse$ python
    Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import six
    >>> six.moves.urllib.parse.ParseResult
    <class 'urlparse.ParseResult'>
    

    The second problem I reproduced but I'm having a harder time wrapping my head around it -- feel free to make suggestions if you know how to fix it.

  12. Marc Abramowitz author

    Jason R. Coombs:

    For some reason, I can't manage to write a py.test test that reproduces the second problem. The py.test test that I wrote passes:

    def test_six_moves_urllib_dot_parse():
        import six.moves.urllib.parse
        assert six.moves.urllib.parse is not None
        assert hasattr(six.moves.urllib.parse, 'quote_plus')
    
    marca@marca-mac:~/dev/hg-repos/six_msabramo_urllib_parse$ .tox/py33/bin/py.test -v test_fail.py
    ============================================================================== test session starts ===============================================================================
    platform darwin -- Python 3.3.2 -- pytest-2.3.5 -- /Users/marca/dev/hg-repos/six_msabramo_urllib_parse/.tox/py33/bin/python3.3
    collected 1 items
    
    test_fail.py:1: test_six_moves_urllib_dot_parse PASSED
    
    ============================================================================ 1 passed in 0.01 seconds ============================================================================
    

    I don't know why. This does fail though:

    marca@marca-mac:~/dev/hg-repos/six_msabramo_urllib_parse$ cat fail.py
    import six.moves.urllib.parse
    six.moves.urllib.parse
    print(six.moves.urllib.parse)
    marca@marca-mac:~/dev/hg-repos/six_msabramo_urllib_parse$ .tox/py33/bin/python fail.py
    Traceback (most recent call last):
      File "fail.py", line 2, in <module>
        six.moves.urllib.parse
    AttributeError: 'module' object has no attribute 'parse'
    
  13. Marc Abramowitz author

    Jason R. Coombs

    OK, I fixed the second problem with cbae96ab

    marca@marca-mac:~/dev/hg-repos/six_msabramo_urllib_parse$ .tox/py33/bin/python
    Python 3.3.2 (v3.3.2:d047928ae3f6, May 13 2013, 13:52:24)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import six.moves.urllib.parse
    >>> six.moves.urllib.parse
    <module 'six.moves.urllib_parse'>
    >>> six.moves.urllib.parse.quote_plus
    <function quote_plus at 0x1006fc830>
    
  14. Jason R. Coombs

    We're using these changes now in one of our internal projects with great success. Good work Marc. I hope Ben will consider pulling this and the other pull requests upstream.

  15. Marc Abramowitz author

    No worries. That's the cost of progress. :-)

    Let me know if you're interested in the concept, and if so, I'll merge default into this and fix the conflicts...

  16. Jason R. Coombs

    Marc, you might find that the hard work of the merge (resolving the RST table layout) has already been done in my fork, so feel free to reference that if it's helpful.

  17. Benjamin Peterson repo owner

    I think this looks pretty good; I would like to see a new patch.

    Most of the added tests don't look very useful to me. I think something like the current test_move_items, which simply checks that things are importable, suffices.

  18. Marc Abramowitz author

    Benjamin Peterson: Re:

    Wouldn't it be simpler to just create a module? A subclass isn't needed, right?

    This might expose my ignorance of how six moves stuff around, MovedAttribute, module objects, etc., but I tried (maybe not what you meant) and failed:

    marca@marca-mac:~/dev/hg-repos/six_msabramo_urllib_parse$ hg diff
    diff --git a/six.py b/six.py
    --- a/six.py
    +++ b/six.py
    @@ -174,10 +174,6 @@
    
    
    
    -class Module_six_moves_urllib_parse(types.ModuleType):
    -    """Lazy loading of moved objects in six.moves.urllib_parse"""
    -
    -
     _urllib_parse_moved_attributes = [
         MovedAttribute("ParseResult", "urlparse", "urllib.parse"),
         MovedAttribute("parse_qs", "urlparse", "urllib.parse"),
    @@ -194,12 +190,13 @@
         MovedAttribute("unquote_plus", "urllib", "urllib.parse"),
         MovedAttribute("urlencode", "urllib", "urllib.parse"),
     ]
    +six_moves_urllib_parse_module = types.ModuleType("six.moves.urllib.parse")
     for attr in _urllib_parse_moved_attributes:
    -    setattr(Module_six_moves_urllib_parse, attr.name, attr)
    +    setattr(six_moves_urllib_parse_module, attr.name, attr)
     del attr
    
    -sys.modules[__name__ + ".moves.urllib_parse"] = Module_six_moves_urllib_parse("six.moves.urllib_parse")
    -sys.modules[__name__ + ".moves.urllib.parse"] = Module_six_moves_urllib_parse("six.moves.urllib.parse")
    +sys.modules[__name__ + ".moves.urllib_parse"] = six_moves_urllib_parse_module
    +sys.modules[__name__ + ".moves.urllib.parse"] = six_moves_urllib_parse_module
    
    
     class Module_six_moves_urllib_error(types.ModuleType):
    marca@marca-mac:~/dev/hg-repos/six_msabramo_urllib_parse$ python
    Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import six
    >>> six.moves.urllib
    <module 'six.moves.urllib' (built-in)>
    >>> six.moves.urllib.parse
    <module 'six.moves.urllib.parse' (built-in)>
    >>> six.moves.urllib.parse.urlparse
    <six.MovedAttribute object at 0x1004ab390>
    >>> six.moves.urllib.parse.urlparse()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'MovedAttribute' object is not callable
    

    Sorry. I might've not actually done what you were trying to say though. If you clarify, I might be able to change it.

  19. Marc Abramowitz author

    I think I agree with Jason. I'd stay with the longer name to keep consistency with what's already there.

    I don't think the long name is much of a problem. Folks can just do:

    from six.moves.urllib.parse import urlparse
    

    And then the long name doesn't matter.