Py3k support for the C extension

Issue #2161 resolved
Former user created an issue

(original reporter: ged) Here is a patch implementing the C extension with Py3k support.

It seems to compile & run with both py2 & py3k, but it could use more testing...

Also, I'm not sure it's the most readable this way, maybe it would be more readable with larger #if blocks...

Comments (26)

  1. Mike Bayer repo owner
    • changed milestone to 0.7.0

    hm i see that. what's your take on its stability for python 2 ? the C exts are on by default now in 0.7...we're due for 0.7.0 pretty much.

  2. Former user Account Deleted

    (original author: ged) There are only two changes which impact py2:

    • in each type (rowproxy, decimalproc & unicodeproc), the use of PyVarObject_HEAD_INIT(NULL, 0)

    instead of

    PyObject_HEAD_INIT(NULL) 0, / ob_size /

    I would really like input from more experienced people on this before that goes into a release, because it does not seem logical/correct. It's contrary to the API documentation which states that PyVarObject_HEAD_INIT should only be used with PyVarObject, which is not the case for our object. However, the official C extension tutorial does it this way too, and I couldn't get the code to compile on py3k without that oddity, and the code seems to compile & run fine with it on both py2.6+ & py3 so I assume it is correct nevertheless.

    Also, I didn't (and can't easily) check if it compiles (and preferably runs) with Python < 2.6.

    • storing number of parsed arguments in a variable in the date* processors.

    The first version of the patch contained a little mistake there (int vs unsigned int). After that correction (patch v2), the change should be harmless.

    In conclusion, I wouldn't commit that patch to the main branch "to be released" as is without some checks/review by other people but such checks shouldn't take long so it's really up to you.

  3. Former user Account Deleted

    (original author: ged) Another possible resolution is of course to add one more #if block to keep the old code intact for py2, and use PyVarObject_HEAD_INIT only on py3.

  4. Mike Bayer repo owner

    OK I'm asking for some other help with this, so maybe we'll keep this on the 0.7.xx milestone until we're sure its right.

  5. Former user Account Deleted

    (original author: taavi) Regarding {{{PyObject_HEAD_INIT}}}, looking at Python-2.6.2 vs Python-3.2, both in {{{Include/object.h}}}, I can see why you'd make the change from {{{PyObject_HEAD_INIT(type), 0}}} to {{{PyVarObject_HEAD_INIT(type, 0)}}}. On Py2k they generate identical code to the compiler by definition:

    #define PyVarObject_HEAD_INIT(type, size)   \
        PyObject_HEAD_INIT(type) size,
    

    And that's NOT the case in Py3k (things get wrapped in an extra set of curlies):

    #define PyVarObject_HEAD_INIT(type, size)       \
        { PyObject_HEAD_INIT(type) size },
    

    If {{{PyVarObject_HEAD_INIT}}} is the right thing to use in Py3k, then we should also use it in Py2k.

  6. Former user Account Deleted
    • assigned issue to

    (original author: ged) reassigning, since I don't intent to do anything more on this patch...

  7. Mike Bayer repo owner

    this patch can go right in if people familiar with C-extensions can sign off on these changes, especially that there's no risk of breakage for existing 0.7 apps if this goes to 0.7.5.

  8. Mike Bayer repo owner

    ouch, mercurial branches....they're forever. I usually use a bitbucket branch then merge it in, and sometimes I even flatten it (if i knew how to use mercurial queues i might be better at that...or we'd just switch to git...)

  9. Philip Jenvey

    Sorry, I actually kind of agree but didn't realize that was your policy (some of the older branches didn't seem that old). I like mercurial queues but I figured I'd put this on the main repo so others could track it

  10. claudiofreire NA

    Check out #2720. Although re-implementing in Cython does reduce performance a tiny bit (10% or less in all checks I performed for cprocessors), it gives free Py3k compatibility.

    Is it worth paying that performance price (I'd think so)?

  11. Mike Bayer repo owner

    Reimplementing in Cython is "fine" as a long term goal, however to get our existing C extensions to work in py3k should be a far smaller task. Not just in terms of coding. More importantly, in terms of tests and proven stability; the C extensions we have right now are very widely used and tested in thousands if not tens of thousands of installations. If we rewrite C exts from scratch using a new system, we throw that maturity out the window and start all over again.

    Adding C extensions for small functions here and there isn't really going to get us much in terms of performance, and I'm more or less -1 on the #2720 improvement as far as attribute access, the ratio of code verbosity/complexity to performance gain is too high in that case. In order to break out of what are very diminishing returns right now, we'd really need to reconstruct the entire core and ORM using C throughout. This would definitely be more of a Cython project as it would be unworkable otherwise, but that's a large and distant undertaking. Right now, Pypy continues to be developed with funding, suggesting the irony that the longer we do nothing, the more progress is actually made towards us getting all the speed improvements via pypy for free.

    As far as this issue, while it's a small task, it's been sitting here for literally two years, I'll likely be digging in to learn the API differences and try to get this in for 0.9.

  12. claudiofreire NA

    The biggest API difference is that everything is unicode now.

    So, while in py2 parsing a datetime is easily done with sscanf, in py3 you must first convert to ASCII, and catch errors in the process. There's a LOT of boilerplate to do that, and Cython has it all pretty well cooked, hiding all that boilerplate that is indeed quite different between py2 and py3.

  13. Mike Bayer repo owner

    I am 100% with you that cython is better but the existing C extensions should be ported in place in any case. I think the only sscanf stuff we have is with the date parsers, and the code for this is already there in the attached patch.

    The code here is literally 95% done it just needs to be checked for correctness and tested.

  14. claudiofreire NA

    Replying to taavi:

    Regarding {{{PyObject_HEAD_INIT}}}, looking at Python-2.6.2 vs Python-3.2, both in {{{Include/object.h}}}, I can see why you'd make the change from {{{PyObject_HEAD_INIT(type), 0}}} to {{{PyVarObject_HEAD_INIT(type, 0)}}}. On Py2k they generate identical code to the compiler by definition: {{{ #define PyVarObject_HEAD_INIT(type, size) \ PyObject_HEAD_INIT(type) size, }}}

    And that's NOT the case in Py3k (things get wrapped in an extra set of curlies): {{{ #define PyVarObject_HEAD_INIT(type, size) \ { PyObject_HEAD_INIT(type) size }, }}}

    If {{{PyVarObject_HEAD_INIT}}} is the right thing to use in Py3k, then we should also use it in Py2k.

    So, this is py3:

    #ifdef Py_LIMITED_API
    typedef struct _typeobject PyTypeObject; /* opaque */
    #else
    typedef struct _typeobject {
        PyObject_VAR_HEAD
    

    And this is py2

    typedef struct _typeobject {
        PyObject_VAR_HEAD
    

    In both cases, it's variable size. This is done for base classes, which are stored in tuple-fashion, after the type object's struct.

    So, the correct thing is indeed to use VAR.

  15. Log in to comment