[PATCH] Support for (clean) multi-dimensional PGArray + speed improvement to PGArrays

Issue #1591 resolved
Former user created an issue

(original reporter: ged) Here is a patch which significantly speeds up processing of PGArrays and also "fixes" the support for multi-dimensional PGArrays.

In short, this patch introduces support for types like PGArray(PGArray(Integer)). However, multi-dimensional support might already have worked without being explicitly declared (i.e. using a PGArray(Integer) as a multi-dimensional array), because Postgres does not enforce the declared number of dimensions; I didn't actually test that case. IMO, this shouldn't be relied upon because it is probably a temporary situation.
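
For concreteness, here is a minimal sketch of the two declaration styles under discussion (the type lives under different names across releases, e.g. PGArray vs. ARRAY, so treat the import and names as illustrative):

    from sqlalchemy import Table, Column, Integer, MetaData
    from sqlalchemy.dialects.postgresql import ARRAY  # known as PGArray in older releases

    metadata = MetaData()

    # Explicitly nested declaration that this patch adds support for:
    explicit_2d = Table('explicit_2d', metadata,
                        Column('matrix', ARRAY(ARRAY(Integer))))

    # Single-level declaration that may happen to round-trip nested lists anyway,
    # only because PostgreSQL does not enforce the declared dimensionality:
    implicit_2d = Table('implicit_2d', metadata,
                        Column('matrix', ARRAY(Integer)))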

I didn't commit it directly because I fear the current way of doing things might have been intentional, given the current recursive code.

Also, the attached patch tries to add some tests for multi-dimensional arrays to the test suite, but they were not run as part of the suite because I couldn't run it (it seemed to be broken as of yesterday).

Of course, I did test a basic round trip, as in the attached testcase.

Comments (9)

  1. Mike Bayer repo owner
    • changed milestone to 0.6.0

    Based on visuals it seems like both approaches would handle multidimensional arrays; I don't really deal with the ARRAY code so I don't know much about it. If your patch maintains all current unit tests then it's good to go, feel free to commit. Make sure you refer to it as "ARRAY" in changelogs and docs since that's its new name.

  2. Former user Account Deleted

    (original author: ged) Well no. The current way does not handle them if they are defined like: PGArray(PGArray(Integer))

    This is because the Array processor assumes list values, and the outer Array's "convert_item" will call the inner Array's processor on the innermost elements (which are not lists).
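
    To illustrate the shape of the problem, here is a rough, hypothetical sketch of the recursive conversion being described (not the actual dialect code):

        def convert_item(item, item_proc):
            # Recurse into nested lists until the innermost scalars are reached,
            # then apply the item type's processor to each of them.
            if isinstance(item, list):
                return [convert_item(child, item_proc) for child in item]
            # With PGArray(PGArray(Integer)), item_proc is the *inner* array's
            # processor, which itself expects a list -- but at this point item
            # is already a scalar, hence the failure described above.
            return item_proc(item)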

    And I can't guarantee that my patch doesn't break anything since I couldn't run the test suite.

  3. Mike Bayer repo owner

    Replying to ged:

    Well no. The current way does not handle them if they are defined like: PGArray(PGArray(Integer))

    Fine, so this patch is an improvement, right?

    This is because the Array processor assumes list values, and the outer Array's "convert_item" will call the inner Array's processor on the innermost elements (which are not lists).

    ok

    And I can't guarantee that my patch doesn't break anything since I couldn't run the test suite.

    Why would that be? I know you can run PG, and I know the current test suite works on the buildbot; what's missing?

  4. Mike Bayer repo owner

    moving this out to 0.6.xx since it's not critical for the 0.6.0 release, unless you want to get around to fixing these test failures:

    ======================================================================
    ERROR: test.dialect.test_postgresql.ArrayTest.test_array_subtype_resultprocessor
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/nose-0.11.1-py2.6.egg/nose/case.py", line 183, in runTest
        self.test(*self.arg)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/test/testing.py", line 116, in maybe
        return fn(*args, **kw)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/test/testing.py", line 116, in maybe
        return fn(*args, **kw)
      File "/Users/classic/dev/sqlalchemy/test/dialect/test_postgresql.py", line 1220, in test_array_subtype_resultprocessor
        eq_(results[1]['strarr'], [[u'm\xe4\xe4'], [u'm\xf6\xf6']])
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1561, in __getitem__
        return self.__colfuncs[key][0](self.__row)
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py", line 1791, in getcol
        return processor(row[index])
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/dialects/postgresql/base.py", line 183, in process
        return [item_proc(child) for child in value]
      File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/types.py", line 519, in process
        return value.decode(dialect.encoding)
    AttributeError: 'list' object has no attribute 'decode'
    
    ======================================================================
    FAIL: test.dialect.test_postgresql.ArrayTest.test_reflect_array_column
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/nose-0.11.1-py2.6.egg/nose/case.py", line 183, in runTest
        self.test(*self.arg)
      File "/Users/classic/dev/sqlalchemy/test/dialect/test_postgresql.py", line 1182, in test_reflect_array_column
        assert isinstance(tbl.c.multiarr.type.item_type, postgresql.PGArray)
    AssertionError
    
    ----------------------------------------------------------------------
    Ran 60 tests in 2.568s
    
    FAILED (errors=1, failures=1)
    
  5. Former user Account Deleted

    (original author: ged) Replying to zzzeek:

    moving this out to 0.6.xx since it's not critical for the 0.6.0 release, unless you want to get around to fixing these test failures:

    ERROR: test.dialect.test_postgresql.ArrayTest.test_array_subtype_resultprocessor

    Seems like I didn't look at those tests properly. Without my patch, SA does support multi-dimensional arrays, though it relies on a detail of the current PostgreSQL implementation. I'll fix the test (this is trivial) if you accept the syntax change.

    Quote from: http://www.postgresql.org/docs/8.4/interactive/arrays.html

    However, the current implementation ignores any supplied array size limits, 
    i.e., the behavior is the same as for arrays of unspecified length.
    
    The current implementation does not enforce the declared number of dimensions either. 
    Arrays of a particular element type are all considered to be of the same type, 
    regardless of size or number of dimensions. So, declaring the array size or number 
    of dimensions in CREATE TABLE is simply documentation; it does not affect run-time 
    behavior.
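
    In practice that means the same integer[] column accepts values of any depth. As a quick, purely illustrative sketch through SQLAlchemy (placeholder connection URL):

        from sqlalchemy import create_engine, text

        engine = create_engine("postgresql://scott:tiger@localhost/test")  # placeholder DSN
        with engine.begin() as conn:
            # column declared as one-dimensional...
            conn.execute(text("CREATE TABLE dims_demo (vals integer[])"))
            # ...yet PostgreSQL accepts both 1-D and 2-D values for it
            conn.execute(text("INSERT INTO dims_demo VALUES (ARRAY[1, 2, 3])"))
            conn.execute(text("INSERT INTO dims_demo VALUES (ARRAY[[1, 2], [3, 4]])"))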
    

    FAIL: test.dialect.test_postgresql.ArrayTest.test_reflect_array_column

    This one is just because I didn't update the reflection code (yet), but I did add a test for multi-dimensional reflection with the new syntax, hoping it would work; it seems it does not. I'll fix that if you agree to stop supporting the old "abusive" syntax.

  6. Former user Account Deleted

    (original author: ged) I'm abandoning this ticket because I just realized two things:

    * the Array feature of PostgreSQL is not new at all (I assumed it was new-ish since I had never heard of it and because of the "current implementation" comment).
    * PG somehow allows rows to use different numbers of dimensions on a per-row basis, while I thought it had to be the same number for all rows, even if not the same as the one declared.

    While I thought it was alright to force people to use the more correct form to do the same thing (and hence "document" their database better), I am not so sure about preventing them from using a feature they might find useful, even if it is not very clean. That's their choice, after all. The sad thing is that the simplest (and probably most common) case, one-dimensional arrays, suffers a lot from this. But I won't go any further into this (providing an option for "variable dimension arrays" would be the solution, I suppose), because I've already invested too much time into an issue I don't care about, since I don't use the feature.

    FWIW, in case someone wants to implement this change after all, pg_catalog.format_type (as used in the get_columns function of the reflection code) does not format the type properly: it only appends one pair of brackets even if the array is multi-dimensional. However, the correct number of dimensions is available as "pg_attribute.attndims" (available since PG 7.2).
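
    For instance, a hypothetical query along these lines could pick up the declared dimensionality during reflection (connection URL and table/column names are placeholders):

        from sqlalchemy import create_engine, text

        engine = create_engine("postgresql://scott:tiger@localhost/test")  # placeholder DSN
        with engine.connect() as conn:
            ndims = conn.execute(text(
                "SELECT a.attndims "
                "FROM pg_attribute a JOIN pg_class c ON a.attrelid = c.oid "
                "WHERE c.relname = :tname AND a.attname = :cname"
            ), {"tname": "mytable", "cname": "multiarr"}).scalar()
            # ndims is the declared number of dimensions from CREATE TABLE,
            # e.g. 2 for "integer[][]", while format_type() reports just "integer[]"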

    I've also committed a simple (and much less rewarding) optimization in 6f96478e76ad8d2bef0466f31873c107dc7c4fb8.
