1. Mike Bayer
  2. sqlalchemy

Pull requests

#6 Declined
Repository
sqlalchemy redshift patch
Branch
rel_0_8
Repository
sqlalchemy
Branch
rel_0_8

sqlalchemy to support postgresql 8.0.* (Redshift)

Author
  1. Fardin Sarker
Reviewers
Description

Fix to work with Redshift (currently Postgresql 8.0.3) Used to break like this: "ERROR: op ANY/ALL (array) requires array on right side" because indkey column of pg_index table/view is of type int2vector not ARRAY. So I converted it to an array when postgresql version is less than 8.1

  • Learn about pull requests

Comments (20)

  1. Mike Bayer repo owner

    so - can we greatly shorten this? only the way we get the array here seems to change, therefore:

    diff --git a/lib/sqlalchemy/dialects/postgresql/base.py b/lib/sqlalchemy/dialects/postgresql/base.py
    index e1dc4af..a1caa3c 100644
    --- a/lib/sqlalchemy/dialects/postgresql/base.py
    +++ b/lib/sqlalchemy/dialects/postgresql/base.py
    @@ -1834,25 +1834,33 @@ class PGDialect(default.DefaultDialect):
                                default=default, autoincrement=autoincrement)
             return column_info
    
    +    def _indkey_as_array(self, value):
    +        if self.server_version_info < (8, 1):
    +            return "string_to_array(textin(int2vectorout(%s)), ' ')" % value
    +        else:
    +            return value
    +
         @reflection.cache
         def get_pk_constraint(self, connection, table_name, schema=None, **kw):
             table_oid = self.get_table_oid(connection, table_name, schema,
                                            info_cache=kw.get('info_cache'))
    
    +
             if self.server_version_info < (8, 4):
                 # unnest() and generate_subscripts() both introduced in
                 # version 8.4
    +
                 PK_SQL = """
                     SELECT a.attname
                     FROM
                         pg_class t
                         join pg_index ix on t.oid = ix.indrelid
                         join pg_attribute a
    -                        on t.oid=a.attrelid and a.attnum=ANY(ix.indkey)
    +                        on t.oid=a.attrelid and a.attnum=ANY(%s)
                      WHERE
                       t.oid = :table_oid and ix.indisprimary = 't'
                     ORDER BY a.attnum
    -            """
    +            """ % (self._indkey_as_array("ix.indkey"), )
             else:
                 PK_SQL = """
                     SELECT a.attname
    @@ -1977,7 +1985,7 @@ class PGDialect(default.DefaultDialect):
                         join pg_class i on i.oid=ix.indexrelid
                         left outer join
                             pg_attribute a
    -                        on t.oid=a.attrelid and a.attnum=ANY(ix.indkey)
    +                        on t.oid=a.attrelid and a.attnum=ANY(%s)
               WHERE
                   t.relkind = 'r'
                   and t.oid = :table_oid
    @@ -1985,7 +1993,7 @@ class PGDialect(default.DefaultDialect):
               ORDER BY
                   t.relname,
                   i.relname
    -        """
    +        """ % (self._indkey_as_array("ix.indkey"), )
    
             t = sql.text(IDX_SQL, typemap={'attname': sqltypes.Unicode})
             c = connection.execute(t, table_oid=table_oid)
    
  2. Mike Bayer repo owner

    So I'm having trouble finding background on textin and int2vectorout. Even searching through old 8.0, 7.4 PG docs there are no results for these functions. are these redshift specific and if so can we detect redshift instead of 8.1 ?

  3. Adrian Schreyer

    Those functions are used internally and normally only used to convert between internal and external representations of data types, e.g. if data is copied from a file or vice versa. PostgreSQL requires these functions when a new data type is created (INPUT/OUTPUT). They should not be used in SQL queries.

  4. Mike Bayer repo owner

    So on PG9, I can run :

    select string_to_array(textin(int2vectorout(ix.indkey))) from pg_index ix
    

    but I can't run:

    select * from pg_attribute a, pg_index ix where a.attnum=ANY(string_to_array(textin(int2vectorout(ix.indkey))));
    

    it gets "No function matches the given name and argument types. You might need to add explicit type casts.".

    Adrian - are you suggesting these functions can't be used at all? what solution would you advise for amazon redshift?

  5. Adrian Schreyer

    I think you are missing parameters for string_to_array - you need to specify the delimiter at least by which the string should be split.

    select * from pg_attribute a, pg_index ix where a.attnum=ANY(string_to_array(textin(int2vectorout(ix.indkey)), ''));
    

    Does this work? I have never used Redshift I just saw the tweets and recognised the functions. Technically you can use these functions of course but they are normally used only internally, that is why there is no documentation.

  6. Mike Bayer repo owner

    oh right I forgot the other argument...still doesn't quite work. But not sure what redshift does. I just checked my AWS console and starting up a redshift to test seems like it'll be 85 cents an hour at least so I'm not in a hurry over there...

    test=> select * from pg_attribute a, pg_index ix where a.attnum=ANY(string_to_array(textin(int2vectorout(ix.indkey)), ''));
    ERROR:  operator does not exist: smallint = text
    LINE 1: ... * from pg_attribute a, pg_index ix where a.attnum=ANY(strin...
                                                                 ^
    HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
    
  7. Adrian Schreyer

    Is a.attnum an integer? Try

    select * 
      from pg_attribute a, pg_index ix 
     where a.attnum=ANY(string_to_array(textin(int2vectorout(ix.indkey)), ' ')::int[])
     limit 10;
    

    But this will only work if the string is split into something that can be converted to int.

  8. Adrian Schreyer

    More idiomatic would be this by the way: int2vector can be cast directly to int[]. Will be much faster as well.

     SELECT *
       FROM pg_attribute a, pg_index ix
      WHERE a.attnum = ANY(ix.indkey::int[])
      LIMIT 10
    
  9. Mike Bayer repo owner

    OK but my understanding is that amazon redshift does not support the array type at all. somehow it has int2vector though in it's support of the pg_index.indkey column.

  10. Fardin Sarker author

    Adrian Schreyer This wouldn't work, as indkey is still an int2vector and ANY expects an ARRAY.

    Redshift doesn't really support ARRAY operations but ANY and string_to array functions are just result of postgresql 8.0 being used underneath. I found the int2vectorout from the code, it's not really a postgres function that it documents for use. http://doxygen.postgresql.org/int_8c.html

    Now, I see that this solution is a hack. Ideally there should be a dialect for redshift and treat it separately. You never know the amount of work being done on Redshift it may become quite different from the original postgresql. Same thing happened to Greenplum.

  11. Adrian Schreyer

    I see - but string_to_array is supported on Redshift? Then this might work to make it a bit more elegant:

    select * 
      from pg_attribute a, pg_index ix 
     where a.attnum::text = ANY(string_to_array(ix.indkey::text, ' '))
     limit 10;
    

    Or this if there is a problem with :: casting on 8.0

    select * 
      from pg_attribute a, pg_index ix 
     where cast(a.attnum as text) = ANY(string_to_array(cast(ix.indkey as text), ' '))
     limit 10;
    
  12. Fardin Sarker author

    I have tried these options as well, none of these works.

    # select * 
      from pg_attribute a, pg_index ix 
     where a.attnum::text = ANY(string_to_array(ix.indkey::text, ' '))
     limit 10;
    ERROR:  cannot cast type int2vector to character varying
    
    # select * 
      from pg_attribute a, pg_index ix 
     where cast(a.attnum as text) = ANY(string_to_array(cast(ix.indkey as text), ' '))
     limit 10;
    ERROR:  cannot cast type int2vector to character varying
    
  13. Mike Bayer repo owner

    I'd like to accomplish redshift compatibility by simple backend detection, e.g. adding a flag "is_redshift" to the postgresql dialect based on server version info. When this flag is detected, the index reflection scheme will simply be skipped altogether.