1. Michael Bayer
  2. alembic
  3. Issues

Issues

Issue #34 resolved

Make version table name configurable

Karol Kuczmarski
created an issue

It's currently alembic_version, but it would be nice if this was configurable - either in alembic.ini, or (more likely) in env.py.

Comments (22)

  1. dairiki

    I have a case where I have two separately managed schema living in a single database. (I.e. I have two disjoint subsets of tables in a single database. I would like each subset to be managed by its own alembic migration environment.)

    If the version table name were configurable, that would work.

    Even better, might be to have a single version table which is shared by both migration environments. Another column ('environment_id') would have to be added to the version table. (Alternatively, each environment could just be made to ignore any revisions in the version table which it doesn't recognize as one of its own — this has the advantage of not having to add a column to the version table, but seems frighteningly fragile.)

    Then again, adding the logic to deal with two-column version tables alongside legacy one-column version tables (as well as migrating between them) might be a pain...

    Anyhow, if you tell me whether or not you think this (shared version table) is a stupid idea, I'll see if I can work up a pull request.

  2. dairiki

    Having thought about this a bit more, here's a more detailed proposal.

    Add two new options (for EnvironmentContext.configure()):

    version_table

    The version table name. Default 'alembic_version'.

    environment_id

    Environment identifier, a string which must be unique among alembic environments sharing the same version table.

    version_table_schema

    One of:

    1 - Use the one-column schema for the version table. It is an error to specify version_table_schema=1 along with a non-None environment_id.

    2 - Use the two-column schema, allows for sharing the version table between alembic environments.

    None (the default) - Use schema version 1 if no environment_id is set, else use schema version 2.

    "auto" - Like None except when running in on-line mode, in which case

    • If the existing version table is of the one-column sort, and a non-None environment_id is specified, the version_table will be automatically migrated to the two-column schema
    • If the existing version table is of the two-column sort, and environment_id is None, the two-column schema will be used.

    In any case, when running in on-line mode, the schema of any existing version table is determined via reflection. If version_table_schema is not "auto" and error is raised unless the existing version table conforms to the configured schema. (When version_table_schema is "auto", Alembic will cope automatically with any schema mismatch, as described above.)

    Changes

    2012-04-04T01:00Z: Added version_table_schema configure parameter to minimize the amount of magic happens without it being explicitly enabled.

  3. Michael Bayer repo owner

    I think I like this. Why not have VersionTableV1Helper and VersionTableV2Helper as subclasses of VersionTableHelper (and maybe lose the name "Helper", let this just be, "the version table"), so that we don't need the conditionals ?

    One thing I'm possibly worried about with the single table with multiple versions is the ramifications of generating SQL scripts - when I experimented with SQL script generation I found it to be non-trivial whether or not the scripts should include the generation of the version table itself in the script. That is, when we're doing SQL script we don't really have the luxury of being able to reflect and check for it being there. Having distinct version tables per environment at least means you can gen a SQL script from zero and definitely include the generation of the version table.

    But you've got the name configurable so now it does both ways.

    Do we want two ways to do it ?

  4. dairiki

    I did start out with Table{V1,V2} subclasses. The problem with that was (in the case of version_table_schema='auto' and environment_id unset) the table version is not always known when the helper is constructed. This is because table reflection is deferred until needed (e.g. .get_current_revision() is called.) (I can't remember now whether there was really a good reason to defer the version table reflection. If not, well then...)

    (The helper could use another class, which would be instantiated when the table was reflected, to implement the version table operations. This would eliminate the conditionals, but at the time I didn't think it was worth it.)


    You're right. There is no way to tell when a shared version table needs to be created in off-line mode. Maybe "CREATE TABLE IF NOT EXISTS" could help a bit? (How widespread is support for IF NOT EXISTS?)


    Making the table name configurable is trivial compared to rest of this. I figure you might as well include it. It doesn't have to be explicitly mentioned (in the tutorials) as an alternate way. (It's a simple enough concept that basic API docs should be enough.)

  5. Michael Bayer repo owner

    yeah I'd push for Version.from_db()->VerisionTableV2 or something like that, those conditionals bug me :)

    but yeah what are we gaining here vs. multiple version tables with distinct names, vs. one table with "environment ids".

    I dont think the "IF NOT EXISTS" thing is very standard.

  6. dairiki

    I think the two-column table vs multiple table question is mainly one of aesthetics. A motivating use case is: suppose you want to change the scaffoldings so that alembic environments are “conflict safe” by default. (I think this is a good idea, given that it is tricky to change the "environment id" once any migrations have been run.) To minimize the amount of manual configuration involved, that would probably entail generating a random guid to use as the environment id. In the multiple version table scenario, that means the version table will be named something like 'alembic_version_38f9a2c' — kinda weird and ugly.

    OTOH, given the amount of code and verbiage needed to describe the configuration of the shared two-column version table whizziness, that approach is not the ultimate in elegance either.

    I'd be happy going either way — make the call. I'll be happy either to clean up the current patch, or come up with a new one supporting just the configurable version table name.

  7. Michael Bayer repo owner

    I like this a lot. though I wonder if we might take out the defensive things I see - that is, where we have the user configure things, then "test" those configs - why not trust them ? that is, why have `version_table_schema`, isnt `environment_id` None or not enough for that ? why autoload the existing version table, then check the columns, then "upgrade" the table, what's all that for ?

  8. dairiki

    Part of the motivation for `version_table_schema` is to provide a migration path in the case:

    • User (me) has an alembic environment set up which manages some tables in a database. This environment is already in use under the current version of alembic which does not support the multi-column version table. (So, effectively, this environment has `environment_id = None`.)
    • User (again, me) wants to create a second alembic environment to manage some other tables in the same database.

    The upgrade path in this case (if one is running on-line migrations) is to

    • In the old environment leave `environment_id` at the default `None`. Set `version_table_schema` to 'auto'. This will allow the old environment to work with either one- or two-column version tables.
    • In the new environment, set `environment_id` to some new unique id; set `version_table_schema` to 'auto'. This will allow the new environment to auto-upgrade the single-column version table to the two-column table it needs.

    (If one is restricted to off-line migrations, upgrading the version table will take manual SQL-foo, at present.)


    Another motivation for `version_table_schema` is to make it possible to disable any magic upgrading/adapting of the version table. (More specifically, to disable the magic unless explicitly enabled.) As long as `version_table_schema` is not set to 'auto', off-line and on-line migrations will (should, at least) behave identically (modulo the issue you pointed out having to do with knowing when the version table needs to be created in off-line mode.)


    You’re right that when `version_table_schema` is not 'auto', reflection of the version table is not really required. In that case it is used only for my paranoid sanity checks.

  9. Michael Bayer repo owner

    if I were upgrading from a one-table to a two-table system, I'd run a special command: "alembic multienv_init". this command generates a new multi-env table, selects the single identifier from the current single table, and moves it into the multi-env table.

    Implications:

    1. a particular database uses either the 1-env or 2-env table for versioning. not both.

    2. no reflection/guessing/possibly conflicting flags

    3. the user knows which scheme he or she wants to use, and when, runs the helper command to set things up explicitly.

    4. the two modes otherwise have no connection to each other.

    5. the story for the "online" mode and "offline" mode now becomes similar (run the same command in "offline" mode, will produce "INSERT ... SELECT" to transfer the version number, perhaps)

    should be a lot less complexity, and the "we're switching styles" use case is stowed away in a single explicitly called command.

  10. dairiki

    (Sorry for the delay in response.)

    I basically agree with the motives of simplicity and explicitness. But the version table schema upgrade command doesn't feel quite right to me.

    If an alembic environment can only deal with either a 1-col or a 2-col version table at a given time, there is a synchronization issue. Does one upgrade env.py then run multienv_init or vice versa? If there are only one or two deployments to worry about, it's not a big deal. If there are more (and if one wants to defer upgrade/migrations on some until later) it becomes a pita to keep track of. If env.py and the version table schema are out of sync, alembic commands which touch the version table will fail, I guess. Or, more worrisome, perhaps the the database will appear to be unversioned?

    It seems that you want to disallow `None` as a valid value for `environment_id` (so that `environment_id = None` implies 1-col version table.) If so, then one must specify a env_id when multienv_init is run, I guess as a command-line argument (then make sure env.py is altered to match.) Ugh.

    I would think some sanity checking against the possibility of running multienv_init twice would be called for.

    I guess to me it seems like with the manual upgrade command, most of the complexity (both in terms of lines-of-code, and configuration/operation confusion) which is saved in migration.py will just be moved into the multienv_init. Either way some sanity checks will be called for.


    Given that we still haven't come up with a particularly simple (and safe) way to support both 1- and 2- col version tables, I can still be easily convinced that the best plan is to scrap all the two-column stuff and just go with multiple single-column version tables. Though it feels somehow less elegant, it really is simple and straightforward.

  11. dairiki

    No problem. Sometimes one just has to jump in and see how it works out. I always learn something (though, unfortunately, I don't always remember the lesson the next time a similar idea rolls through.)

    (Let me know if you want me to create a bona fide pull request for the commit.)

  12. Log in to comment