autogen from persisted metadata

Issue #117 wontfix
Michael Bayer
repo owner created an issue

see if we can add a feature whereby you push out the current metadata to a file, like a pickle of the metadata or JSON (or whatever). then autogenerate wouldn't need to use reflection and we could get very accurate upgrades/downgrades. this would solve all of the issues we have with server defaults, types, etc.

so existing migration:

alembic persist

takes the current MetaData and dumps to a file.

then in autogenerate, instead of reflecting, we just look for that file and pull from there.

perhaps the persisted file can itself be constructed of individual migrations. this is of course not too different from just the other way this could go, by using the migration files and op.* directives themselves to determine current state.

Or the autogenerate diffs themselves.

but what happens if the user creates some migrations without using autogenerate? still would want to always write out the current metadata state to the file.

its almost like the persisted file is another database schema that we keep up to date exactly as we do the real database. basically something that can be "reflected" completely accurately.

it means the file would be written when a migration actually happens, rather than when revision --autogenerate is called.

more thought to come....

Comments (6)

  1. Michael Bayer reporter
    1. autogenerate becomes a package

    2. a new API will be here that encompasses open ended comparison of items, like

    def compare(x, y, reflected, default_compare):
    def render(x, reflected, default_render):
    @compares(ServerDefault, 'postgresql')
    def compare(x, y, reflected, default_compare):
    @renders(ServerDefault, 'postgresql')
    def render(x, reflected, default_render):
    def serialize(x, reflected, default_serialize):
    def deserialize(x, reflected, default_deserialize):

    so folks can implement any of the above for any existing or new kind of SchemaItem or TypeEngine.

    "reflected" indicates if the item was a result of reflection or not, as there will be a lot of variability based on that.

    the default_XYZ functions are always there for fallback within custom functions. If there is no fallback, its a placeholder function that raises NotImplementedError.

    This also implies the decorators here need to keep a list of functions to produce the "fallback chain".

    1. we build a system of serialization/deserialization of whole schemas. It will use json.

    2. autogenerate will work like this:

    a. we check for a file in the versions directory, "serialized.json" or something like that.

    b. this file, if present, has a set of serialized metadatas, keyed to the version id (same version id we store in the DB).

    c. --autogenerate, when it writes the candidate migrations, will also always write out the current MetaData structure to this file at the same time, keyed to the new version id it just created. There will of course be some option to disable the whole thing but I think by default, it'll be on. I want the average user to just get the benefit of this without worrying about changing configurations when they upgrade.

    d. the file will have at most two versions, the most recent one, and the one before that. The idea here is that we will be storing the new version of the metadata here when someone runs --autogenerate and creates a new migration file. But I'd like them to be able to delete that new migration file and go back to the current one, without the file getting in the way. --autogenerate will do the right thing when run a second time like this, will prune out the non-existent version and re-persist.

    e. when autogenerate persists the model in the file, it needs to merge with an existing structure that might be there already for a particular version. This is to support the use case I have here where I run context.run_migrations() multiple times with multiple metadata objects. The merge here is at the level of schemaname/tablename. If the merge is overwriting the same name, it should spit out a warning, which would encourage the user to use "names" in their context.configure() to separate the metadatas (see next item)

    f. context.configure() will of course support a "name" of some kind, so that you can assign names to individual MetaData objects persisted in the file as well. The name will be "default" if not specified.

    g. there will of course be a command like alembic --regenerate that just removes the current version from the file and rewrites it from current metadata.

    h. when the file isn't there: put a message, hey the serialized file isn't there. Say --autogenerate --reflect to go out to the relational DB and do a reflection, or say --regenerate to create a new file for the current metadata. the --reflect option means we do exactly what we do now.

    i. those folks who don't want this feature at all just put "reflect=True" in their alembic.ini, which is the same as saying --reflect. the file is still written though unless they also put "write_serialized=False" in their alembic.ini.

    j. what to do with the existing options: compare_type, compare_server_default, include_symbol (which will be superseded by include_object), include_schemas, render_item. autogenerate probably consults these before going to the decorated functions.

    k. make sure the file has a magic number! exception like "your migration datafile is in an old format (format 1, we're now at format 2). Please specify --regenerate to produce a new file from your current metadata."

  2. weikai

    IMHO, because most users of sqlalchemy will keep the table definition files under control of a version control system, we can simply compare the metadata generated from the previous version and current version of the table definition files for the purpose of generate diff in autogeneration.

    Of course, it will only work if the user's database deployment/migration workflow can make sure at any time the schema in the database exactly reflects a certain version of the table definition files in the version control system, which I think is a quite reasonable assumption.

    It can help avoid all the problems associated with the current inspection-based approach.

  3. Michael Bayer reporter

    while the reflection-based comparison has its issues, it really is a very fundamental assumption these days particularly in the openstack world where autogen features are used in unit test suites to ensure the migrated schema matches the model. I don't have plans right now to pursue the datafile-based approach, it would be an enormous undertaking for a system that people seem to be mostly OK with as is.

  4. Log in to comment