DB

Instead of storing every revision of each Patient and Dataset branch in the tables, we only store the current revision of a branch. Old versions of Datasets will be stored in separate history tables. This seems to be a common strategy. This will allow us to only consider the present state of each branch in our DB constraints (unique columns, one-to-many relationships, nullable columns).

No history will be stored for Patients, at least initially. Since only administrators will be editing those tables, and further edits will be prohibited once a Patient is marked as published, it seems an acceptable limitation for history not to be stored.

Updates to a Dataset branch will happen in place, meaning that the current revision of a Dataset branch will be updated (rather than copied) and that the only possible new main table entries will be any new annotations which are part of the update. The history of a branch will be stored in the history tables and only there.
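The in-place update described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class DatasetBranchSketch, its HistoryRow type, the description field, and the choice to start revisions at 1 are all hypothetical, and the layout Envers actually uses for its audit rows may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one Dataset branch row plus its history table.
final class DatasetBranchSketch {

    /** One archived state of the branch (a row in the hypothetical history table). */
    static final class HistoryRow {
        final long datasetId;
        final int revNo;
        final String description;

        HistoryRow(long datasetId, int revNo, String description) {
            this.datasetId = datasetId;
            this.revNo = revNo;
            this.description = description;
        }
    }

    private final long id;      // never changes across updates
    private int revNo = 1;      // assumption: revisions start at 1
    private String description;
    private final List<HistoryRow> history = new ArrayList<>();

    DatasetBranchSketch(long id, String description) {
        this.id = id;
        this.description = description;
    }

    /** Updates the current revision in place and archives the outgoing state. */
    void update(String newDescription) {
        history.add(new HistoryRow(id, revNo, description)); // old state goes to history only
        description = newDescription;                        // same row, same id
        revNo++;
    }

    long getId() { return id; }
    int getRevNo() { return revNo; }
    String getDescription() { return description; }
    List<HistoryRow> getHistory() { return history; }
}
```

The point of the sketch is that the main row keeps its identity across updates, so constraints on the main table only ever see one row per branch.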

Branching is only legal at the Patient and Dataset level. Branched-from Patients will be marked as deprecated. (We may need to support adding new studies to a published Patient in place, including branching at the study level, which may in turn make storing history for a Patient important.)

A branch will produce a deep copy of a Patient or Dataset and its owned (one-to-many) objects. Note that since, for example, the traces are owned by a study, copies of the trace objects will be made. This means that existing annotations will only point at the original study, not at the branch. References to shared objects, e.g. a Dataset's Images and TimeSeries, will only be copied by foreign key in many-to-many join tables. Copies will contain the fields prev_id and prev_rev_no, which will point at the original object and version. For Datasets, these will be foreign keys that point at the history tables, since a Dataset can be modified in place, including having its annotations deleted. For Patients, for which we will not store history, they will point at the main tables. That works since we will only allow branching for published Patients, and once published, a Patient can't be modified or deleted. A branch must be given a unique label.
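A minimal sketch of the branching rules for a Dataset, assuming an in-memory model rather than the real schema. BranchSketch, Dataset, Annotation, and imageIds are hypothetical names, and starting a new branch at revision 1 is an assumption; only prev_id and prev_rev_no (here prevId and prevRevNo) come from the text above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of Dataset branching; not the real entity classes.
final class BranchSketch {

    static final class Annotation {
        final String text;
        Annotation(String text) { this.text = text; }
    }

    static final class Dataset {
        final long id;
        final String label;
        final int revNo;
        final Long prevId;        // null unless this Dataset is a branch
        final Integer prevRevNo;  // revision of the original at branch time
        final List<Annotation> annotations = new ArrayList<>(); // owned: deep-copied
        final Set<Long> imageIds = new HashSet<>();             // shared: keys only

        Dataset(long id, String label, int revNo, Long prevId, Integer prevRevNo) {
            this.id = id;
            this.label = label;
            this.revNo = revNo;
            this.prevId = prevId;
            this.prevRevNo = prevRevNo;
        }
    }

    private final Set<String> labels = new HashSet<>();
    private long nextId = 1;

    Dataset create(String label) {
        requireUnique(label);
        return new Dataset(nextId++, label, 1, null, null);
    }

    /** Deep-copies owned annotations, copies shared image keys, records provenance. */
    Dataset branch(Dataset source, String newLabel) {
        requireUnique(newLabel);
        // Assumption: a new branch starts at revision 1.
        Dataset copy = new Dataset(nextId++, newLabel, 1, source.id, source.revNo);
        for (Annotation a : source.annotations) {
            copy.annotations.add(new Annotation(a.text)); // new object per annotation
        }
        copy.imageIds.addAll(source.imageIds);            // same images, no copies
        return copy;
    }

    private void requireUnique(String label) {
        if (!labels.add(label)) {
            throw new IllegalArgumentException("label already in use: " + label);
        }
    }
}
```

In this model the branch and the original share image keys but never share annotation objects, which mirrors the owned versus shared distinction drawn above.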

Advantages:

A stricter schema will (hopefully) be less prone to bugs, which could be time-consuming to solve since they would require fixing the database as well as the code.

Hibernate Envers, which we will use to handle history, will handle the code for inserting, updating, and querying the history tables.

Ability to partition the database, moving old history records out of the way.

Faster searches since there will be less data in the non-history tables.

Code for modifying Datasets and Patients will be simpler.

Drawbacks:

Multiple tables to maintain. At this point, that is not much of a burden since it will involve only 4 history tables: dataset_AUD, time_series_annotation_AUD, data_set_time_series_AUD, and data_set_image_AUD. (dataset->time_series_annotation is the only parent-child relationship in dataset.)

Writes will be more expensive since they will involve additional inserts and updates to the history tables.

For example, updating an annotation in the old scheme would require 3 insertions (dataset, dataset_time_series_annotation, and time_series_annotation) and 1 update (dataset).

Now it will require 3 insertions (dataset_AUD, data_set_time_series_annotation_AUD, time_series_annotation_AUD) and 5 updates (dataset, time_series_annotation, dataset_AUD, dataset_time_series_annotation_AUD, time_series_annotation_AUD).

DataSnapshotServer

If an old revision id is passed into any of the store methods, a checked exception will be thrown indicating that the client has stale data.
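The stale-data check could look roughly like the following. This is a sketch under assumptions, not the DataSnapshotServer interface: StoreSketch, StaleDataException, the store signature, and the integer revision counter are all hypothetical names chosen for illustration.

```java
// Hypothetical sketch of a store method that rejects stale revision ids.
final class StoreSketch {

    /** Checked exception telling the client that its copy is out of date. */
    static final class StaleDataException extends Exception {
        StaleDataException(int clientRevNo, int currentRevNo) {
            super("client has revision " + clientRevNo
                    + " but current revision is " + currentRevNo);
        }
    }

    private int currentRevNo = 1;  // assumption: revisions start at 1
    private String payload = "";

    /**
     * Stores new data only if the client edited the current revision.
     * Returns the new revision number on success.
     */
    int store(int clientRevNo, String newPayload) throws StaleDataException {
        if (clientRevNo != currentRevNo) {
            throw new StaleDataException(clientRevNo, currentRevNo);
        }
        payload = newPayload;
        currentRevNo++;
        return currentRevNo;
    }

    String getPayload() { return payload; }
}
```

Because the exception is checked, every caller of a store method is forced by the compiler to decide what to do when its data is stale, for example by reloading and retrying.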

There will be an explicit branch method for Datasets only. Patient branching will be dealt with in the data entry interface.
