Note: this document describes our analysis of the persistence strategies we considered. The DB solution was ultimately selected.
Requirements
- Memory efficient: we don't want all the transcripts loaded in memory at once, but the codes of each project are small enough to always keep in memory.
- Unobtrusive read/write: common operations such as adding a code or changing the description of a code should not slow down the user (e.g., because a large 1 MB file must be rewritten).
- Ideally, it should be *easy* to merge two datasets (e.g., two sets of codes), either in a decentralized mode or in a centralized mode.
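The memory requirement can be made concrete with a small sketch. This is a hypothetical illustration in Python (class and file names are invented, not part of any decided design): codes stay resident because they are small, while transcripts are read from disk only on first access.

```python
from pathlib import Path

class Project:
    """Sketch: codes stay in memory; transcripts are loaded lazily."""

    def __init__(self, root):
        self.root = Path(root)
        # Small and always needed: loaded eagerly (placeholder data here).
        self.codes = {"c1": "interview"}
        # Large and rarely all needed at once: cached only after first access.
        self._transcripts = {}

    def transcript(self, name):
        # Read from disk only when first requested; a bounded cache could
        # be substituted here if memory pressure matters (omitted).
        if name not in self._transcripts:
            self._transcripts[name] = (self.root / f"{name}.txt").read_text()
        return self._transcripts[name]
```

A real implementation would also need eviction for the transcript cache; the point is only that the code list and the transcript bodies have different residency policies.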
Solution 1 - XML Files
- Relatively easy to merge.
- Easy for humans to manipulate outside the framework (good for error recovery and updates). In the worst case, even a non-programmer could edit the files.
- In a centralized mode, transactional support (e.g., locking) needs to be hand-coded.
- In a decentralized mode, merging needs to be supervised (to make sure that a merge only ever produces valid XML).
- Reference resolution needs to be hand-coded to avoid putting everything in one large XML file.
- As soon as a collection is kept in a single file, a change to one element forces the whole (big) file to be rewritten (slow). The alternative, one file per element, leads to the creation of many small files (also bad).
- Search queries need to be hand-coded.
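To make the last two drawbacks concrete, here is a minimal sketch (in Python; the file layout and element names are invented for illustration) of what hand-coded reference resolution and search look like when each code lives in its own XML file:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def load_code(root, code_id):
    """Hand-coded reference resolution: map an id to a file and parse it."""
    return ET.parse(Path(root) / f"code-{code_id}.xml").getroot()

def find_codes(root, keyword):
    """Hand-coded 'search query': scan every file and match in application code."""
    hits = []
    for path in sorted(Path(root).glob("code-*.xml")):
        elem = ET.parse(path).getroot()
        if keyword in elem.findtext("description", default=""):
            hits.append(elem.get("id"))
    return hits
```

Every new kind of query means another scan-all-files function like `find_codes`, which is exactly what a database query engine would give us for free.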
Solution 2 - DB
- In a centralized mode, transaction support/concurrency control is provided by the database.
- Updating an element has a small impact (a row update vs. a whole-file update).
- Persists graphs of objects easily.
- The amount of memory used is easily configurable.
- Search operations come for free!
- Bigger upfront performance overhead than plain XML files.
- Can be difficult to update the schema (Hibernate can help, but it could still require custom SQL migration scripts... bad, bad, bad).
- Merging in a decentralized mode needs to be hand-coded.
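For contrast, the DB-side advantages above can be sketched with an in-memory SQLite database (a Python illustration with an invented schema; in the real system Hibernate would map objects onto similar tables): changing a description is one small row update inside an engine-managed transaction, and search is a query we do not have to hand-code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE code (id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
conn.executemany("INSERT INTO code VALUES (?, ?, ?)",
                 [(1, "memo", "analytic memo"), (2, "axial", "axial coding")])

# Unobtrusive write: updating one description touches one row, not a
# 1 MB file; the transaction is committed by the engine on block exit.
with conn:
    conn.execute("UPDATE code SET description = ? WHERE id = ?", ("revised memo", 1))

# Search "for free": the query engine does the scanning, not our code.
rows = conn.execute("SELECT name FROM code WHERE description LIKE ?",
                    ("%coding%",)).fetchall()
```

The same two operations against per-element XML files would require rewriting a file and scanning all files, respectively.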