
Persistence


Note: this document describes our analysis of several persistence strategies. The DB solution was ultimately selected.


Requirements

  • Memory efficient: we don't want all transcripts loaded in memory at once, but the codes of each project can always stay in memory.
  • Unobtrusive read/write: common operations such as adding a code or changing the description of a code should not slow down the user (e.g., because a big 1 MB file needs to be written).
  • Ideally, it should be *easy* to merge two datasets (e.g., two sets of codes), either in a decentralized or a centralized mode.
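To make the memory requirement concrete, here is a minimal Java sketch (all class and method names are hypothetical, not from the actual codebase): codes live in an in-memory map, while transcripts are loaded from the backing store only on first access and then cached.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: codes stay in memory, transcripts load on demand.
class Project {
    private final Map<String, String> codes = new HashMap<>();           // always in memory
    private final Map<String, String> transcriptCache = new HashMap<>(); // filled lazily

    void addCode(String name, String description) {
        codes.put(name, description); // cheap in-memory write
    }

    String getTranscript(String id) {
        // load from the backing store only on first access
        return transcriptCache.computeIfAbsent(id, this::loadFromStore);
    }

    private String loadFromStore(String id) {
        // placeholder for a real file or DB read
        return "transcript body for " + id;
    }
}
```

Whatever the storage backend, this shape keeps the frequently-touched codes cheap to read and write while transcripts stay on disk until needed.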

Solution 1 - XML Files


Pros

  • Relatively easy to merge.
  • Easy for humans to manipulate outside the frameworks (good for error recovery and updates). In the worst case, even a non-programmer could edit the files.


Cons

  • In centralized mode, transactional support (e.g., locking) must be hand-coded.
  • In decentralized mode, merging needs to be supervised (to make sure that a merge only ever produces valid XML).
  • Reference resolution must be hand-coded to avoid putting everything in one large XML file.
  • As soon as a collection is kept in a single file, a change to one element forces the whole (big) file to be rewritten (slow). The alternative, one file per element, leads to many small files (bad).
  • Search queries must be hand-coded.
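To illustrate the whole-file rewrite problem, a hand-rolled sketch (hypothetical names): even if only one code changed, the caller has to serialize the entire document again and write it back in full.

```java
import java.util.List;

// Hypothetical sketch: a change to a single code still forces the
// whole document to be regenerated and rewritten to disk.
class XmlWriter {
    static String serialize(List<String> codes) {
        StringBuilder sb = new StringBuilder("<codes>\n");
        for (String code : codes) {
            sb.append("  <code>").append(code).append("</code>\n");
        }
        sb.append("</codes>\n");
        return sb.toString(); // the caller writes this entire string to the file
    }
}
```

With a real XML framework the marshalling call differs, but the cost profile is the same: write granularity is the whole file, not the changed element.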

Solution 2 - DB


Pros

  • In centralized mode, transaction support/concurrency control is provided by the database.
  • Updating an element has a small impact (a row update vs. a file update).
  • Persists graphs of objects easily.
  • The amount of memory used is easily configurable.
  • Search operations come for free!


Cons

  • Bigger upfront performance overhead than XML.
  • Updating the schema can be difficult (Hibernate can help, but could require custom SQL scripts... Bad bad bad).
  • Merging in decentralized mode must be hand-coded.
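As a sketch of what hand-coding that merge could involve (hypothetical names; not the selected implementation): two code sets are merged map-wise, and the same code name carrying different descriptions is flagged as a conflict for a human to resolve.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a hand-coded merge: entries unique to either
// side are taken as-is; same key with differing values is a conflict.
class CodeMerger {
    static Map<String, String> merge(Map<String, String> local,
                                     Map<String, String> remote,
                                     Map<String, String> conflicts) {
        Map<String, String> merged = new HashMap<>(local);
        for (Map.Entry<String, String> e : remote.entrySet()) {
            String existing = merged.get(e.getKey());
            if (existing == null) {
                merged.put(e.getKey(), e.getValue()); // only in remote: take it
            } else if (!existing.equals(e.getValue())) {
                conflicts.put(e.getKey(), e.getValue()); // needs a human decision
            }
        }
        return merged;
    }
}
```

The database gives us none of this for free in decentralized mode, which is exactly the con listed above.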