javarosa / PolishStorageAnalysis

Analysis of J2ME Polish Serialization and Persistence

Serialization

Cons

Suffers from a lot of the inefficient encodings that motivated us to create our new serialization framework. In particular:

  • Primitives like int and long always take 4 or 8 bytes, respectively; our ints usually take only 1 or 2 bytes.
  • All serialized types are tagged and have a 'null' flag, adding 2 bytes per field to be serialized. This can really add up. We only add these flags when directed to (if a value should never be null, don't serialize a 'null' flag).
  • Any Externalizable object being serialized as part of a parent object's serialization is prefaced with the fully qualified class name of the child object before it is serialized. This is a huge waste of space. Most of the time, the object is guaranteed to always be the same type, in which case we omit the tagging altogether. When a type tag is necessary, we reduce it to a 4-byte hash of the class name.
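For illustration, the 1-2 byte ints in the first bullet come from a variable-length encoding along these lines. This is a minimal sketch of the general technique, not our framework's actual code; the class and method names are made up:

```java
import java.io.*;

// Sketch of a variable-length ("varint") integer encoding: small
// non-negative values take 1-2 bytes instead of DataOutputStream's
// fixed 4. (Negative ints would take 5 bytes under this scheme.)
public class VarInt {
    // Write 'value' 7 bits at a time, low bits first; the high bit of
    // each byte flags whether another byte follows.
    public static void writeVarInt(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Read bytes until one arrives with its high bit clear.
    public static int readVarInt(InputStream in) throws IOException {
        int result = 0, shift = 0, b;
        do {
            b = in.read();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

Values up to 127 fit in one byte, up to 16383 in two; a plain writeInt() always spends four.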

Pros

Does natively handle compound structures like Hashtables and Vectors, which was another big motivation for our new framework. But we handle them too, now, so this is no net gain.
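For example, a homogeneous Vector can be written with a single length prefix and no per-element type tag, because every element is known to be the same type. The helper methods below are hypothetical, not Polish's or our framework's API (generics are used for clarity, though CLDC-era Java would use a raw Vector):

```java
import java.io.*;
import java.util.Vector;

// Sketch: serialize a Vector of Strings with one length prefix and
// untagged elements, since the element type is fixed.
public class VectorSerialize {
    public static byte[] writeStringVector(Vector<String> v) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(v.size());             // single length prefix
        for (String s : v) out.writeUTF(s); // elements carry no class tag
        return buf.toByteArray();
    }

    public static Vector<String> readStringVector(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        Vector<String> v = new Vector<String>();
        for (int i = 0; i < n; i++) v.add(in.readUTF());
        return v;
    }
}
```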

Boilerplate serialization/deserialization code can be generated automatically by Polish during pre-processing. All the class needs to do is implement Serializable, which is an empty interface. It would be very cool if we could hook into Polish's 'auto-serialization' of such classes: it would auto-generate the serialization/deserialization code based on class contents, but we could hook it up to use our more efficient encodings. (Note: this would still not be as efficient as custom serialization code, because the class definition itself cannot capture all the information that might allow a more efficient encoding (e.g., 'can this field be null?', 'will the contents of this Vector always be the same type?', etc.).)
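To make the cost concrete, here is a sketch of the null-flag question: auto-generated code must write a flag for every field, while hand-written code that knows a field can never be null skips it and saves a byte per field. The class and method names are made up for illustration:

```java
import java.io.*;

// Sketch contrasting auto-generated serialization (null flag on every
// field) with custom code that knows 'name' is never null.
public class NullFlagDemo {
    // Auto-generated style: the generator can't know the field is
    // non-null, so it always spends a byte on the flag.
    static byte[] writeWithFlag(String name) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeBoolean(name != null);
        if (name != null) out.writeUTF(name);
        return buf.toByteArray();
    }

    // Custom style: the programmer knows the field is never null,
    // so no flag is written.
    static byte[] writeNoFlag(String name) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(name);
        return buf.toByteArray();
    }
}
```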

Persistence

Cons

Their scheme is not as 'database-y' as we would like. A lot of our use cases are 'store this set of forms' or 'store this set of patients', in which we have a homogeneous collection of objects that we usually want to access one at a time. Their scheme is more for storing a random assortment of unrelated objects, each accessible with a unique string key. If we wanted to have a database of patients, we'd have to refer to them by PATIENT_1, PATIENT_2, PATIENT_3, commingled in the same global store with FORM_1, FORM_2...; it all seems very ugly.
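The contrast might be sketched like this, with a hypothetical typed store handing out numeric record ids instead of synthetic string keys. Neither interface is Polish's or ours; this is pure illustration:

```java
import java.util.Hashtable;
import java.util.Vector;

// Sketch contrasting the two access patterns.
public class StoreSketch {
    // Polish-style: one global store keyed by arbitrary strings, so
    // patients and forms commingle in the same namespace.
    static Hashtable<String, byte[]> globalStore = new Hashtable<String, byte[]>();

    // Database-style: one store per record type; records are keyed by
    // the numeric id the store assigns, and the collection can be
    // counted and iterated on its own.
    static class TypedStore {
        private Vector<byte[]> records = new Vector<byte[]>();
        int add(byte[] record) { records.add(record); return records.size() - 1; }
        byte[] read(int id) { return records.get(id); }
        int count() { return records.size(); }
    }
}
```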

They also don't do anything fancy with spillover. It's all or nothing: either all records are stored in a single RMS, or ''every'' record is stored in its own RMS.
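A middle ground would cap each RMS at a fixed number of records and spill into a fresh one when full. A rough sketch, where the inner Vector stands in for an actual record store and the capacity is deliberately tiny for demonstration:

```java
import java.util.Vector;

// Sketch of spillover: records fill fixed-capacity segments, and a new
// segment is opened when the current one is full. Record ids map
// deterministically to (segment, offset).
public class SpilloverStore {
    static final int SEGMENT_CAPACITY = 2; // tiny, for demonstration
    private Vector<Vector<byte[]>> segments = new Vector<Vector<byte[]>>();

    // Append a record, spilling into a new segment if needed; returns
    // the record's id.
    public int add(byte[] record) {
        if (segments.isEmpty() || segments.lastElement().size() >= SEGMENT_CAPACITY) {
            segments.add(new Vector<byte[]>());
        }
        segments.lastElement().add(record);
        return (segments.size() - 1) * SEGMENT_CAPACITY
                + segments.lastElement().size() - 1;
    }

    public byte[] read(int id) {
        return segments.get(id / SEGMENT_CAPACITY).get(id % SEGMENT_CAPACITY);
    }

    public int segmentCount() { return segments.size(); }
}
```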

It also doesn't seem like they try to make their storage system particularly resilient, or try to shield the user from many of the low-level RMS exceptions. Everything is just wrapped up in an IOException and left for the programmer to sort out.

Pros

None that I can see.
