
Externalizable Framework

Background

The original serialization framework had a fatal flaw that seriously impacted its usability: you had to instantiate an object before you could deserialize it. This made it very hard to handle:

  • object references that might be null
  • base types like Integer, String, Boolean, etc., that don't implement Externalizable in J2ME
  • compound structures like Vector and Hashtable that also don't implement Externalizable
  • abstract classes or interfaces where the actual object may be one of any number of child classes -- you don't know which until you start deserializing

A number of 'helpers' were created to assist with all these different situations, but the helpers had inconsistent APIs and their scope was limited; a slightly different need (say serializing a Vector of Boolean instead of a Vector of Integer) would require a whole new helper. Having to write a whole new helper to do every new thing wasn't very much help. The lack of a centralized framework for helpers also made it hard to consistently apply efficient encodings (e.g., not using four bytes to write an int when one will do) across the board.

Another serious problem with this scheme was disambiguating multiple child classes when deserializing an abstract class. This was implemented in an ad hoc manner by each class that needed it, not only unduly complicating the class's serialization code, but also introducing serious management and coupling problems in trying to keep track of the many potential resultant classes. The encoding of these abstract classes was also wasteful.

The New Framework

The new framework builds on the existing Externalizable interface (with some slight modifications discussed later). The main change is in the refactoring of the ExternalizableHelper utility class. Don't use ExternalizableHelper anymore; all the utility functions for the new framework are in ExtUtil.

The ideal is that all serialization/deserialization, including all the tricky cases mentioned above, can be done through the same two functions:

  • void write (DataOutputStream out, Object o)
  • Object read (DataInputStream in, Class type)

To write (serialize) an object, just pass it to the function and it knows what to do. To read (deserialize), just say what kind of object you want, and it knows how to get it for you. This works great for all Externalizables, and for base types like Integer and String.

Usage Examples:

ExtUtil.write(out, new Integer(5));
Integer x = (Integer)ExtUtil.read(in, Integer.class);
ExtUtil.write(out, "a string");
String x = (String)ExtUtil.read(in, String.class);
ExtUtil.write(out, new CustomExternalizableType());
CustomExternalizableType x = (CustomExternalizableType)ExtUtil.read(in, CustomExternalizableType.class);

'''You can use this basic syntax with''' any Externalizable, along with the following base types: String, Integer, Long, Byte, Short, Float, Double, Character, Boolean, and Date. For some primitive types, read and write may use encodings that are more efficient than the naive encodings available with DataOutputStream. For example, integers (including Integer, Long, Short, etc.) will use the minimum number of bytes necessary to encode the value instead of always using 4, 8, or 2 bytes, respectively. Therefore, you should always use the helper functions in ExtUtil rather than calling the methods (such as out.writeInt()) on your data stream directly.

Primitive types also have direct helper functions for when you don't want to box your value in an object. For example:

ExtUtil.writeNumeric(out, 17);
long x = ExtUtil.readNumeric(in);
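
As a rough illustration of how one write/read pair can cover several base types, the sketch below dispatches on the object's runtime class when writing and on the requested Class when reading. It is not ExtUtil's implementation: the class BaseTypeSketch, its roundTrip helper, and the fixed-width encodings are assumptions made for this example, and it handles only Integer, String, and Boolean.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a generic write/read pair; not the real ExtUtil.
class BaseTypeSketch {
    // Serialize o according to its runtime class.
    static void write(DataOutputStream out, Object o) throws IOException {
        if (o instanceof Integer) {
            out.writeInt(((Integer) o).intValue());      // naive 4-byte encoding
        } else if (o instanceof String) {
            out.writeUTF((String) o);
        } else if (o instanceof Boolean) {
            out.writeBoolean(((Boolean) o).booleanValue());
        } else {
            throw new IllegalArgumentException("unsupported value: " + o);
        }
    }

    // Deserialize a value of the requested type; no instance is needed up front.
    static Object read(DataInputStream in, Class<?> type) throws IOException {
        if (type == Integer.class) return Integer.valueOf(in.readInt());
        if (type == String.class) return in.readUTF();
        if (type == Boolean.class) return Boolean.valueOf(in.readBoolean());
        throw new IllegalArgumentException("unsupported type: " + type);
    }

    // Write all values to a buffer, then read them back using the given types.
    static Object[] roundTrip(Object[] values, Class<?>[] types) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (int i = 0; i < values.length; i++) write(out, values[i]);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            Object[] result = new Object[types.length];
            for (int i = 0; i < types.length; i++) result[i] = read(in, types[i]);
            return result;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Object[] back = roundTrip(
            new Object[] { Integer.valueOf(5), "a string", Boolean.TRUE },
            new Class<?>[] { Integer.class, String.class, Boolean.class });
        System.out.println(back[0] + ", " + back[1] + ", " + back[2]);
    }
}
```

Note that the sketch rejects null and always spends four bytes on an int; the framework's compact integer encoding and the wrappers described next address exactly those gaps.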

Wrappers

For more complex scenarios, however, we need more information about the object to properly serialize and deserialize it. The key idea is that an object's 'type' goes beyond simply what its class is. For simple situations, like in the previous section, a class is enough to fully describe the object's 'type', but oftentimes we need more.

So let's define the 'type', for serialization purposes, as an object's class plus annotations about the data it contains and how we want it serialized. This new kind of 'type' is represented as an ExternalizableWrapper. The ExternalizableWrapper wraps the original object and contains extra information about its type. Or it can stand alone, much like a Class, and contain only the type annotations with no actual value inside. ExternalizableWrappers can be nested.

Here are the kinds of wrappers:

'''Base wrapper''' (ExtWrapBase)

We need to represent the most basic situation when a simple Class is enough. Since Class and ExternalizableWrapper don't share the same type hierarchy, we need a wrapper that represents a Class. You will never have to use this wrapper; it is used internally within the framework only.

'''Nullable wrapper''' (ExtWrapNullable)

This wrapper indicates that the wrapped object may be null.

'''Compound wrappers''' (ExtWrapList, ExtWrapListPoly, ExtWrapMap, and ExtWrapMapPoly)

These wrappers represent compound data structures. They contain information about the type(s) of the elements within. Note that since wrappers can be nested, this allows us to represent Vectors of Hashtables of Vectors of Hashtables, arbitrarily deep.

'''Tagging wrapper''' (ExtWrapTagged)

This wrapper indicates that, in addition to serializing the object itself, we should write a full description of the object's type as well. Meaning: the stream contains enough context that we can fully deserialize the object without any foreknowledge about what kind of object it is. The wrapper is useful when an object may be any one of several sub-classes of an abstract parent. And since wrappers can be nested, the 'tag' can encompass any possible compound datastructure.

'''Encoding wrappers''' (ExtWrapIntEncodingUniform, ExtWrapIntEncodingSmall, more to come for sure...)

These wrappers specify alternate encodings of the same core type. For example, integers can be encoded in any number of schemes from 'naive' (always dump 4 bytes), to a number of more efficient schemes with various weaknesses and strengths.

ExternalizableWrapper Details

We'll now go through all the defined wrappers, with details on how to use them. The Externalizable unit tests have many more usage examples.

As explained before, wrappers are used in two different ways:

  • When they contain the core data being wrapped. We use them this way during serialization. This is called '''value mode'''
  • When they don't contain any core data and are just an abstract representation of a 'type'. This is called '''type mode'''. We use them this way during deserialization ("here's the kind of object I want"), as well as for supplementary type annotations during serialization ("the wrapped Vector's elements are Vectors of String"-- we have an ExtWrapList in value mode for the top-level Vector, but also a type-mode ExtWrapList to represent the 'Vector of String' sub-type).

In the usage definitions below:

  • <object> will mean any allowed base type (Externalizable, String, etc.), or any value-mode ExternalizableWrapper
  • <type> will mean any allowed base Class (Externalizable.class, String.class, etc.), or any type-mode ExternalizableWrapper

ExtWrapNullable

Usage

Use this wrapper when trying to represent an object that may be null.

'''Value mode''': new ExtWrapNullable(<object>), where <object> may be null
'''Type mode''': new ExtWrapNullable(<type>)

Usage Details

ExtUtil.read() and ExtUtil.write() do not inherently support nulls:

String x = null;
ExtUtil.write(out, x);

will throw an exception.

Instead, do:

String x = null;
ExtUtil.write(out, new ExtWrapNullable(x));
x = (String)ExtUtil.read(in, new ExtWrapNullable(String.class));

Note how ExtWrapNullable(String.class) is an annotated extension of the base type String.

Serialization Details

This wrapper prefixes the serialized value with 0x01 when not null, or writes only 0x00 when null.
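
The byte layout just described can be sketched in a few lines. The class NullableSketch and its two methods are hypothetical, not part of the framework, and the example handles only Strings (written with DataOutputStream.writeUTF) to stay short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration of a null-flag prefix; not the real ExtWrapNullable.
class NullableSketch {
    // Write 0x00 alone for null, or 0x01 followed by the value.
    static byte[] writeNullable(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (s == null) {
                out.writeByte(0x00);
            } else {
                out.writeByte(0x01);
                out.writeUTF(s);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Read the flag first; only read the value when the flag says it is present.
    static String readNullable(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return in.readByte() == 0x00 ? null : in.readUTF();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readNullable(writeNullable(null)));
        System.out.println(readNullable(writeNullable("a string")));
    }
}
```

The cost of nullability in this layout is a single extra byte per value, whether or not the value is present.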

ExtWrapList

Usage

Use this wrapper for a Vector of objects. The objects must all be of the '''exact same type'''. Not even children of the same parent class, but exactly the same.

'''Value mode''': new ExtWrapList(Vector, <type>), where <type> is the type of the child elements; <type> may be omitted when it is simply a Class
'''Type mode''': new ExtWrapList(<type>), where <type> is the type of the child elements; <type> cannot be omitted

Usage Details

Vector v = new Vector();
v.addElement("string 1");
v.addElement("string 2");
v.addElement("string 3");
ExtUtil.write(out, new ExtWrapList(v));
v = (Vector)ExtUtil.read(in, new ExtWrapList(String.class));

Note that when writing, you don't need to specify type String; the serializer can figure it out from the Vector's contents.

We can serialize Vectors of Vectors (or any other complex type). In this context, 'exact same type' means all the extended details of the type must be the same as well. For instance, we can serialize a Vector when all its elements are Vectors of Strings, but not when some elements are Vectors of Strings and others are Vectors of Integers.

Vector vv = new Vector();
vv.addElement(v);
vv.addElement(v);
ExtUtil.write(out, new ExtWrapList(vv, new ExtWrapList(String.class)));
vv = (Vector)ExtUtil.read(in, new ExtWrapList(new ExtWrapList(String.class)));

Here we see how the supplementary parameter is the type of the Vector's elements. Whereas before, during serialization, write could figure out the element type from the Vector's contents, now the elements are themselves compound, so we need to supply their type explicitly.

We can also represent a Vector that may be null by combining with an ExtWrapNullable wrapper as such:

Vector x = null;
ExtWrapList wl = (x == null ? null : new ExtWrapList(x));
ExtUtil.write(out, new ExtWrapNullable(wl));
x = (Vector)ExtUtil.read(in, new ExtWrapNullable(new ExtWrapList(String.class)));

Not the cleanest, but oh well.
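
One way to picture a nested type-mode wrapper is as a small tree of descriptors, each of which reads and writes its own level and delegates to the descriptor of its elements. The sketch below is an illustration only, not the framework's code: the TypeDesc interface, the StringDesc and ListDesc classes, and the length-prefixed layout are all invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Vector;

// Hypothetical type descriptor: knows how to write and read one kind of value.
interface TypeDesc {
    void write(DataOutputStream out, Object value) throws IOException;
    Object read(DataInputStream in) throws IOException;
}

// Leaf descriptor for String.
class StringDesc implements TypeDesc {
    public void write(DataOutputStream out, Object value) throws IOException {
        out.writeUTF((String) value);
    }
    public Object read(DataInputStream in) throws IOException {
        return in.readUTF();
    }
}

// Descriptor for a Vector whose elements all share one element descriptor.
class ListDesc implements TypeDesc {
    private final TypeDesc elementType;

    ListDesc(TypeDesc elementType) {
        this.elementType = elementType;
    }

    public void write(DataOutputStream out, Object value) throws IOException {
        Vector<?> v = (Vector<?>) value;
        out.writeInt(v.size());                      // assumed layout: count, then elements
        for (int i = 0; i < v.size(); i++) elementType.write(out, v.elementAt(i));
    }

    public Object read(DataInputStream in) throws IOException {
        int n = in.readInt();
        Vector<Object> v = new Vector<Object>();
        for (int i = 0; i < n; i++) v.addElement(elementType.read(in));
        return v;
    }
}

class NestedListSketch {
    // Serialize value with the given descriptor, then read it back with the same one.
    static Object roundTrip(TypeDesc type, Object value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            type.write(new DataOutputStream(buf), value);
            return type.read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Vector<Object> v = new Vector<Object>();
        v.addElement("string 1");
        v.addElement("string 2");
        Vector<Object> vv = new Vector<Object>();
        vv.addElement(v);
        vv.addElement(v);
        // Descriptor for "Vector of Vector of String".
        TypeDesc type = new ListDesc(new ListDesc(new StringDesc()));
        System.out.println(roundTrip(type, vv));
    }
}
```

Because no per-element type information is written, every element must match the element descriptor exactly, which mirrors the 'exact same type' requirement above.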

ExtWrapMap

Usage

This wrapper is similar to ExtWrapList, but it is used for Hashtables of objects (including OrderedHashtable). All the keys must have the same exact type, and all the elements must have the exact same type (the key type can be different than the element type).

'''Value mode''': new ExtWrapMap(Hashtable, key <type>, element <type>), where <type> is the type of the keys and elements, respectively; also:
new ExtWrapMap(Hashtable): both <type>s may be omitted when they are both Classes, and
new ExtWrapMap(Hashtable, element <type>): the key <type> may be omitted if it is a Class but the element type is not
'''Type mode''': new ExtWrapMap(key <type>, element <type>, boolean ordered), where <type> is the type of the keys and elements, respectively; <type> cannot be omitted. ordered, if true, will yield an OrderedHashtable; ordered may be omitted and defaults to false.

Usage Details

Usage is very similar to ExtWrapList.

Hashtable h = new Hashtable();
h.put("a", new Integer(1));
h.put("b", new Integer(2));
h.put("c", new Integer(3));
ExtUtil.write(out, new ExtWrapMap(h));
h = (Hashtable)ExtUtil.read(in, new ExtWrapMap(String.class, Integer.class));
OrderedHashtable hh = new OrderedHashtable();
hh.put(Boolean.TRUE, h);
hh.put(Boolean.FALSE, h);
ExtUtil.write(out, new ExtWrapMap(hh, new ExtWrapMap(String.class, Integer.class)));
hh = (OrderedHashtable)ExtUtil.read(in, new ExtWrapMap(Boolean.class, new ExtWrapMap(String.class, Integer.class), true));

You can serialize a Hashtable and deserialize it as an OrderedHashtable, no problem (and vice versa).

ExtWrapTagged

Usage

This wrapper writes out any object, but tags it with enough information to completely know the object's type. Its use is in serializing polymorphic types, where the actual object may be any of several classes. It can also be applied to any compound type, such as ExtWrapNullables, ExtWrapLists, etc.

'''Value mode''': new ExtWrapTagged(<object>)
'''Type mode''': new ExtWrapTagged(); there are no arguments because all the necessary information is already in the stream

Usage Details

Usage is trivial.

ExtUtil.write(out, new ExtWrapTagged("string"));
String x = (String)ExtUtil.read(in, new ExtWrapTagged());
ExtUtil.write(out, new ExtWrapTagged(new CustomExternalizableType()));
CustomExternalizableType x = (CustomExternalizableType)ExtUtil.read(in, new ExtWrapTagged());
ExtUtil.write(out, new ExtWrapTagged(new ExtWrapMap(hh, new ExtWrapMap(String.class, Integer.class))));
OrderedHashtable x = (OrderedHashtable)ExtUtil.read(in, new ExtWrapTagged());

One complication is that you can't mix and match Hashtables and OrderedHashtables (serialize as one, deserialize as the other) when tagging; you will get a ClassCastException.

Serialization Details

Tagging is done by serializing the object as normal but prefixing it with a 'tag' that identifies its type.

For base classes (Externalizable, String, etc.), this tag is a 4-byte hash code of the fully-qualified class name. For example, String (java.lang.String) will tag as 0x42c25be3, Integer as 0x7ca16fdb, and org.javarosa.core.model.QuestionDef as 0x27512ec9. This scheme is ''far'' more efficient than writing out the full class name like we were before. There are two problems, though:

1) How do we get from a tag back to a Class? We can't, because hashes are one-way. Therefore, as we're deserializing, we need to keep a list of 'potential classes' that we can match the tags against. These potential classes are called 'prototypes', and are managed with a PrototypeFactory. More information about this is given in the Prototypes section. Just know now that if you use the tagging wrapper with any custom Externalizable (but not String, Integer, etc.), you will have to use a PrototypeFactory when deserializing.

2) Collisions. It is possible that two different classes would have the same tag. If this happens we have no hope of deserializing the object properly. The risk of a collision is small, however. It would take 2900 candidate classes ''at once'' before there was even a 0.1% chance (1 in 1,000) of collision; for 90 classes, the chance is one in a million. The PrototypeFactory will also detect a collision, if one occurs.
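
The collision figures above are consistent with the standard birthday approximation: for n classes and a 32-bit tag, the probability of at least one collision is about 1 - exp(-n(n-1) / (2 * 2^32)), assuming tags are uniformly distributed. The sketch below simply evaluates that formula; the class and method names are made up for this example.

```java
// Hypothetical helper that evaluates the birthday approximation for tag collisions.
class TagCollisionSketch {
    // Approximate probability that at least two of n classes share a tag
    // of the given bit width, assuming uniformly distributed tags.
    static double collisionProbability(int n, int tagBits) {
        double space = Math.pow(2.0, tagBits);          // number of possible tags
        double pairs = (double) n * (n - 1) / 2.0;      // number of class pairs
        return -Math.expm1(-pairs / space);             // 1 - exp(-pairs / space)
    }

    public static void main(String[] args) {
        System.out.println("2900 classes: " + collisionProbability(2900, 32));
        System.out.println("  90 classes: " + collisionProbability(90, 32));
    }
}
```

For 2900 and 90 classes this comes out at roughly one in a thousand and one in a million, matching the numbers quoted above.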

ExternalizableWrappers themselves use a longer tagging format. They first output 0xffffffff, to distinguish themselves from normal Classes, followed by a code that indicates the type of wrapper, followed by wrapper-specific 'meta-serialization' of the wrapper's type information. For most wrappers, this involves recursive tagging of the sub-types contained within it.

ExtWrapListPoly

Usage

Use this wrapper for a Vector of objects when the elements might not all be of the exact same type. The wrapper accomplishes this by tagging each element individually. It is (nearly, but not quite) functionally equivalent to new ExtWrapList(vector, new ExtWrapTagged()) (never do this).

'''Value mode''': new ExtWrapListPoly(Vector)
'''Type mode''': new ExtWrapListPoly()

Usage Details

Vector vp = new Vector();
vp.addElement("string 1");
vp.addElement(new Integer(3));
vp.addElement(Boolean.FALSE);
ExtUtil.write(out, new ExtWrapListPoly(vp));
vp = (Vector)ExtUtil.read(in, new ExtWrapListPoly());

If you want to serialize a polymorphic Vector where some elements are compound types (meaning, they require a wrapper), you must insert the wrapped element directly into the source Vector.

...
vp.addElement(new ExtWrapList(v));
ExtUtil.write(out, new ExtWrapListPoly(vp));
vp = (Vector)ExtUtil.read(in, new ExtWrapListPoly());

ExtWrapMapPoly

Usage

Akin to ExtWrapListPoly, this wrapper is for Hashtables where the elements might not all be of the exact same type. Keys must still all be of the exact same type.

'''Value mode''': new ExtWrapMapPoly(Hashtable, key <type>); as with ExtWrapMap, key type can be omitted when it is a simple Class
'''Type mode''': new ExtWrapMapPoly(key <type>, boolean ordered); key type cannot be omitted; ordered can be omitted and behaves exactly as with ExtWrapMap

Usage Details

OrderedHashtable hp = new OrderedHashtable();
hp.put("a", "string");
hp.put("b", new Integer(3));
hp.put("c", Boolean.FALSE);
hp.put("d", new ExtWrapList(v));
ExtUtil.write(out, new ExtWrapMapPoly(hp));
hp = (OrderedHashtable)ExtUtil.read(in, new ExtWrapMapPoly(String.class, true));

Note that the caveat about wrapping compound types yourself still applies.

ExtWrapIntEncodingUniform

Usage

This wrapper specifies a numeric encoding that efficiently encodes integers, and strives for equal efficiency over the entire range of longs. This is the default encoding used by ExtUtil for all integer values, including all serializations of Integer, Long, Byte, and Short. Therefore, you should '''probably never use this encoding directly'''.

'''Value mode''': new ExtWrapIntEncodingUniform(long)
'''Type mode''': new ExtWrapIntEncodingUniform()

Usage Details

Don't use this encoding directly. But if you did, it would look like:

ExtUtil.write(out, new ExtWrapIntEncodingUniform(123456789));
Long x = (Long)ExtUtil.read(in, new ExtWrapIntEncodingUniform());

Serialization Details

This encoding divides the value into 7-bit chunks, and serializes each chunk as one byte. The eighth bit is used to signal whether there are more chunks to follow. The encoding will only write as many chunks as are needed to fully represent the value.

The range [-64,63] can be represented in 1 byte
[-8192,8191] will take 2 bytes
[-1048576,1048575] will take 3 bytes
and so on to the largest long, which takes 10 bytes.
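
A minimal sketch of such a 7-bits-per-byte scheme follows. It reproduces the byte counts listed above, but two details are assumptions rather than facts from this page: chunks are written most significant first, and a set high bit means 'more chunks follow'. The class UniformIntSketch is hypothetical and is not the framework's ExtWrapIntEncodingUniform.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a 7-bits-per-byte signed integer encoding.
class UniformIntSketch {
    // Smallest number of 7-bit chunks that holds v as a two's-complement value.
    static int chunkCount(long v) {
        int n = 1;
        while (n < 10) {
            long limit = 1L << (7 * n - 1);          // n chunks cover [-limit, limit - 1]
            if (v >= -limit && v < limit) break;
            n++;
        }
        return n;
    }

    // Assumed layout: most significant chunk first; high bit set = more chunks follow.
    static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = chunkCount(v) - 1; i >= 0; i--) {
            int chunk = (int) ((v >> (7 * i)) & 0x7f);
            out.write(i > 0 ? (chunk | 0x80) : chunk);
        }
        return out.toByteArray();
    }

    static long decode(byte[] data) {
        // Sign-extend from the top payload bit of the first chunk.
        long v = ((data[0] & 0x40) != 0) ? -1L : 0L;
        for (int i = 0; i < data.length; i++) {
            v = (v << 7) | (data[i] & 0x7f);
            if ((data[i] & 0x80) == 0) break;        // last chunk reached
        }
        return v;
    }

    public static void main(String[] args) {
        long[] samples = { 0, 63, 64, -64, -65, 8191, 8192, Long.MAX_VALUE, Long.MIN_VALUE };
        for (int i = 0; i < samples.length; i++) {
            byte[] enc = encode(samples[i]);
            System.out.println(samples[i] + " -> " + enc.length
                + " byte(s), decodes to " + decode(enc));
        }
    }
}
```

Each extra byte adds seven bits of range, so small magnitudes stay short while the full range of longs remains representable.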

ExtWrapIntEncodingSmall

Use this numeric encoding for integer domains where the value rarely falls outside a one-byte range (255 consecutive values, positioned by the bias described below). This encoding covers the entire range of ints; it cannot be used with longs.

'''Value mode''': new ExtWrapIntEncodingSmall(long val, int bias): bias shifts the 'one byte range' around zero. Allowed values are 0--254. It may be omitted, in which case the default bias is 1.
'''Type mode''': new ExtWrapIntEncodingSmall(int bias); bias may be omitted to use the default value

Usage Details

ExtUtil.write(out, new ExtWrapIntEncodingSmall(-26, 30));
Long x = (Long)ExtUtil.read(in, new ExtWrapIntEncodingSmall(30));

Serialization Details

The range [0 - bias, 254 - bias] (with default bias: [-1, 253]) will take 1 byte. The rest of the integer range will take 5 bytes.
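
The sketch below shows one layout that would produce these sizes. It assumes the single byte stores value + bias, that 0xff is reserved as an escape marker, and that an escaped value follows as four big-endian bytes; none of these details are confirmed by this page, and SmallIntSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a biased one-byte encoding with a 5-byte fallback.
class SmallIntSketch {
    static final int DEFAULT_BIAS = 1;

    static byte[] encode(int value, int bias) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long shifted = (long) value + bias;          // long avoids int overflow
        if (shifted >= 0 && shifted <= 254) {
            out.write((int) shifted);                // one byte: value + bias
        } else {
            out.write(0xff);                         // assumed escape marker
            out.write(value >>> 24);                 // then the raw int, big-endian
            out.write(value >>> 16);
            out.write(value >>> 8);
            out.write(value);
        }
        return out.toByteArray();
    }

    // The reader must use the same bias as the writer.
    static int decode(byte[] data, int bias) {
        int first = data[0] & 0xff;
        if (first != 0xff) return first - bias;
        return ((data[1] & 0xff) << 24) | ((data[2] & 0xff) << 16)
             | ((data[3] & 0xff) << 8) | (data[4] & 0xff);
    }

    public static void main(String[] args) {
        byte[] small = encode(-26, 30);
        byte[] large = encode(100000, 30);
        System.out.println(small.length + " byte(s) -> " + decode(small, 30));
        System.out.println(large.length + " byte(s) -> " + decode(large, 30));
    }
}
```

In this layout the bias is not stored in the stream, which would explain why it has to be supplied again in type mode when reading.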

Prototypes

PrototypeFactory

To deserialize tagged objects, we need a PrototypeFactory to provide a list of potential objects that we can match tags against. This need is so inherent that we've modified the Externalizable interface to accommodate it.

The new signature for Externalizable.readExternal is:

void readExternal (DataInputStream in, PrototypeFactory pf) throws IOException, DeserializationException

Now every object's deserialization will have access to a PrototypeFactory. Objects that don't need prototypes to deserialize can safely ignore it.

Where does the PrototypeFactory come from? You can create it yourself and explicitly pass it during deserialization. This, however, won't alleviate the pain of managing prototypes like we did before. So instead, you can (and likely should) use the global PrototypeFactory newly available in the JavaRosaServiceProvider. Any call to ExtUtil.read() that doesn't include a PrototypeFactory parameter (or that parameter is null) will use the global PrototypeFactory by default.

Registering Prototypes

You should register all the objects that your application needs to handle (remember, only the objects it will encounter in tagged form) with the JavaRosaServiceProvider (JRSP) in your shell initialization. Do so by calling:

JavaRosaServiceProvider.registerPrototype(String className)

where className is the fully qualified name of your class.

You don't need to register base classes like java.lang.String and java.lang.Integer; those are handled automatically.

One consequence of this reliance on literal class names is that '''you cannot obfuscate a class used as a prototype'''! Deserialization will break in an obfuscated JAR unless you explicitly mark in your project's build.xml that the prototyped classes not be obfuscated. Note: the contents of the class may be obfuscated, but the class name itself must not be touched.

Indicate this with directive(s) like the following:

<obfuscator name="ProGuard" unless="test or noobfuscate">
  ...
  <parameter name="keepnames" value="class [single taggable object]" />
  <parameter name="keepnames" value="!abstract class [package of taggable objects].* extends [abstract parent class of taggable objects]" />
</obfuscator>

(use fully qualified class names)

Prototype Exceptions

When using a PrototypeFactory to identify tagged objects, we still need to handle the event in which a tag doesn't match any of our prototypes. This could be because we forgot to register the prototype, we've changed the name of the class, the class was (wrongly) obfuscated, or we received the data from a foreign source that uses objects we've never heard of. When a PrototypeFactory has no class that matches a tag, it throws a DeserializationException. This exception is checked, and must be handled.

A lesser problem is that although the PrototypeFactory found a matching class, it cannot instantiate it at run-time. This problem is almost always due to programmer error. Common causes are: the object is not publicly visible, the object has no default (empty) constructor, or the 'object' is actually an interface or abstract class. When this occurs, the PrototypeFactory will throw a CannotCreateObjectException. This exception is a runtime exception, and need not be handled.
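
To tie registration, tag lookup, and the two failure modes together, here is a minimal registry sketch. It is not the real PrototypeFactory: the class PrototypeRegistrySketch is hypothetical, the tag function (first four bytes of the MD5 of the class name) is an assumption, and unchecked standard exceptions stand in for DeserializationException and CannotCreateObjectException.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical tag-to-class registry; not the real PrototypeFactory.
class PrototypeRegistrySketch {
    private final Map<Integer, Class<?>> byTag = new HashMap<Integer, Class<?>>();

    // Assumed tag function: first 4 bytes of the MD5 of the class name.
    static int tagOf(String className) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                .digest(className.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                 | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Register by literal class name, which is why the name must survive obfuscation.
    void register(String className) {
        Class<?> type;
        try {
            type = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + className);
        }
        Integer tag = Integer.valueOf(tagOf(className));
        Class<?> existing = byTag.get(tag);
        if (existing != null && existing != type) {
            throw new IllegalStateException("tag collision: " + existing.getName()
                + " vs " + className);
        }
        byTag.put(tag, type);
    }

    // Create an empty instance for a tag read from the stream.
    Object newInstance(int tag) {
        Class<?> type = byTag.get(Integer.valueOf(tag));
        if (type == null) {
            // Stand-in for DeserializationException (checked in the framework).
            throw new IllegalArgumentException("no prototype for tag " + tag);
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Stand-in for CannotCreateObjectException: interface, abstract class,
            // inaccessible class, or no default constructor.
            throw new IllegalStateException("cannot create " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        PrototypeRegistrySketch registry = new PrototypeRegistrySketch();
        registry.register("java.util.Vector");
        Object o = registry.newInstance(tagOf("java.util.Vector"));
        System.out.println(o.getClass().getName());
    }
}
```

The unknown-tag case is the one callers should expect at run time, for example with data from a foreign source, while the instantiation failure usually points to a mistake in the registered class itself.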

Backwards Compatibility

ExternalizableHelper has been moved to ExternalizableHelperDeprecated, and all the original helper functions now use the new framework as their backend. This old file will disappear shortly.

The original PrototypeFactory has been moved to PrototypeFactoryDeprecated, but will continue to work for objects that use it. These objects will be switched to use the new prototype scheme as quickly as possible.

Externalizable objects that don't require prototypes can safely ignore the new PrototypeFactory parameter to Externalizable.readExternal. Although the PrototypeFactory will always be created behind the scenes during any deserialization, PrototypeFactory itself employs lazy evaluation, and the performance hit of its initialization (computing hashes and such) is not incurred until it is certain that the PrototypeFactory is needed. Externalizables can avoid a minor performance penalty when calling ExtUtil.read() by passing along their PrototypeFactory parameter (even if it will never be used) to any such calls; otherwise, a new PrototypeFactory object will be created.
