Externalizable Framework
Background
The original serialization framework had a fatal flaw that seriously impacted its usability: you had to instantiate an object before you could deserialize it. This made it very hard to handle:
- object references that might be null
- base types like Integer, String, Boolean, etc., that don't implement Externalizable in J2ME
- compound structures like Vector and Hashtable that also don't implement Externalizable
- abstract classes or interfaces where the actual object may be one of any number of child classes; you don't know which until you start deserializing
A number of 'helpers' were created to assist with all these different situations, but the helpers had inconsistent APIs and their scope was limited; a slightly different need (say, serializing a Vector of Boolean instead of a Vector of Integer) would require a whole new helper. Having to write a whole new helper to do every new thing wasn't very much help. The lack of a centralized framework for helpers also made it hard to consistently apply efficient encodings (e.g., not using four bytes to write an int when one will do) across the board.
Another serious problem with this scheme was disambiguating multiple child classes when deserializing an abstract class. This was implemented in an ad hoc manner by each class that needed it, not only unduly complicating the class's serialization code, but also introducing serious management and coupling problems in trying to keep track of the many potential resultant classes. The encoding of these abstract classes was also wasteful.
The New Framework
The new framework builds on the existing Externalizable interface (with some slight modifications discussed later). The main change is in the refactoring of the ExternalizableHelper utility class. Don't use ExternalizableHelper anymore; all the utility functions for the new framework are in ExtUtil.
The ideal is that all serialization/deserialization, including all the tricky cases mentioned above, can be done through the same two functions:
void write (DataOutputStream out, Object o)
Object read (DataInputStream in, Class type)
To write (serialize) an object, just pass it to the function and it knows what to do. To read (deserialize), just say what kind of object you want, and it knows how to get it for you. This works great for all Externalizables, and for base types like Integer and String.
Usage Examples:
ExtUtil.write(out, new Integer(5)); Integer x = (Integer)ExtUtil.read(in, Integer.class);
ExtUtil.write(out, "a string"); String x = (String)ExtUtil.read(in, String.class);
ExtUtil.write(out, new CustomExternalizableType()); CustomExternalizableType x = (CustomExternalizableType)ExtUtil.read(in, CustomExternalizableType.class);
'''You can use this basic syntax with''' any Externalizable, along with the following base types: String, Integer, Long, Byte, Short, Float, Double, Character, Boolean, and Date. For some primitive types, read and write may use encodings that are more efficient than the naive encodings available with DataOutputStream. For example, integers (including Integer, Long, Short, etc.) will use the minimum number of bytes necessary to encode the value instead of always using 4, 8, or 2 bytes, respectively. Therefore, you should always use the helper functions in ExtUtil rather than calling the methods (such as out.writeInt()) on your data stream directly.
Primitive types also have direct helper functions for when you don't want to box your value in an object. For example:
ExtUtil.writeNumeric(out, 17); long x = ExtUtil.readNumeric(in);
Wrappers
For more complex scenarios, however, we need more information about the object to properly serialize and deserialize it. We understand there is the concept of an object's 'type', which goes beyond simply what its class is. For simple situations, like in the previous section, a class is enough to fully describe the object's 'type', but oftentimes we need more.
So let's define the 'type', for serialization purposes, as an object's class plus annotations about the data it contains and how we want it serialized. This new kind of 'type' is represented as an ExternalizableWrapper. The ExternalizableWrapper wraps the original object and contains extra information about its type. Or it can stand alone, much like a Class, and contain only the type annotations with no actual value inside. ExternalizableWrappers can be nested.
Here are the kinds of wrappers:
'''Base wrapper''' (ExtWrapBase)
We need to represent the most basic situation when a simple Class is enough. Since Class and ExternalizableWrapper don't share the same type hierarchy, we need a wrapper that represents a Class. You will never have to use this wrapper; it is used internally within the framework only.
'''Nullable wrapper''' (ExtWrapNullable)
This wrapper indicates that the wrapped object may be null.
'''Compound wrappers''' (ExtWrapList, ExtWrapListPoly, ExtWrapMap, and ExtWrapMapPoly)
These wrappers represent compound data structures. They contain information about the type(s) of the elements within. Note that since wrappers can be nested, this allows us to represent Vectors of Hashtables of Vectors of Hashtables, arbitrarily deep.
'''Tagging wrapper''' (ExtWrapTagged)
This wrapper indicates that, in addition to serializing the object itself, we should write a full description of the object's type as well. That is, the stream contains enough context that we can fully deserialize the object without any foreknowledge about what kind of object it is. The wrapper is useful when an object may be any one of several sub-classes of an abstract parent. And since wrappers can be nested, the 'tag' can encompass any possible compound data structure.
'''Encoding wrappers''' (ExtWrapIntEncodingUniform, ExtWrapIntEncodingSmall, more to come for sure...)
These wrappers specify alternate encodings of the same core type. For example, integers can be encoded in any number of schemes, from 'naive' (always dump 4 bytes) to a number of more efficient schemes with various weaknesses and strengths.
ExternalizableWrapper Details
We'll now go through all the defined wrappers with details on how to use them. The Externalizable unit tests have many more usage examples.
As explained before, wrappers are used in two different ways:
- When they contain the core data being wrapped. We use them this way during serialization. This is called '''value mode'''.
- When they don't contain any core data and are just an abstract representation of a 'type'. This is called '''type mode'''. We use them this way during deserialization ("here's the kind of object I want"), as well as for supplementary type annotations during serialization ("the wrapped Vector's elements are Vectors of String": we have an ExtWrapList in value mode for the top-level Vector, but also a type-mode ExtWrapList to represent the 'Vector of String' sub-type).
In the usage definitions below:
<object> will mean any allowed base type (Externalizable, String, etc.), or any value-mode ExternalizableWrapper
<type> will mean any allowed base Class (Externalizable.class, String.class, etc.), or any type-mode ExternalizableWrapper
ExtWrapNullable
Usage
Use this wrapper when trying to represent an object that may be null.
'''Value mode''': new ExtWrapNullable(<object>), where <object> may be null
'''Type mode''': new ExtWrapNullable(<type>)
Usage Details
ExtUtil.read() and ExtUtil.write() do not inherently support nulls:
String x = null; ExtUtil.write(out, x);
will throw an exception.
Instead, do:
String x = null; ExtUtil.write(out, new ExtWrapNullable(x)); x = (String)ExtUtil.read(in, new ExtWrapNullable(String.class));
Note how ExtWrapNullable(String.class) is an annotated extension of the base type String.
Serialization Details
This wrapper prefixes the serialized value with 0x01 when not null, or writes only 0x00 when null.
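The prefix scheme can be illustrated with a small, self-contained sketch. It does not use ExtUtil or ExtWrapNullable; the class NullablePrefixSketch and its methods are hypothetical, it is written against standard Java SE rather than J2ME, and encoding the String payload with writeUTF is an assumption made only to keep the example runnable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the null-marker prefix; not the framework's code.
class NullablePrefixSketch {
    // Writes 0x00 alone for null, or 0x01 followed by the value.
    static void writeNullable(DataOutputStream out, String value) throws IOException {
        if (value == null) {
            out.writeByte(0x00);
        } else {
            out.writeByte(0x01);
            out.writeUTF(value); // assumption: payload encoding chosen for this sketch
        }
    }

    // Reads the marker byte first, and the payload only when the marker is non-zero.
    static String readNullable(DataInputStream in) throws IOException {
        return in.readByte() == 0x00 ? null : in.readUTF();
    }

    // Convenience wrappers around in-memory streams.
    static byte[] encode(String value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeNullable(new DataOutputStream(buf), value);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String decode(byte[] bytes) {
        try {
            return readNullable(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a null costs exactly one byte on the stream, and a non-null value costs one byte more than it would unwrapped.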
ExtWrapList
Usage
Use this wrapper for a Vector of objects. The objects must all be of the '''exact same type'''. Not even children of the same parent class, but exactly the same.
'''Value mode''': new ExtWrapList(Vector, <type>), where <type> is the type of the child elements; <type> may be omitted when it is simply a Class
'''Type mode''': new ExtWrapList(<type>), where <type> is the type of the child elements; <type> cannot be omitted
Usage Details
Vector v = new Vector(); v.add("string 1"); v.add("string 2"); v.add("string 3"); ExtUtil.write(out, new ExtWrapList(v)); v = (Vector)ExtUtil.read(in, new ExtWrapList(String.class));
Note that when writing, you don't need to specify type String; the serializer can figure it out from the Vector's contents.
We can serialize Vectors of Vectors (or any other complex type). In this context, 'exact same type' means all the extended details of the type must be the same as well. For instance, we can serialize a Vector when all its elements are Vectors of Strings, but not when some elements are Vectors of Strings and others are Vectors of Integers.
Vector vv = new Vector(); vv.add(v); vv.add(v); ExtUtil.write(out, new ExtWrapList(vv, new ExtWrapList(String.class))); vv = (Vector)ExtUtil.read(in, new ExtWrapList(new ExtWrapList(String.class)));
Here we see how the supplementary parameter is the type of its elements. Whereas before, during serialization, write could figure out the element type from the Vector's contents, now we need to supply it explicitly.
We can also represent a Vector that may be null by combining with an ExtWrapNullable wrapper as such:
Vector x = null; ExtWrapList wl = (x == null ? null : new ExtWrapList(x)); ExtUtil.write(out, new ExtWrapNullable(wl)); x = (Vector)ExtUtil.read(in, new ExtWrapNullable(new ExtWrapList(String.class)));
Not the cleanest, but oh well.
ExtWrapMap
Usage
This wrapper is similar to ExtWrapList, but it is used for Hashtables of objects (including OrderedHashtable). All the keys must have the exact same type, and all the elements must have the exact same type (the key type can be different from the element type).
'''Value mode''': new ExtWrapMap(Hashtable, key <type>, element <type>), where <type> is the type of the keys and elements, respectively; also:
new ExtWrapMap(Hashtable): both <type>s may be omitted when they are both Classes, and
new ExtWrapMap(Hashtable, element <type>): the key <type> may be omitted if it is a Class but the element type is not
'''Type mode''': new ExtWrapMap(key <type>, element <type>, boolean ordered), where <type> is the type of the keys and elements, respectively; <type> cannot be omitted. ordered, if true, will yield an OrderedHashtable; ordered may be omitted and defaults to false.
Usage Details
Usage is very similar to ExtWrapList.
Hashtable h = new Hashtable(); h.put("a", new Integer(1)); h.put("b", new Integer(2)); h.put("c", new Integer(3)); ExtUtil.write(out, new ExtWrapMap(h)); h = (Hashtable)ExtUtil.read(in, new ExtWrapMap(String.class, Integer.class));
OrderedHashtable hh = new OrderedHashtable(); hh.put(Boolean.TRUE, h); hh.put(Boolean.FALSE, h); ExtUtil.write(out, new ExtWrapMap(hh, new ExtWrapMap(String.class, Integer.class))); hh = (OrderedHashtable)ExtUtil.read(in, new ExtWrapMap(Boolean.class, new ExtWrapMap(String.class, Integer.class), true));
You can serialize a Hashtable and deserialize it as an OrderedHashtable, no problem (and vice versa).
ExtWrapTagged
Usage
This wrapper writes out any object, but tags it with enough information to completely know the object's type. Its use is in serializing polymorphic types, where the actual object may be any of several classes. It can also be applied to any compound type, such as ExtWrapNullables, ExtWrapLists, etc.
'''Value mode''': new ExtWrapTagged(<object>)
'''Type mode''': new ExtWrapTagged(); there are no arguments because all the necessary information is already in the stream
Usage Details
Usage is trivial.
ExtUtil.write(out, new ExtWrapTagged("string")); String x = (String)ExtUtil.read(in, new ExtWrapTagged());
ExtUtil.write(out, new ExtWrapTagged(new CustomExternalizableType())); CustomExternalizableType x = (CustomExternalizableType)ExtUtil.read(in, new ExtWrapTagged());
ExtUtil.write(out, new ExtWrapTagged(new ExtWrapMap(hh, new ExtWrapMap(String.class, Integer.class)))); OrderedHashtable x = (OrderedHashtable)ExtUtil.read(in, new ExtWrapTagged());
One complication is that you can't mix and match Hashtables and OrderedHashtables (serialize as one, deserialize as the other) when tagging; you will get a ClassCastException.
Serialization Details
Tagging is done by serializing the object as normal but prefixing it with a 'tag' that identifies its type.
For base classes (Externalizable, String, etc.), this tag is a 4-byte hash code of the fully-qualified class name. For example, String (java.lang.String) will tag as 0x42c25be3, Integer as 0x7ca16fdb, and org.javarosa.core.model.QuestionDef as 0x27512ec9. This scheme is ''far'' more efficient than writing out the full class name like we were before. There are two problems, though:
1) How do we get from a tag back to a Class? We can't, because hashes are one-way. Therefore, as we're deserializing, we need to keep a list of 'potential classes' that we can match the tags against. These potential classes are called 'prototypes', and are managed with a PrototypeFactory. More information about this is given in the Prototypes section. Just know now that if you use the tagging wrapper with any custom Externalizable (but not String, Integer, etc.), you will have to use a PrototypeFactory when deserializing.
2) Collisions. It is possible that two different classes would have the same tag. If this happens we have no hope of deserializing the object properly. The risk of a collision is small, however. It would take 2900 candidate classes ''at once'' before there was even a 0.1% chance (1 in 1,000) of collision; for 90 classes, the chance is one in a million. The PrototypeFactory will also detect a collision, if one occurs.
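The collision figures quoted above are consistent with the standard birthday-problem approximation for a 32-bit tag space. The following sketch reproduces the arithmetic; the class TagCollisionSketch is hypothetical, and it assumes the hash spreads class names uniformly over all 2^32 tag values, which the framework's hash may only approximate.

```java
// Hypothetical helper: birthday-bound estimate for 4-byte tags.
class TagCollisionSketch {
    // Approximate probability that at least two of n classes share a tag,
    // assuming tags are uniformly distributed over 2^32 values:
    //   p ~= 1 - exp(-n(n-1) / (2 * 2^32))
    static double collisionProbability(int n) {
        double space = 4294967296.0;                  // 2^32 possible tags
        double pairs = (double) n * (n - 1) / 2.0;    // number of class pairs
        return -Math.expm1(-pairs / space);           // 1 - e^(-x), stable for small x
    }

    public static void main(String[] args) {
        System.out.println("2900 classes: " + collisionProbability(2900));
        System.out.println("  90 classes: " + collisionProbability(90));
    }
}
```

Under that assumption, 2900 classes come out just under 0.1%, and 90 classes come out just under one in a million.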
ExternalizableWrappers themselves use a longer tagging format. They first output 0xffffffff, to distinguish themselves from normal Classes, followed by a code that indicates the type of wrapper, followed by wrapper-specific 'meta-serialization' of the wrapper's type information. For most wrappers, this involves recursive tagging of the sub-types contained within it.
ExtWrapListPoly
Usage
Use this wrapper for a Vector of objects when the elements might not all be of the exact same type. The wrapper accomplishes this by tagging each element individually. It is (nearly, but not quite) functionally equivalent to new ExtWrapList(vector, new ExtWrapTagged()) (never do this).
'''Value mode''': new ExtWrapListPoly(Vector)
'''Type mode''': new ExtWrapListPoly()
Usage Details
Vector vp = new Vector(); vp.add("string 1"); vp.add(new Integer(3)); vp.add(Boolean.FALSE); ExtUtil.write(out, new ExtWrapListPoly(vp)); vp = (Vector)ExtUtil.read(in, new ExtWrapListPoly());
If you want to serialize a polymorphic Vector where some elements are compound types (meaning, they require a wrapper), you must insert the wrapped element directly into the source Vector.
... vp.add(new ExtWrapList(v)); ExtUtil.write(out, new ExtWrapListPoly(vp)); vp = (Vector)ExtUtil.read(in, new ExtWrapListPoly());
ExtWrapMapPoly
Usage
Akin to ExtWrapListPoly, this wrapper is for Hashtables where the elements might not all be of the exact same type. Keys must still all be of the exact same type.
'''Value mode''': new ExtWrapMapPoly(Hashtable, key <type>); as with ExtWrapMap, key type can be omitted when it is a simple Class
'''Type mode''': new ExtWrapMapPoly(key <type>, boolean ordered); key type cannot be omitted; ordered can be omitted and behaves exactly as with ExtWrapMap
Usage Details
OrderedHashtable hp = new OrderedHashtable(); hp.put("a", "string"); hp.put("b", new Integer(3)); hp.put("c", Boolean.FALSE); hp.put("d", new ExtWrapList(v)); ExtUtil.write(out, new ExtWrapMapPoly(hp)); hp = (OrderedHashtable)ExtUtil.read(in, new ExtWrapMapPoly(String.class, true));
Note that the caveat about wrapping compound types yourself still applies.
ExtWrapIntEncodingUniform
Usage
This wrapper specifies a numeric encoding that efficiently encodes integers, and strives for equal efficiency over the entire range of longs. This is the default encoding used by ExtUtil for all integer values, including all serializations of Integer, Long, Byte, and Short. Therefore, you should '''probably never use this encoding directly'''.
'''Value mode''': new ExtWrapIntEncodingUniform(long)
'''Type mode''': new ExtWrapIntEncodingUniform()
Usage Details
Don't use this encoding directly. But if you did, it would look like:
ExtUtil.write(out, new ExtWrapIntEncodingUniform(123456789)); Long x = (Long)ExtUtil.read(in, new ExtWrapIntEncodingUniform());
Serialization Details
This encoding divides the value into 7-bit chunks, and serializes each chunk as one byte. The eighth bit is used to signal whether there are more chunks to follow. The encoding will only write as many chunks as are needed to fully represent the value.
The range [-64,63] can be represented in 1 byte
[-8192,8191] will take 2 bytes
[-1048576,1048575] will take 3 bytes
and so on to the largest long, which takes 10 bytes.
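A chunked encoding with these properties can be sketched as follows. This is an illustration and not the framework's implementation: the class UniformIntEncodingSketch is hypothetical, and the chunk order (most significant first), the use of the high bit as the "more chunks follow" flag, and two's-complement sign handling are all assumptions chosen so that the byte counts match the ranges listed above.

```java
// Hypothetical sketch of a signed, 7-bits-per-byte variable-length encoding.
class UniformIntEncodingSketch {
    // True when value fits in a two's-complement field of the given width.
    static boolean fits(long value, int bits) {
        if (bits >= 64) return true;
        long min = -(1L << (bits - 1));
        long max = (1L << (bits - 1)) - 1;
        return value >= min && value <= max;
    }

    static byte[] encode(long value) {
        // Find the smallest chunk count whose 7*k bits can hold the value.
        int chunks = 1;
        while (chunks < 10 && !fits(value, 7 * chunks)) chunks++;

        byte[] out = new byte[chunks];
        for (int i = 0; i < chunks; i++) {
            int shift = 7 * (chunks - 1 - i);           // most significant chunk first
            int chunk = (int) ((value >> shift) & 0x7f);
            boolean more = i < chunks - 1;
            out[i] = (byte) (more ? (chunk | 0x80) : chunk); // high bit: more chunks follow
        }
        return out;
    }

    static long decode(byte[] bytes) {
        // Sign-extend from bit 6 of the first chunk, then append 7 bits per byte.
        long value = (bytes[0] & 0x40) != 0 ? -1L : 0L;
        for (int i = 0; i < bytes.length; i++) {
            value = (value << 7) | (bytes[i] & 0x7f);
            if ((bytes[i] & 0x80) == 0) break;          // last chunk reached
        }
        return value;
    }
}
```

With this layout, small magnitudes of either sign stay short, which is what gives the encoding its roughly uniform efficiency across the range of longs.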
ExtWrapIntEncodingSmall
Usage
Use this numeric encoding for integer domains where the value rarely exceeds the one byte range (0--255). This encoding covers the entire range of ints; it cannot be used with longs.
'''Value mode''': new ExtWrapIntEncodingSmall(long val, int bias): bias shifts the 'one byte range' around zero. Allowed values are 0--254. It may be omitted, in which case the default bias is 1.
'''Type mode''': new ExtWrapIntEncodingSmall(int bias); bias may be omitted to use the default value
Usage Details
ExtUtil.write(out, new ExtWrapIntEncodingSmall(-26, 30)); Long x = (Long)ExtUtil.read(in, new ExtWrapIntEncodingSmall(30));
Serialization Details
The range [0 - bias, 254 - bias] (with default bias: [-1, 253]) will take 1 byte. The rest of the integer range will take 5 bytes.
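One way to obtain these sizes is to reserve a single byte value as an escape marker. The sketch below is a guess at such a layout, not the framework's actual format: the class SmallIntEncodingSketch is hypothetical, and the use of 0xff as the marker followed by a plain 4-byte big-endian int is an assumption inferred only from the "1 byte or 5 bytes" description.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a biased one-byte encoding with a 5-byte fallback.
class SmallIntEncodingSketch {
    static final int MARKER = 0xff; // assumption: escape byte for out-of-range values

    static byte[] encode(int value, int bias) {
        long shifted = (long) value + bias;   // long arithmetic avoids int overflow
        if (shifted >= 0 && shifted <= 254) {
            return new byte[] { (byte) shifted };      // fits the one-byte range
        }
        // Out of range: marker byte, then the raw value as a 4-byte int.
        return ByteBuffer.allocate(5).put((byte) MARKER).putInt(value).array();
    }

    static int decode(byte[] bytes, int bias) {
        int first = bytes[0] & 0xff;          // read the first byte as unsigned
        if (first == MARKER) {
            return ByteBuffer.wrap(bytes, 1, 4).getInt();
        }
        return first - bias;
    }
}
```

The reader must be given the same bias as the writer, which mirrors why the type-mode wrapper takes a bias argument.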
Prototypes
PrototypeFactory
To deserialize tagged objects, we need a PrototypeFactory to provide a list of potential objects that we can match tags against. This need is so inherent that we've modified the Externalizable interface to accommodate it.
The new signature for Externalizable.readExternal is:
void readExternal (DataInputStream in, PrototypeFactory pf) throws IOException, DeserializationException
Now every object's deserialization will have access to a PrototypeFactory. Objects that don't need prototypes to deserialize can safely ignore it.
Where does the PrototypeFactory come from? You can create it yourself and explicitly pass it during deserialization. This, however, won't alleviate the pain of managing prototypes like we did before. So instead, you can (and likely should) use the global PrototypeFactory newly available in the JavaRosaServiceProvider. Any call to ExtUtil.read() that doesn't include a PrototypeFactory parameter (or whose parameter is null) will use the global PrototypeFactory by default.
Registering Prototypes
You should register all the objects that your application needs to handle (remember, only the objects it will encounter in tagged form) with the JavaRosaServiceProvider (JRSP) in your shell initialization. Do so by calling:
JavaRosaServiceProvider.registerPrototype(String className)
where className is the fully qualified name of your class.
You don't need to register base classes like java.lang.String and java.lang.Integer; those are handled automatically.
One consequence of this reliance on literal class names is that '''you cannot obfuscate a class used as a prototype'''! Deserialization will break in an obfuscated JAR unless you explicitly mark in your project's build.xml that the prototyped classes must not be obfuscated. Note: the contents of the class may be obfuscated, but the class name itself must not be touched.
Indicate this with directive(s) like the following (use fully qualified class names):
<obfuscator name="ProGuard" unless="test or noobfuscate">
    ...
    <parameter name="keepnames" value="class [single taggable object]" />
    <parameter name="keepnames" value="!abstract class [package of taggable objects].* extends [abstract parent class of taggable objects]" />
</obfuscator>
Prototype Exceptions
When using a PrototypeFactory to identify tagged objects, we still need to handle the event in which a tag doesn't match any of our prototypes. This could be because we forgot to register the prototype, we've changed the name of the class, the class was (wrongly) obfuscated, or we received the data from a foreign source that uses objects we've never heard of. When a PrototypeFactory has no class that matches a tag, it throws a DeserializationException. This exception is checked, and must be handled.
A lesser problem is that although the PrototypeFactory found a matching class, it cannot instantiate it at run-time. This problem is almost always due to programmer error. Common causes are: the object is not publicly visible, the object has no default (empty) constructor, or the 'object' is actually an interface or abstract class. When this occurs, the PrototypeFactory will throw a CannotCreateObjectException. This exception is a runtime exception, and need not be handled.
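The two failure modes can be seen in a simplified tag registry. This sketch is not the real PrototypeFactory: the classes PrototypeRegistrySketch and UnknownTagSketchException are hypothetical stand-ins, String.hashCode() is used in place of the framework's 4-byte class-name hash (whose algorithm is not described here), and the code targets standard Java SE rather than J2ME.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exception, standing in for DeserializationException.
class UnknownTagSketchException extends Exception {
    UnknownTagSketchException(String message) { super(message); }
}

// Hypothetical registry mapping tags back to registered classes.
class PrototypeRegistrySketch {
    private final Map<Integer, Class<?>> byTag = new HashMap<>();

    // Assumption: String.hashCode() stands in for the framework's tag hash.
    static int tagFor(String className) {
        return className.hashCode();
    }

    void register(String className) {
        Class<?> type;
        try {
            type = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + className, e);
        }
        int tag = tagFor(className);
        Class<?> existing = byTag.get(tag);
        if (existing != null && existing != type) {
            // Two different classes hash to the same tag.
            throw new IllegalStateException("tag collision: " + existing.getName() + " vs " + className);
        }
        byTag.put(tag, type);
    }

    Object instantiate(int tag) throws UnknownTagSketchException {
        Class<?> type = byTag.get(tag);
        if (type == null) {
            // Unknown tag: checked, because foreign or stale data can trigger it.
            throw new UnknownTagSketchException("no prototype for tag 0x" + Integer.toHexString(tag));
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Programmer error (interface, abstract class, no usable empty constructor): unchecked.
            throw new IllegalStateException("cannot create " + type.getName(), e);
        }
    }
}
```

The split between a checked and an unchecked exception mirrors the distinction drawn above: an unmatched tag can be caused by outside data, while an uninstantiable prototype is a bug in the registering code.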
Backwards Compatibility
ExternalizableHelper has been moved to ExternalizableHelperDeprecated, and all the original helper functions now use the new framework as their backend. This old file will disappear shortly.
The original PrototypeFactory has been moved to PrototypeFactoryDeprecated, but will continue to work for objects that use it. These objects will be switched to use the new prototype scheme as quickly as possible.
Externalizable objects that don't require prototypes can safely ignore the new PrototypeFactory parameter to Externalizable.readExternal. Although the PrototypeFactory will always be created behind the scenes during any deserialization, PrototypeFactory itself employs lazy evaluation, and the performance hit of its initialization (computing hashes and such) is not incurred until it is certain that the PrototypeFactory is needed. Externalizables can avoid a minor performance penalty when calling ExtUtil.read() by passing along their PrototypeFactory parameter (even if it will never be used) to any such calls; otherwise, a new PrototypeFactory object will be created.