Lately I've been experimenting with serialization engines in Pharo. Besides the "traditional" alternative (SmartReferenceStream), I took the chance to evaluate Fuel, a new serializer that is nicely documented and supported. In fact, Mariano Martinez Peck and Martin Dias (the Fuel developers) answered my questions and requirements privately and very quickly, so thanks to them I can now show you an interesting feature in BioSmalltalk.
A typical serialization in bioinformatics includes a huge group of sequences or a big XML tree, so one of my requirements is to customize the serialization strategy to save precious memory. This means changing the serializer on the fly when a particular object is found in a graph of objects. Specifically, if a DNA or protein sequence longer than a given threshold is found, you would certainly like to zip it. Here is an example that serializes an Array containing an arbitrary object ('hello') and chromosome 28 of the chicken:
objectToSerialize := Array with: 'hello' with: (FileStream readOnlyFileNamed: 'GGA28.fa') contents.
threshold := 1000.
FileStream forceNewFileNamed: 'demo.fuel' do: [ :aStream |
    aSerializer := FLSerializer newDefault.
    aSerializer analyzer
        when: [ :o | o isString and: [ o size > threshold and: [ (BioParser tokenizeFasta: o) second isDNASequence ] ] ]
        substituteBy: [ :o | o zipped ].
    aSerializer
        serialize: objectToSerialize
        on: aStream binary ].
And of course, the corresponding materialization:
result := FileStream oldFileNamed: 'demo.fuel' do: [ :aStream |
    aMaterialization := FLMaterializer newDefault materializeFrom: aStream binary.
    zippedStrings := aMaterialization objects select: [ :o | o isString and: [ o isDNASequence ] ].
    unzippedStrings := zippedStrings collect: [ :o | o unzipped ].
    zippedStrings elementsExchangeIdentityWith: unzippedStrings.
    aMaterialization root ].
Looking at the possibilities, many custom compression algorithms for DNA (or even XML) could be plugged in and used if saving space becomes an issue in your bioinformatics experiments.
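To give an idea of what such a custom algorithm might look like, here is a minimal sketch of 2-bit packing, which stores four bases per byte. It is not part of Fuel or BioSmalltalk: the names bases, pack and unpack are hypothetical, and the sketch assumes the sequence contains only A, C, G and T (ambiguity codes such as N are not handled).

```smalltalk
| bases pack unpack dna packed |
"Hypothetical 2-bit alphabet: A=0, C=1, G=2, T=3."
bases := 'ACGT'.

"Pack four bases per byte, two bits each. Assumes only A, C, G and T occur."
pack := [ :sequence | | bytes |
	bytes := ByteArray new: (sequence size + 3) // 4.
	1 to: sequence size do: [ :i | | code slot shift |
		code := (bases indexOf: (sequence at: i)) - 1.
		slot := (i - 1) // 4 + 1.
		shift := (i - 1) \\ 4 * 2.
		bytes at: slot put: ((bytes at: slot) bitOr: (code bitShift: shift)) ].
	bytes ].

"Unpack needs the original length, because the last byte may be only partly used."
unpack := [ :bytes :length | | sequence |
	sequence := String new: length.
	1 to: length do: [ :i | | code |
		code := ((bytes at: (i - 1) // 4 + 1) bitShift: ((i - 1) \\ 4 * 2) negated) bitAnd: 3.
		sequence at: i put: (bases at: code + 1) ].
	sequence ].

dna := 'GATTACAGATTACA'.
packed := pack value: dna.
Transcript
	show: 'Packed bytes: ', packed size printString; cr;
	show: 'Round trip ok: ', ((unpack value: packed value: dna size) = dna) printString; cr.
```

In principle, a block like pack could take the place of zipped in the substitution shown earlier, but the materialization side would then need some way to recognize packed sequences and to recover their original length, which this sketch does not address.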