In this section, we’ll assume that we’re dealing with the special case of an array of X, where X is a serializable type (with either MessagePack or Protobuf). Furthermore, we’re interested in the case where the length of X[] isn’t known beforehand, which usually happens when we’re generating logs. The only way to know that we’ve deserialized all elements is to reach the end of the stream.
Character Count Message Framing:
In order to introduce concurrency, we’ll have to frame our data, i.e., introduce a delimiter between each X’s binary representation, specifying the length that each instance of X takes on the stream. This way we can quickly read each X’s buffer and then queue it for further parallel processing. The approach differs between MessagePack and Protobuf.
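The general idea of length-prefix framing can be sketched as follows (Python here is purely illustrative; the post doesn’t prescribe a language, and the 4-byte big-endian header is an assumption of this sketch — the payloads stand in for serialized X instances):

```python
import struct
from io import BytesIO

def write_framed(stream, payload: bytes) -> None:
    # Prefix each payload with its byte length as a 4-byte big-endian integer.
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_framed(stream):
    # Read frames until the stream is exhausted; each yielded frame can then
    # be handed off to a worker for parallel deserialization.
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break  # end of stream
        (length,) = struct.unpack(">I", header)
        yield stream.read(length)

buf = BytesIO()
for p in [b"first-record", b"second"]:
    write_framed(buf, p)
buf.seek(0)
frames = list(read_framed(buf))
```

The reader never has to parse a payload to find the next one — it just skips `length` bytes, which is exactly what makes the split-then-process-in-parallel pattern possible.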
MessagePack:
MessagePack shares some of the logic of a JSON message, so I’ll use JSON to show what the original array of X looks like before and after framing.
Before:
{
"src": "Images/Sun.png",
"hOffset": 250,
"vOffset": 200
},
{
"src": "Images/Earth.png",
"hOffset": 100,
"vOffset": 100
}
Note that this is actually not a single JSON message, but a concatenation of individual JSON blocks (it’s missing the outer brackets). It’s not obvious from the JSON representation, but it’s necessary to express X[] in this manner since we don’t know in advance how many elements the array has, and MessagePack needs to prefix each array or map with its number of elements.
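To make the “concatenation of individual blocks” idea concrete, here is a small illustration using Python’s standard `json` module (a stand-in for the MessagePack case — the point is that we keep decoding one block at a time until the stream runs out, with no element count up front):

```python
import json

# A concatenation of individual JSON blocks, not a single JSON document.
stream = '{"src": "Images/Sun.png"},{"src": "Images/Earth.png"}'

decoder = json.JSONDecoder()
objects, pos = [], 0
while pos < len(stream):
    # raw_decode parses one value starting at `pos` and reports where it ended.
    obj, end = decoder.raw_decode(stream, pos)
    objects.append(obj)
    pos = end
    # Skip any separator between blocks before attempting the next decode.
    while pos < len(stream) and stream[pos] in ", \n":
        pos += 1
```

Only reaching the end of the stream tells us we’ve seen every element — exactly the situation described above for X[].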
Let’s assume the serialized representation of X[0] takes 11 bytes and X[1] takes 9 (I’m picking these numbers out of thin air). Then we’ll introduce framing by wrapping each X instance like this:
{
"length": 11,
"body":
{
"src": "Images/Sun.png",
"hOffset": 250,
"vOffset": 200
}
},
{
"length": 9,
"body":
{
"src": "Images/Earth.png",
"hOffset": 100,
"vOffset": 100
}
}
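A framed layout like the one above can be produced programmatically. The sketch below again uses JSON as a readable stand-in for MessagePack, and computes each "length" as the byte size of the serialized body (the envelope shape `{"length": ..., "body": ...}` is the one from the example; everything else is an assumption of this sketch):

```python
import json

records = [
    {"src": "Images/Sun.png", "hOffset": 250, "vOffset": 200},
    {"src": "Images/Earth.png", "hOffset": 100, "vOffset": 100},
]

def frame(record) -> dict:
    # "length" is the byte size of the serialized body, so a reader can
    # skip straight to the next frame without parsing the body itself.
    body = json.dumps(record, separators=(",", ":")).encode()
    return {"length": len(body), "body": record}

framed = [frame(r) for r in records]
```

With real MessagePack the "length" values would differ from the JSON byte counts, but the wrapping logic is the same.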
Its binary representation (omitted here) is still a valid MessagePack encoding, just as the original representation was, but now we can extract enough information from the stream to split it into blocks for parallel processing.
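Once the stream has been split into blocks, the parallel step itself is straightforward. A minimal sketch, assuming the blocks have already been extracted and using JSON text in place of the real MessagePack payloads:

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Blocks as they would come out of the framing step.
blocks = [
    '{"src": "Images/Sun.png", "hOffset": 250, "vOffset": 200}',
    '{"src": "Images/Earth.png", "hOffset": 100, "vOffset": 100}',
]

def process(block: str) -> str:
    # Deserialize one block; in a real pipeline this is where the
    # expensive per-record work would happen.
    return json.loads(block)["src"]

# map() preserves input order even though blocks are processed concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process, blocks))
```

Since each block is self-contained, no worker ever needs to look at another worker’s bytes — the framing did all the coordination up front.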