Calltable serialization scheme
The Calltable serialization scheme is a way to byte-serialize data so that the format can change over time while retaining backwards compatibility.
Calltable envelope
As mentioned before, the calltable serialization approach requires serializing a calltable envelope, from now on called an envelope. The envelope consists of:
- fields - field which is an ordered collection of Field. This collection is serialized as a Vec<Field> in the "regular" Binary Serialization Standard schema. The invariants for fields are:
  - for each f_i, f_j: i < j implies f_i.index < f_j.index
  - for each f_i, f_j: i < j implies f_i.offset < f_j.offset
- bytes - field which is an amorphous blob of bytes serialized as Bytes in the "regular" Binary Serialization Standard schema. Bear in mind that this field is serialized by first appending a u32 integer (the number of bytes) and then the raw bytes payload.
A Field consists of:
- index - field (u16),
- offset - field (u32).
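The invariants on fields and the integer widths of a Field can be checked mechanically. The following is a minimal sketch, assuming fields are represented as Python dictionaries with "index" and "offset" keys; the function name check_fields_invariants is hypothetical and not part of any standard.

```python
U16_MAX = 0xFFFF        # largest value that fits in Field.index (u16)
U32_MAX = 0xFFFFFFFF    # largest value that fits in Field.offset (u32)


def check_fields_invariants(fields):
    """Return True if `fields` satisfies the envelope invariants.

    `fields` is a list of {"index": int, "offset": int} dictionaries
    (an assumed in-memory representation of Field).
    """
    # Every index must fit in a u16 and every offset in a u32.
    for field in fields:
        if not (0 <= field["index"] <= U16_MAX):
            return False
        if not (0 <= field["offset"] <= U32_MAX):
            return False
    # Both index and offset must be strictly ascending across the collection.
    for previous, current in zip(fields, fields[1:]):
        if previous["index"] >= current["index"]:
            return False
        if previous["offset"] >= current["offset"]:
            return False
    return True
```

Comparing only neighbouring entries is enough, because strict ordering of every adjacent pair implies strict ordering of every pair i < j.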
The fields part of the envelope is the self-describing data for the binary payload stored in bytes. To successfully serialize something into a calltable, we need to assign unique serialization index identifiers to its parts. In a structure those parts would be fields. Let's denote the index of the n-th field as i_n. We then byte-serialize each field (according to the Binary Serialization Standard); let's denote the resulting bytes of the n-th field as b_n.
We create an ordered collection of indices I, which consists of all the serialization indices of the fields in ascending order:
I = [i_0, i_1, ..., i_{n-1}, i_n]
We create an ordered collection of byte sequences B from all b_k, keeping the order of I, so:
B = [b_0, b_1, ..., b_{n-1}, b_n]
Next, we can create the calltable fields and bytes with this example implementation:
def build_calltable_data(I, B):
    assert len(I) == len(B)
    offset = 0
    bytes_arr = bytes([])
    fields = []
    for i in range(0, len(I)):
        this_fields_serialization_index = I[i]
        this_field_bytes = B[i]
        this_field_length = len(this_field_bytes)
        # Each field records where its bytes start inside the payload.
        fields.append({
            "index": this_fields_serialization_index,
            "offset": offset,
        })
        offset += this_field_length
        bytes_arr += this_field_bytes
    return {
        "fields": fields,
        "bytes": bytes_arr
    }
So, in this usage example:
I = [0, 1, 3, 5]
B = [b'\x00\x01\xff', b'\x37\x0c\x6e\x3c\x0f', b'\x07\x95\x01', b'\x37']
envelope = build_calltable_data(I, B)
envelope["fields"] would be [{'index': 0, 'offset': 0}, {'index': 1, 'offset': 3}, {'index': 3, 'offset': 8}, {'index': 5, 'offset': 11}] and envelope["bytes"] would be an array of bytes which, represented as a hex string, would look like:
00 01 ff 37 0c 6e 3c 0f 07 95 01 37
(whitespace added for clarity)
Once we have the fields and bytes, we can proceed to serialize them in the "default Binary Serialization Standard" way using this script:
def serialize_calltable_representation(fields, concatenated_serialized_fields):
    bytes_arr = bytes([])
    # Vec<Field>: u32 element count, then each Field as u16 index + u32 offset.
    fields_len = len(fields)
    bytes_arr += fields_len.to_bytes(4, byteorder = 'little')
    for i in range(0, len(fields)):
        bytes_arr += fields[i]["index"].to_bytes(2, byteorder = 'little')
        bytes_arr += fields[i]["offset"].to_bytes(4, byteorder = 'little')
    # Bytes: u32 length prefix, then the raw payload.
    concatenated_serialized_fields_len = len(concatenated_serialized_fields)
    bytes_arr += concatenated_serialized_fields_len.to_bytes(4, byteorder = 'little')
    bytes_arr += concatenated_serialized_fields
    return bytes_arr
Once we have all that, we can apply both functions to the example:
I = [0, 1, 3, 5]
B = [bytes([0, 1, 255]), bytes([55, 12, 110, 60, 15]), bytes([7, 149, 1]), bytes([55])]
envelope = build_calltable_data(I, B)
serialized_calltable_representation = serialize_calltable_representation(envelope["fields"], envelope["bytes"])
print(f"{serialized_calltable_representation.hex()}")
which produces:
0400000000000000000001000300000003000800000005000b0000000c0000000001ff370c6e3c0f07950137
In the above hex:
Serialized length of fields collection | field[0].index | field[0].offset | field[1].index | field[1].offset | field[2].index | field[2].offset | field[3].index | field[3].offset | number of bytes in bytes field | raw bytes of bytes field |
---|---|---|---|---|---|---|---|---|---|---|
04000000 | 0000 | 00000000 | 0100 | 03000000 | 0300 | 08000000 | 0500 | 0b000000 | 0c000000 | 0001ff370c6e3c0f07950137 |
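The same layout can be read back in the opposite direction. The source does not prescribe a reading procedure, so the following is only a sketch inferred from the table above: parse_calltable and field_bytes are hypothetical names, and the rule that a field ends where the next field's offset begins is an assumption derived from how the offsets are built.

```python
import struct


def parse_calltable(data):
    """Split a serialized envelope into (fields, payload)."""
    # u32 little-endian count of Field entries.
    (fields_len,) = struct.unpack_from("<I", data, 0)
    position = 4
    fields = []
    for _ in range(fields_len):
        # Each Field is a u16 index followed by a u32 offset (6 bytes, no padding).
        index, offset = struct.unpack_from("<HI", data, position)
        fields.append({"index": index, "offset": offset})
        position += 6
    # u32 little-endian payload length, then the raw payload bytes.
    (payload_len,) = struct.unpack_from("<I", data, position)
    position += 4
    payload = data[position:position + payload_len]
    return fields, payload


def field_bytes(fields, payload, index):
    """Return the bytes of the field with the given serialization index, or None."""
    for n, field in enumerate(fields):
        if field["index"] == index:
            # Assumption: a field ends where the next one starts (or at the payload end).
            end = fields[n + 1]["offset"] if n + 1 < len(fields) else len(payload)
            return payload[field["offset"]:end]
    return None


data = bytes.fromhex(
    "0400000000000000000001000300000003000800000005000b000000"
    "0c0000000001ff370c6e3c0f07950137"
)
fields, payload = parse_calltable(data)
print(fields)
print(field_bytes(fields, payload, 3).hex())
```

Looking a field up by its serialization index, rather than by its position, is what lets a reader cope with indices that are absent from a given envelope, such as index 2 in this example.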
This concludes how we construct and byte-serialize an envelope. In the next paragraphs we will explain the assumptions and conventions that apply when dealing with structs and tagged-unions.
Serializing uniform data structures
In this paragraph we will explain how to use the calltable serialization scheme for non-polymorphic data structures, that is, structures whose fields are known in advance. We assign each field a unique serialization index, starting from 0. For a specific structure, an index used for a field is assigned to that field indefinitely: if we ever decide to retire a field from a structure, we will stop using that index in the serialization. If we ever decide to add a field to a struct, we will add a new index that was never used for that structure. Although the indices are not required to be contiguous, the assumption is that we assign them one by one. Any "holes" in the indexing should indicate that there used to be a field that is no longer used or is not mandatory.
struct A {
    a: u16,
    b: String,
    c: Vec<OtherStruct>
}
In the above example we could assign a index 0, b index 1 and c index 2. Knowing this, and assuming that we know how to byte-serialize OtherStruct, we should be able to create an envelope for this struct and serialize it in the calltable scheme.
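As an illustration, the I and B collections for an instance of struct A could be prepared as sketched below. This is not taken from the Binary Serialization Standard: the encodings of String (u32 little-endian byte length followed by UTF-8 bytes) and Vec (u32 little-endian element count followed by the elements) are assumptions made for this sketch, and all helper names are hypothetical. The serializer for OtherStruct is passed in as a parameter because its layout is not specified here.

```python
import struct


def serialize_u16(value):
    # u16 as 2 little-endian bytes.
    return struct.pack("<H", value)


def serialize_string(text):
    # ASSUMPTION: u32 little-endian byte length followed by the UTF-8 bytes.
    raw = text.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


def serialize_vec(items, serialize_item):
    # ASSUMPTION: u32 little-endian element count followed by each element.
    return struct.pack("<I", len(items)) + b"".join(serialize_item(item) for item in items)


def struct_a_to_I_and_B(a, b, c, serialize_other_struct):
    """Build I and B for struct A using indices a=0, b=1, c=2."""
    I = [0, 1, 2]
    B = [
        serialize_u16(a),
        serialize_string(b),
        serialize_vec(c, serialize_other_struct),
    ]
    return I, B


# An instance with an empty `c`, so no OtherStruct serializer is exercised.
I, B = struct_a_to_I_and_B(7, "hi", [], lambda other: b"")
print(I)
print([blob.hex() for blob in B])
```

The resulting I and B can then be passed to build_calltable_data and serialize_calltable_representation from the previous section.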
Serializing tagged-unions
By tagged-union we understand polymorphic but limited data structures (an instance of a tagged-union can be only one of N known variants) that are unions of structures and/or other tagged-unions. A tagged-union variant can be:
- empty (tag variant)
- a struct
- a nested tagged-union
As mentioned, there is a polymorphic aspect to these kinds of tagged-unions, and we handle it by convention: serialization index 0 is always reserved for a 1-byte discriminator number which defines which tagged-union variant is being serialized. The value of this specific pseudo-field will be called the variant discriminator. Subsequent indices are used to serialize the fields of specific variants (for empty tag variants there will be no more indices). So, given an example tagged-union (implemented in Rust, where the equivalent of a tagged-union is enum):
enum X {
    A,
    B {a: u16, b: u32},
    C (u16, u32, u64)
}
First we need to choose variant discriminator values for each of the tagged-union variants. Let's select:
- if variant A - the variant discriminator will be 0
- if variant B - the variant discriminator will be 1
- if variant C - the variant discriminator will be 2
Again, as with field indices in Serializing uniform data structures, the variant discriminator values do not need to start from 0 and do not need to be contiguous, but that is our convention, and any "discontinuities" in the value set for the variant discriminator would indicate a retired tagged-union variant or variants.
Next we need to assign field serialization indices for each variant:
- for A there are no fields, so no indices are needed
- for B: 1 for field a (remember, 0 is reserved for the discriminator), 2 for field b
- for C: 1 for the first tuple element (of type u16) (remember, 0 is reserved for the discriminator), 2 for the second tuple element (of type u32), 3 for the third tuple element (of type u64)
As you can see, serialization indices
for fields need to be unique in scope of a particular tagged-union variant.
Knowing the above, let's see what the I and B collections would look like for different instances of this tagged-union (in Python notation):
- when serializing variant X::A:
  I = [0]
  B = [(0).to_bytes(1, byteorder = 'little')]
- when serializing variant X::B {a: 155, b: 9500}:
  I = [0, 1, 2]
  B = [(1).to_bytes(1, byteorder = 'little'), (155).to_bytes(2, byteorder = 'little'), (9500).to_bytes(4, byteorder = 'little')]
- when serializing variant X::C(5, 10, 15):
  I = [0, 1, 2, 3]
  B = [(2).to_bytes(1, byteorder = 'little'), (5).to_bytes(2, byteorder = 'little'), (10).to_bytes(4, byteorder = 'little'), (15).to_bytes(8, byteorder = 'little')]
You can apply the above I and B values to the previously defined functions build_calltable_data and serialize_calltable_representation to determine the output bytes.
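For convenience, the whole pipeline for the three variants of X can be sketched in one self-contained snippet. Here to_calltable_bytes is a hypothetical helper that condenses build_calltable_data and serialize_calltable_representation into one function, and it uses struct.pack instead of int.to_bytes; the discriminators and indices are the ones chosen above, with one index per element of B.

```python
import struct


def to_calltable_bytes(I, B):
    """Condensed sketch of build_calltable_data + serialize_calltable_representation."""
    assert len(I) == len(B)
    # Vec<Field>: u32 count, then (u16 index, u32 offset) per field.
    out = struct.pack("<I", len(I))
    offset = 0
    for index, blob in zip(I, B):
        out += struct.pack("<HI", index, offset)
        offset += len(blob)
    # Bytes: u32 length prefix, then the concatenated field bytes.
    payload = b"".join(B)
    return out + struct.pack("<I", len(payload)) + payload


def serialize_x_a():
    # Variant A: only the 1-byte discriminator (0) at index 0.
    return to_calltable_bytes([0], [struct.pack("<B", 0)])


def serialize_x_b(a, b):
    # Variant B: discriminator 1, then a (u16) at index 1 and b (u32) at index 2.
    return to_calltable_bytes(
        [0, 1, 2],
        [struct.pack("<B", 1), struct.pack("<H", a), struct.pack("<I", b)],
    )


def serialize_x_c(first, second, third):
    # Variant C: discriminator 2, then the u16, u32 and u64 tuple elements.
    return to_calltable_bytes(
        [0, 1, 2, 3],
        [struct.pack("<B", 2), struct.pack("<H", first),
         struct.pack("<I", second), struct.pack("<Q", third)],
    )


print(serialize_x_a().hex())
print(serialize_x_b(155, 9500).hex())
print(serialize_x_c(5, 10, 15).hex())
```

Because every variant stores its discriminator at index 0, a reader can decode that single byte first and only then decide how to interpret the remaining indices.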