Version: 2.0.0

Calltable serialization scheme

The Calltable serialization scheme byte-serializes data in a format that can evolve over time while retaining backwards compatibility.

Calltable envelope

The calltable serialization approach requires serializing a calltable envelope, from now on called an envelope. The envelope consists of:

fields - field which is an ordered collection of Field. This collection is serialized as a Vec<Field> in the "regular" Binary Serialization Standard schema. The invariants for fields are:
- for each f_i, f_j given i < j implies f_i.index < f_j.index
- for each f_i, f_j given i < j implies f_i.offset < f_j.offset
bytes - field which is an amorphous blob of bytes serialized as Bytes in the "regular" Binary Serialization Standard schema. Bear in mind that this field is serialized by first appending a u32 integer (the number of bytes) and then the raw bytes payload.

A Field consists of:

index - field (u16),
offset - field (u32),
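
The two invariants on fields can be checked mechanically. The following is a minimal sketch, assuming the same dictionary representation of a Field that the example scripts in this document use; the function name check_field_invariants is hypothetical and not part of any library. Checking adjacent pairs is enough, because strict ordering is transitive.

```python
def check_field_invariants(fields):
    """Return True if indices and offsets are both strictly increasing
    and fit into u16 and u32 respectively (hypothetical helper)."""
    for field in fields:
        if not 0 <= field["index"] <= 0xFFFF:
            return False
        if not 0 <= field["offset"] <= 0xFFFFFFFF:
            return False
    # Compare each field with its successor only.
    for prev, curr in zip(fields, fields[1:]):
        if not prev["index"] < curr["index"]:
            return False
        if not prev["offset"] < curr["offset"]:
            return False
    return True


valid = [{"index": 0, "offset": 0}, {"index": 1, "offset": 3}, {"index": 3, "offset": 8}]
out_of_order = [{"index": 1, "offset": 0}, {"index": 0, "offset": 3}]
print(check_field_invariants(valid))         # True
print(check_field_invariants(out_of_order))  # False
```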

The fields part of the envelope is the self-describing data for the binary payload stored in bytes. To successfully serialize something into a calltable, we need to assign unique serialization index identifiers to its parts. In a structure those parts would be fields. Let's denote the index of the n-th field as i_n. We then byte-serialize each field (according to the Binary Serialization Standard); let's denote the calculated bytes of the n-th field as b_n. We create an ordered collection of indices I, which consists of all the serialization indices of the fields in ascending order.

I = [i_0, i_1, ..., i_(n-1), i_n]

We create an ordered collection of byte sequences B from all b_n, keeping the same order as in I, so:

B = [b_0, b_1, ..., b_(n-1), b_n]

Next, we can create the calltable fields and bytes with this example implementation:

  def build_calltable_data(I, B):
      assert len(I) == len(B)
      offset = 0
      bytes_arr = bytes([])
      fields = []
      for i in range(0, len(I)):
          this_fields_serialization_index = I[i]
          this_field_bytes = B[i]
          this_field_length = len(this_field_bytes)
          fields.append({
              "index": this_fields_serialization_index,
              "offset": offset,
          })
          offset += this_field_length
          bytes_arr += this_field_bytes
      return {
          "fields": fields,
          "bytes": bytes_arr
      }

so in this example of usage:

I = [0, 1, 3, 5]
B = [b'\x00\x01\xff', b'\x37\x0c\x6e\x3c\x0f', b'\x07\x95\x01', b'\x37']
envelope = build_calltable_data(I, B)

envelope["fields"] would be [{'index': 0, 'offset': 0}, {'index': 1, 'offset': 3}, {'index': 3, 'offset': 8}, {'index': 5, 'offset': 11}]

and envelope["bytes"] would be an array of bytes, which represented as hex string would look like:

00 01 ff 37 0c 6e 3c 0f 07 95 01 37

(whitespaces added for clarity)
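
The offsets are what make the payload self-describing: a field's bytes start at its own offset and end where the next field starts (or at the end of bytes for the last field). The following is a minimal sketch of that lookup, assuming the envelope dictionary shape used above; the helper name read_field is hypothetical.

```python
def read_field(envelope, index):
    """Return the raw bytes stored under a serialization index,
    or None if the index is not present (hypothetical helper)."""
    fields = envelope["fields"]
    data = envelope["bytes"]
    for pos, field in enumerate(fields):
        if field["index"] == index:
            start = field["offset"]
            # The field ends where the next one begins, or at the end of the payload.
            end = fields[pos + 1]["offset"] if pos + 1 < len(fields) else len(data)
            return data[start:end]
    return None


envelope = {
    "fields": [
        {"index": 0, "offset": 0},
        {"index": 1, "offset": 3},
        {"index": 3, "offset": 8},
        {"index": 5, "offset": 11},
    ],
    "bytes": bytes.fromhex("00 01 ff 37 0c 6e 3c 0f 07 95 01 37"),
}
print(read_field(envelope, 3).hex())  # 079501
print(read_field(envelope, 2))        # None (index 2 is a "hole")
```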

Once we have the fields and bytes we can proceed to serialize those in the "default Binary Serialization Standard" way using this script:

  def serialize_calltable_representation(fields, concatenated_serialized_fields):
      bytes_arr = bytes([])
      fields_len = len(fields)
      bytes_arr += fields_len.to_bytes(4, byteorder = 'little')
      for i in range(0, len(fields)):
          bytes_arr += fields[i]["index"].to_bytes(2, byteorder = 'little')
          bytes_arr += fields[i]["offset"].to_bytes(4, byteorder = 'little')
      concatenated_serialized_fields_len = len(concatenated_serialized_fields)
      bytes_arr += concatenated_serialized_fields_len.to_bytes(4, byteorder = 'little')
      bytes_arr += concatenated_serialized_fields
      return bytes_arr

and once we have all that, we can apply the scripts to the example:

    I = [0, 1, 3, 5]
    B = [bytes([0, 1, 255]), bytes([55, 12, 110, 60, 15]), bytes([7, 149, 1]), bytes([55])]
    envelope = build_calltable_data(I, B)
    serialized_calltable_representation = serialize_calltable_representation(envelope["fields"], envelope["bytes"])
    print(f"{serialized_calltable_representation.hex()}")

which produces:

0400000000000000000001000300000003000800000005000b0000000c0000000001ff370c6e3c0f07950137

In the above hex:

Hex part                    Meaning
04000000                    serialized length of the `fields` collection
0000                        field[0].index
00000000                    field[0].offset
0100                        field[1].index
03000000                    field[1].offset
0300                        field[2].index
08000000                    field[2].offset
0500                        field[3].index
0b000000                    field[3].offset
0c000000                    number of bytes in the `bytes` field
0001ff370c6e3c0f07950137    raw bytes of the `bytes` field
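
Reading the layout above in the opposite direction gives a way to recover the envelope from its serialized form. The following is a sketch under the assumption that the input contains exactly one serialized envelope laid out as shown; the function name deserialize_calltable_representation is hypothetical and the error handling is deliberately minimal.

```python
def deserialize_calltable_representation(raw):
    """Parse a serialized envelope back into fields and bytes (hypothetical helper)."""
    # u32 little-endian count of Field entries
    fields_len = int.from_bytes(raw[0:4], byteorder="little")
    pos = 4
    fields = []
    for _ in range(fields_len):
        # Each Field is a u16 index followed by a u32 offset.
        index = int.from_bytes(raw[pos:pos + 2], byteorder="little")
        offset = int.from_bytes(raw[pos + 2:pos + 6], byteorder="little")
        fields.append({"index": index, "offset": offset})
        pos += 6
    # u32 little-endian payload length, then the payload itself
    payload_len = int.from_bytes(raw[pos:pos + 4], byteorder="little")
    pos += 4
    payload = raw[pos:pos + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated payload")
    return {"fields": fields, "bytes": payload}


raw = bytes.fromhex(
    "04000000 0000 00000000 0100 03000000 0300 08000000 0500 0b000000 "
    "0c000000 0001ff370c6e3c0f07950137"
)
envelope = deserialize_calltable_representation(raw)
print(envelope["fields"])
print(envelope["bytes"].hex())
```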

This concludes how we construct and byte-serialize an envelope. The next sections explain the assumptions and conventions that apply when dealing with structs and tagged-unions.

Serializing uniform data structures

This section explains how to use the calltable serialization scheme for non-polymorphic data structures, whose fields are known in advance. We assign each field a unique serialization index, starting from 0. For a specific structure, an index used for a field is assigned to that field indefinitely: if we ever decide to retire a field from a structure, we stop using that index in the serialization. If we ever decide to add a field to a struct, we give it a new index that was never used for that structure. Although the indices are not required to be contiguous, the assumption is that we assign them one by one. Any "holes" in the indexing should indicate that there used to be a field that is no longer used or is not mandatory.

struct A {
    a: u16,
    b: String,
    c: Vec<OtherStruct>
}

In the above example we could assign index 0 to a, index 1 to b and index 2 to c. Knowing this, and assuming that we know how to byte-serialize OtherStruct, we should be able to create an envelope for this struct and serialize it in the calltable scheme.
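
As an illustration, here is a sketch of the I and B collections for one instance of struct A. It rests on two assumptions that this document does not spell out: that a String is byte-serialized as a u32 little-endian length followed by its UTF-8 bytes, and that a Vec is serialized as a u32 little-endian element count followed by its elements. To avoid inventing an encoding for OtherStruct, the example uses an empty Vec. The build_calltable_data function is a condensed copy of the one defined earlier.

```python
def build_calltable_data(I, B):
    # Condensed copy of the earlier example implementation.
    assert len(I) == len(B)
    fields, payload, offset = [], b"", 0
    for index, field_bytes in zip(I, B):
        fields.append({"index": index, "offset": offset})
        offset += len(field_bytes)
        payload += field_bytes
    return {"fields": fields, "bytes": payload}


# Instance: A { a: 7, b: "hi", c: vec![] }
a_bytes = (7).to_bytes(2, byteorder="little")                  # u16
text = "hi".encode("utf-8")
b_bytes = len(text).to_bytes(4, byteorder="little") + text     # assumed String layout
c_bytes = (0).to_bytes(4, byteorder="little")                  # assumed layout of an empty Vec

I = [0, 1, 2]
B = [a_bytes, b_bytes, c_bytes]
envelope = build_calltable_data(I, B)
print(envelope["fields"])       # offsets 0, 2 and 8
print(envelope["bytes"].hex())  # 070002000000686900000000
```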

Serializing tagged-unions

By tagged-union we understand polymorphic, but limited (an instance of a tagged-union can be only one of N known variants) data structures that are unions of structures and/or other tagged-unions. A tagged-union variant can be:

empty (tag variant)
a struct
a nested tagged-union

As mentioned, there is a polymorphic aspect to tagged-unions, and we handle it by convention: serialization index 0 is always reserved for a 1-byte discriminator number that defines which tagged-union variant is being serialized. The value of this pseudo-field is called the variant discriminator. Subsequent indices are used to serialize the fields of the specific variant (for empty tag variants there are no more indices). So, given an example tagged-union (implemented in Rust, where the equivalent of a tagged-union is an enum):

  enum X {
     A,
     B {a: u16, b: u32},
     C (u16, u32, u64)
  }

First we need to choose variant discriminator values for each of the tagged-union variants. Let's select:

if variant A - the variant discriminator will be 0
if variant B - the variant discriminator will be 1
if variant C - the variant discriminator will be 2

Again, as with field indices in Serializing uniform data structures, the variant discriminator values do not need to start from 0 and do not need to be contiguous, but that is our convention, and any "discontinuities" in the value set for the variant discriminator would indicate a retired tagged-union variant or variants.

Next we need to assign field serialization indices for each variant:

for A there are no fields so no indices are needed
for B:
- 1 for field a (remember, 0 is reserved for the discriminator),
- 2 for field b
for C:
- 1 for the first tuple element (of type u16) (remember, 0 is reserved for the discriminator),
- 2 for the second tuple element (of type u32),
- 3 for the third tuple element (of type u64),

As you can see, serialization indices for fields need to be unique only within the scope of a particular tagged-union variant. Knowing the above, let's see what the I and B collections would look like for different instances of this tagged-union:

when serializing variant X::A (assuming python notation):

I = [0]
B = [(0).to_bytes(1, byteorder = 'little')]

when serializing variant X::B {a: 155, b: 9500} (assuming python notation):

I = [0, 1, 2]
B = [(1).to_bytes(1, byteorder = 'little'), (155).to_bytes(2, byteorder = 'little'), (9500).to_bytes(4, byteorder = 'little')]

when serializing variant X::C(5, 10, 15) (assuming python notation):

I = [0, 1, 2, 3]
B = [(2).to_bytes(1, byteorder = 'little'), (5).to_bytes(2, byteorder = 'little'), (10).to_bytes(4, byteorder = 'little'), (15).to_bytes(8, byteorder = 'little')]

You can apply the above I and B values to the previously defined functions build_calltable_data and serialize_calltable_representation to determine the output bytes.
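
The following sketch runs all three instances of X through condensed copies of the two functions defined earlier. The helper le and the variants dictionary are hypothetical names introduced only for this illustration. Note that X::C needs four indices (0 through 3), one per entry in B, because build_calltable_data asserts that I and B have the same length.

```python
def build_calltable_data(I, B):
    # Condensed copy of the earlier example implementation.
    assert len(I) == len(B)
    fields, payload, offset = [], b"", 0
    for index, field_bytes in zip(I, B):
        fields.append({"index": index, "offset": offset})
        offset += len(field_bytes)
        payload += field_bytes
    return {"fields": fields, "bytes": payload}


def serialize_calltable_representation(fields, payload):
    # Condensed copy: u32 field count, (u16 index, u32 offset) pairs,
    # u32 payload length, raw payload. All integers are little-endian.
    out = len(fields).to_bytes(4, byteorder="little")
    for field in fields:
        out += field["index"].to_bytes(2, byteorder="little")
        out += field["offset"].to_bytes(4, byteorder="little")
    out += len(payload).to_bytes(4, byteorder="little")
    return out + payload


def le(value, size):
    """Little-endian encoding of an unsigned integer (hypothetical helper)."""
    return value.to_bytes(size, byteorder="little")


variants = {
    "X::A": ([0], [le(0, 1)]),
    "X::B {a: 155, b: 9500}": ([0, 1, 2], [le(1, 1), le(155, 2), le(9500, 4)]),
    "X::C(5, 10, 15)": ([0, 1, 2, 3], [le(2, 1), le(5, 2), le(10, 4), le(15, 8)]),
}

results = {}
for name, (I, B) in variants.items():
    envelope = build_calltable_data(I, B)
    results[name] = serialize_calltable_representation(envelope["fields"], envelope["bytes"])
    print(name, results[name].hex())
```

In every output the first payload byte is the variant discriminator, so a reader can look up index 0 first and then decide how to interpret the remaining indices.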
