Serialize/Deserialize (SerDe)

A SerDe is a class used to serialize items sketches to a bytes object in binary. Several example SerDes are provided as references.

The use of binary-compatible SerDes in different languages is critical for cross-language compatibility.

Each implementation must extend the PyObjectSerDe class and override all three of its methods.

class PyObjectSerDe

An abstract base class for serde objects. All custom serdes must extend this class.

get_size(self, item: object) int

Returns the size in bytes of an item

Parameters:

item (object) – The specified object

Returns:

The size of the item in bytes

Return type:

int

to_bytes(self, item: object) bytes

Retuns a bytes object with a serialized version of an item

Parameters:

item (object) – The specified object

Returns:

A bytes object with the serialized object

Return type:

bytes

from_bytes(self, data: bytes, offset: int) tuple

Reads a bytes object starting from the given offest and returns a tuple of the reconstructed object and the number of additional bytes read

Parameters:
  • data (bytes) – A bytes object from which to deserialize

  • offset (int) – The offset, in bytes, at which to start reading

Returns:

A tuple with the reconstructed object and the number of bytes read

Return type:

tuple(object, int)

The provided SerDes are:

class PyStringsSerDe

Bases: PyObjectSerDe

Implements a simple string-encoding scheme where a string is written as <num_bytes> <string_contents>, with no null termination. This format allows pre-allocating each string, at the cost of additional storage. Using this format, the serialized string consumes 4 + len(item) bytes.

class PyIntsSerDe

Bases: PyObjectSerDe

Implements an integer encoding scheme where each integer is written as a 32-bit (4 byte) little-endian value.

class PyLongsSerDe

Bases: PyObjectSerDe

Implements an integer encoding scheme where each integer is written as a 64-bit (8 byte) little-endian value.

class PyFloatsSerDe

Bases: PyObjectSerDe

Implements a floating point encoding scheme where each value is written as a 32-bit floating point value.

class PyDoublesSerDe

Implements a floating point encoding scheme where each value is written as a 64-bit floating point value.