Serialize/Deserialize (SerDe)

A SerDe is a class used to serialize the items in an items sketch (a sketch of arbitrary Python objects) to a binary bytes object, and to deserialize them again. Several example SerDes are provided for reference.

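Binary-compatible SerDes across languages are critical for cross-language compatibility: a sketch serialized in one language can be deserialized in another only if both sides encode each item in exactly the same way.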

Each implementation must extend the PyObjectSerDe class and override all three of its methods.

class PyObjectSerDe(*args, **kwargs)

An abstract base class for SerDe objects. All custom SerDes must extend this class.

get_size(self, item: object) → int

Returns the size, in bytes, of an item once serialized.

Parameters: item (object) – The specified object
Returns: The size of the item in bytes
Return type: int

to_bytes(self, item: object) → bytes

Returns a bytes object containing the serialized version of an item.

Parameters: item (object) – The specified object
Returns: A bytes object with the serialized object
Return type: bytes

from_bytes(self, data: bytes, offset: int) → tuple

Reads a bytes object starting at the given offset and returns a tuple containing the reconstructed object and the number of bytes read.

Parameters:

data (bytes) – A bytes object from which to deserialize
offset (int) – The offset, in bytes, at which to start reading

Returns:

A tuple with the reconstructed object and the number of bytes read

Return type:

tuple(object, int)
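To make the three-method contract concrete, here is a minimal sketch of a custom SerDe, assuming a hypothetical item type: a 2-D point stored as an (x, y) tuple and encoded as two little-endian 64-bit floats. PointSerDe and its encoding are illustrative only and are not part of the library. The import tries datasketches.PyObjectSerDe and, if the package is not installed, falls back to a hypothetical stand-in with the same three abstract methods. The usage at the end packs several items back to back and walks the buffer, advancing the offset by the byte count that each from_bytes call returns.

```python
import struct
from abc import ABC, abstractmethod

try:
    # Real base class, if the datasketches package is installed.
    from datasketches import PyObjectSerDe
except ImportError:
    # Hypothetical stand-in with the same three-method interface,
    # so this sketch runs without datasketches.
    class PyObjectSerDe(ABC):
        @abstractmethod
        def get_size(self, item: object) -> int: ...

        @abstractmethod
        def to_bytes(self, item: object) -> bytes: ...

        @abstractmethod
        def from_bytes(self, data: bytes, offset: int) -> tuple: ...


class PointSerDe(PyObjectSerDe):
    """Hypothetical SerDe for (x, y) tuples: two little-endian 64-bit floats."""

    _FMT = struct.Struct("<dd")  # fixed width: 16 bytes, no padding

    def get_size(self, item: object) -> int:
        return self._FMT.size

    def to_bytes(self, item: object) -> bytes:
        x, y = item
        return self._FMT.pack(x, y)

    def from_bytes(self, data: bytes, offset: int) -> tuple:
        x, y = self._FMT.unpack_from(data, offset)
        # Return the reconstructed object and how many bytes were consumed.
        return ((x, y), self._FMT.size)


# Several items packed back to back in one buffer.
serde = PointSerDe()
points = [(1.0, 2.0), (-0.5, 3.25)]
buf = b"".join(serde.to_bytes(p) for p in points)

offset, decoded = 0, []
while offset < len(buf):
    item, nbytes = serde.from_bytes(buf, offset)
    decoded.append(item)
    offset += nbytes  # advance past the bytes just read
```

Returning the byte count from from_bytes is what lets a reader step through a buffer of concatenated items without knowing each item's size in advance.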

The provided SerDes are:

class PyStringsSerDe(*args, **kwargs)

Bases: PyObjectSerDe

Implements a simple string-encoding scheme in which each string is written as <num_bytes> followed by <string_contents>, with no null terminator. Writing the length first allows each string to be pre-allocated during deserialization, at the cost of the extra bytes used by the length prefix. In this format, a serialized string consumes 4 + len(item) bytes.
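The length-prefixed string format can be sketched as follows. This is not the library's PyStringsSerDe: it is a plain class with the same three method names so it runs without datasketches, and it assumes a 4-byte little-endian unsigned length prefix and UTF-8 contents, neither of which is specified above. Because UTF-8 encodes non-ASCII characters in more than one byte, this sketch measures the encoded length; for ASCII strings that equals 4 + len(item).

```python
import struct


class StringsSerDeExample:
    """Hypothetical length-prefixed string SerDe: <num_bytes><contents>.

    Assumes a 4-byte little-endian unsigned length prefix and UTF-8 text.
    """

    def get_size(self, item: str) -> int:
        # Prefix plus encoded contents; equals 4 + len(item) for ASCII.
        return 4 + len(item.encode("utf-8"))

    def to_bytes(self, item: str) -> bytes:
        encoded = item.encode("utf-8")
        # Length first, then the raw bytes, with no null terminator.
        return struct.pack("<I", len(encoded)) + encoded

    def from_bytes(self, data: bytes, offset: int) -> tuple:
        (num_bytes,) = struct.unpack_from("<I", data, offset)
        start = offset + 4
        if start + num_bytes > len(data):
            raise ValueError("length prefix exceeds the buffer")
        text = data[start:start + num_bytes].decode("utf-8")
        # The prefix and the contents were both consumed.
        return (text, 4 + num_bytes)
```

For example, "abc" serializes to b"\x03\x00\x00\x00abc" (7 bytes), and reading it back from offset 2 of a larger buffer returns ("abc", 7).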

class PyIntsSerDe(*args, **kwargs)

Bases: PyObjectSerDe

Implements an integer encoding scheme where each integer is written as a 32-bit (4-byte) little-endian value.
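A minimal sketch of the 32-bit integer format, using Python's struct module. IntsSerDeExample is a hypothetical plain class (not a PyObjectSerDe subclass, so it runs standalone), and treating the values as signed is an assumption the text above does not state. The "<" prefix selects little-endian order with no padding, so every item is exactly 4 bytes.

```python
import struct


class IntsSerDeExample:
    """Hypothetical stand-in mirroring the PyIntsSerDe format.

    Assumes signed integers; '<' selects little-endian with no padding.
    """

    _FMT = struct.Struct("<i")

    def get_size(self, item: int) -> int:
        return self._FMT.size  # always 4 bytes

    def to_bytes(self, item: int) -> bytes:
        # Raises struct.error if item does not fit in a signed 32-bit integer.
        return self._FMT.pack(item)

    def from_bytes(self, data: bytes, offset: int) -> tuple:
        (value,) = self._FMT.unpack_from(data, offset)
        return (value, self._FMT.size)
```

With little-endian order, 1 serializes to b"\x01\x00\x00\x00"; values outside the 32-bit range are rejected rather than silently truncated.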

class PyLongsSerDe(*args, **kwargs)

Bases: PyObjectSerDe

Implements an integer encoding scheme where each integer is written as a 64-bit (8-byte) little-endian value.
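The 64-bit format can also be sketched with the built-in int.to_bytes and int.from_bytes methods instead of struct. LongsSerDeExample is a hypothetical plain class with the same three method names, and it assumes signed values. A value such as 2**40 does not fit in 32 bits, so it needs this encoding rather than the 4-byte one.

```python
class LongsSerDeExample:
    """Hypothetical stand-in mirroring the PyLongsSerDe format.

    8-byte little-endian integers; assumes signed values.
    """

    SIZE = 8

    def get_size(self, item: int) -> int:
        return self.SIZE

    def to_bytes(self, item: int) -> bytes:
        # Raises OverflowError if item does not fit in a signed 64-bit integer.
        return item.to_bytes(self.SIZE, "little", signed=True)

    def from_bytes(self, data: bytes, offset: int) -> tuple:
        chunk = data[offset:offset + self.SIZE]
        if len(chunk) != self.SIZE:
            raise ValueError("not enough bytes for a 64-bit value")
        return (int.from_bytes(chunk, "little", signed=True), self.SIZE)
```

The explicit length check matters here: slicing past the end of a bytes object silently returns fewer bytes, which int.from_bytes would otherwise decode without complaint.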

class PyFloatsSerDe(*args, **kwargs)

Bases: PyObjectSerDe

Implements a floating-point encoding scheme where each value is written as a 32-bit (single-precision) floating-point value.
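A sketch of the 32-bit float format, mainly to show its precision trade-off. FloatsSerDeExample is a hypothetical plain class, and little-endian byte order is an assumption, since the text above does not state it for floats. Python floats are 64-bit, so packing into 4 bytes rounds each value to the nearest single-precision number.

```python
import struct


class FloatsSerDeExample:
    """Hypothetical stand-in mirroring the PyFloatsSerDe format.

    Assumes little-endian byte order ('<').
    """

    _FMT = struct.Struct("<f")  # IEEE 754 single precision, 4 bytes

    def get_size(self, item: float) -> int:
        return self._FMT.size

    def to_bytes(self, item: float) -> bytes:
        # Rounds the 64-bit Python float to the nearest 32-bit value.
        return self._FMT.pack(item)

    def from_bytes(self, data: bytes, offset: int) -> tuple:
        (value,) = self._FMT.unpack_from(data, offset)
        return (value, self._FMT.size)
```

Values exactly representable in single precision, such as 0.5, round-trip unchanged, while 0.1 comes back as 0.10000000149011612.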

class PyDoublesSerDe(*args, **kwargs): Implements a floating point encoding scheme where each value is written as a 64-bit floating point value.