Serialize/Deserialize (SerDe)
A SerDe is a class that serializes the items stored in an items sketch to a binary bytes
object and deserializes them again.
Several example SerDes are provided as references.
The use of binary-compatible SerDes in different languages is critical for cross-language compatibility.
Each implementation must extend the PyObjectSerDe
class and override all three of its methods.
- class PyObjectSerDe(*args, **kwargs)
An abstract base class for serde objects. All custom serdes must extend this class.
- get_size(self, item: object) → int
Returns the size in bytes of an item.
- Parameters:
item (object) – The specified object
- Returns:
The size of the item in bytes
- Return type:
int
- to_bytes(self, item: object) → bytes
Returns a bytes object with a serialized version of an item.
- Parameters:
item (object) – The specified object
- Returns:
A bytes object with the serialized object
- Return type:
bytes
- from_bytes(self, data: bytes, offset: int) → tuple
Reads a bytes object starting from the given offset and returns a tuple of the reconstructed object and the number of bytes read.
- Parameters:
data (bytes) – A bytes object from which to deserialize
offset (int) – The offset, in bytes, at which to start reading
- Returns:
A tuple with the reconstructed object and the number of bytes read
- Return type:
tuple(object, int)
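The three methods above can be illustrated with a minimal sketch of a custom SerDe. So that the example runs without the library installed, it defines a local stand-in base class, ObjectSerDeStandIn, which is hypothetical and only mirrors the three documented method signatures; in real use the class would extend PyObjectSerDe instead. PointSerDe and its wire format (two little-endian 32-bit signed integers) are likewise assumptions made for illustration, not part of the library.

```python
import struct
from abc import ABC, abstractmethod


class ObjectSerDeStandIn(ABC):
    """Hypothetical stand-in for PyObjectSerDe so this sketch runs on its own."""

    @abstractmethod
    def get_size(self, item: object) -> int: ...

    @abstractmethod
    def to_bytes(self, item: object) -> bytes: ...

    @abstractmethod
    def from_bytes(self, data: bytes, offset: int) -> tuple: ...


class PointSerDe(ObjectSerDeStandIn):
    """Hypothetical SerDe for (x, y) pairs of 32-bit signed integers."""

    _FMT = "<ii"  # assumed format: two little-endian 32-bit signed integers

    def get_size(self, item: object) -> int:
        # Every item has the same fixed size in this format.
        return struct.calcsize(self._FMT)

    def to_bytes(self, item: object) -> bytes:
        x, y = item
        return struct.pack(self._FMT, x, y)

    def from_bytes(self, data: bytes, offset: int) -> tuple:
        # Read one item starting at `offset`; report how many bytes were consumed.
        x, y = struct.unpack_from(self._FMT, data, offset)
        return (x, y), struct.calcsize(self._FMT)


serde = PointSerDe()
buffer = b"\x00\x00" + serde.to_bytes((3, -7))  # two unrelated leading bytes
item, bytes_read = serde.from_bytes(buffer, 2)
```

Returning the number of bytes consumed lets a caller advance its offset and read the next item from the same buffer.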
The provided SerDes are:
- class PyStringsSerDe(*args, **kwargs)
Bases:
PyObjectSerDe
Implements a simple string-encoding scheme where a string is written as <num_bytes> <string_contents>, with no null termination. The length prefix allows the reader to pre-allocate each string, at the cost of additional storage. Using this format, the serialized string consumes 4 + len(item) bytes.
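The length-prefixed layout described for PyStringsSerDe can be sketched with the standard struct module. The helper names encode_string and decode_string are hypothetical, and two details are assumptions not stated in the description above: that the 4-byte length is an unsigned little-endian integer, and that the contents are encoded as UTF-8.

```python
import struct


def encode_string(item: str) -> bytes:
    """Write <num_bytes> <string_contents> with no null terminator."""
    payload = item.encode("utf-8")  # assumption: UTF-8 contents
    # Assumption: the length prefix is a 4-byte unsigned little-endian integer.
    return struct.pack("<I", len(payload)) + payload


def decode_string(data: bytes, offset: int) -> tuple:
    """Return the decoded string and the number of bytes consumed."""
    (num_bytes,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    text = data[start:start + num_bytes].decode("utf-8")
    return text, 4 + num_bytes


encoded = encode_string("sketch")
decoded, bytes_read = decode_string(encoded, 0)
```

For ASCII text the encoded size matches 4 + len(item) exactly; under the UTF-8 assumption, non-ASCII characters occupy more than one byte each, so the payload would be longer than the character count.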
- class PyIntsSerDe(*args, **kwargs)
Bases:
PyObjectSerDe
Implements an integer encoding scheme where each integer is written as a 32-bit (4 byte) little-endian value.
- class PyLongsSerDe(*args, **kwargs)
Bases:
PyObjectSerDe
Implements an integer encoding scheme where each integer is written as a 64-bit (8 byte) little-endian value.
- class PyFloatsSerDe(*args, **kwargs)
Bases:
PyObjectSerDe
Implements a floating point encoding scheme where each value is written as a 32-bit floating point value.
- class PyDoublesSerDe(*args, **kwargs)
Bases:
PyObjectSerDe
Implements a floating point encoding scheme where each value is written as a 64-bit floating point value.
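The fixed-width numeric encodings used by the four numeric SerDes can be approximated with the standard struct module. This is a sketch, not the library's implementation: the FORMATS table and the encode and decode helpers are hypothetical names, the integers are assumed to be signed, and little-endian byte order is assumed for the floating point formats (it is documented above only for the integer formats).

```python
import struct

# "<" selects little-endian byte order with standard sizes.
FORMATS = {
    "int32": "<i",    # 4 bytes, as described for PyIntsSerDe (signed assumed)
    "int64": "<q",    # 8 bytes, as described for PyLongsSerDe (signed assumed)
    "float32": "<f",  # 4 bytes, as described for PyFloatsSerDe (byte order assumed)
    "float64": "<d",  # 8 bytes, as described for PyDoublesSerDe (byte order assumed)
}


def encode(kind: str, value) -> bytes:
    """Pack a single value using the fixed-width format for `kind`."""
    return struct.pack(FORMATS[kind], value)


def decode(kind: str, data: bytes, offset: int = 0) -> tuple:
    """Unpack one value at `offset`; return it with the bytes consumed."""
    fmt = FORMATS[kind]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, struct.calcsize(fmt)


int32_bytes = encode("int32", 1)
int64_bytes = encode("int64", 1)
float32_value, _ = decode("float32", encode("float32", 0.1))
float64_value, _ = decode("float64", encode("float64", 0.1))
```

The width matters when choosing a SerDe: a value outside the 32-bit range cannot be packed in the 4-byte integer format, and a Python float written as a 32-bit value generally does not round-trip exactly, whereas the 64-bit format preserves it.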