#######################
 Writing custom fields
#######################

Steel provides fields for interacting with a wide range of common data formats,
but there will always be more in the wild than any framework could hope to cover.
For those custom cases, or where you want to customize the behavior of the
framework, you can create your own custom fields.

*****************
 Field Structure
*****************

Fields are simply Python classes, with a few specific characteristics:

- They subclass a common ancestor
- They provide a set of common methods and properties

``Field[T, D = None]``
======================

The base field class is ``steel.fields.base.Field[T, D]``. As a generic type, it
uses parameters to describe the native Python types it handles.

- ``T`` is the native **type** that outside code would use to interact with this
  field. A field to manage strings would subclass ``Field[str]``, while a field to
  store timestamps would use ``Field[datetime]``.
- ``D`` is another native Python type representing an underlying **data** type
  that's being wrapped by the field. See :ref:`wrappedfield` for more details.

.. note::

   ``D`` defaults to ``None``, allowing most fields to ignore it and access the
   base class as ``Field[T]``. This is how most of the details in this document
   will refer to ``Field``, unless ``D`` is relevant for a specific use case.

Existing fields already subclass the appropriate type, so subclassing any of them
will share the same type hinting as its parent class.

.. note::

   Python projects are well-known for their use of "duck typing", where they can
   use any object type, as long as it provides the right API. In order to more
   easily distinguish fields from other attributes, and to make type hinting more
   consistent, Steel does in fact require the use of the ``Field`` base class for
   all fields.

API Methods
===========

There are five key methods that all fields are expected to provide. Their
definitions here include the type information they're expected to convey as well.

``validate(value: T) -> None``
------------------------------

Validates that a value is suitable for this field, according to its type and
configuration. It must return ``None`` when the provided value is valid for
storage, and it should instead raise ``ValidationError`` if there's anything wrong
with it. The message in the exception should describe what's wrong, so that it can
be reported elsewhere.

This is a helper method, which is expected to be called by separate validation
tooling or user interfaces. It is **not** called at any point during reading or
writing data. So if you want to use it to ensure data integrity, be sure to call
it yourself separately.

``read(buffer: BufferedIOBase) -> tuple[T, int]``
-------------------------------------------------

Reads data from a data buffer, such as a file, and returns a fully-unpacked value
as well as the number of bytes consumed from the buffer.

Not all data structures have a size that can be known up front, so this method can
adjust the read length as necessary, based on any combination of the field's
configuration and the actual data that was found. Returning the number of bytes
alongside the Python value provides visibility into this behavior.

This method is also responsible for the unpacking step below. Most implementations
will read the appropriate number of bytes and pass them along to the ``unpack()``
method to get a Python value to return. But some fields may already be fully
unpacked as part of the process of reading data from the file. In these cases,
there's no need to call ``unpack()`` as a separate step, and ``read()`` can simply
return the appropriate value directly.

.. important::

   This is defined as an abstract method, so unless you're subclassing an existing
   discrete field, you will need to provide an implementation of this method.

``unpack(value: bytes) -> T``
-----------------------------

Converts a sequence of bytes into a native Python object.
The bytes provided will have already been read from the buffer; this method simply
provides a convenient way to modify how those bytes are interpreted.

In most cases, this will simply perform the inverse of the ``pack()`` method.

.. important::

   This is defined as an abstract method, so unless you're subclassing an existing
   discrete field, you will need to provide an implementation of this method.

   If your implementation of the ``read()`` method returns a native value without
   the need for an unpacking step, this method must still exist, and can simply
   contain ``pass``.

``pack(value: T) -> bytes``
---------------------------

Serializes a native Python object into a sequence of bytes, suitable for writing
to the data buffer.

In most cases, this will simply perform the inverse of the ``unpack()`` method.

.. important::

   This is defined as an abstract method, so unless you're subclassing an existing
   discrete field, you will need to provide an implementation of this method.

   If your implementation of the ``write()`` method can write out data without the
   need for a packing step, this method must still exist, and can simply contain
   ``pass``.

``write(value: T, buffer: BufferedIOBase) -> int``
--------------------------------------------------

Writes out a fully-packed byte sequence representing a native Python object. Like
``read()``, this returns the number of bytes that were written to the buffer.

In most cases, there's no need to override this method with a custom
implementation. The ``pack()`` method already returns a sequence of bytes suitable
for writing, and the number of bytes to write can be determined by the length of
that sequence.

.. tip::

   This method is **not** defined as abstract, and most fields can safely rely on
   the base implementation, which simply writes all the bytes returned by the
   ``pack()`` method.

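
As a rough illustration of how the five methods fit together, here is a minimal,
self-contained sketch. The ``Field`` and ``ValidationError`` classes below are
hypothetical stand-ins written only for this example, not Steel's real
implementations, and ``UInt16LE`` is an invented field name. The no-op default
for ``validate()`` is also an assumption.

.. code:: python

   import io
   import struct
   from abc import ABC, abstractmethod
   from typing import Generic, TypeVar

   T = TypeVar("T")


   class ValidationError(Exception):
       """Stand-in for Steel's ValidationError (assumption for this sketch)."""


   class Field(ABC, Generic[T]):
       """Hypothetical stand-in mirroring the five methods described above."""

       def validate(self, value: T) -> None:
           # Assumed default: accept everything unless a subclass says otherwise.
           return None

       @abstractmethod
       def read(self, buffer: io.BufferedIOBase) -> tuple[T, int]: ...

       @abstractmethod
       def unpack(self, value: bytes) -> T: ...

       @abstractmethod
       def pack(self, value: T) -> bytes: ...

       def write(self, value: T, buffer: io.BufferedIOBase) -> int:
           # Base behavior as described above: write whatever pack() returns.
           data = self.pack(value)
           buffer.write(data)
           return len(data)


   class UInt16LE(Field[int]):
       """Invented example field: unsigned 16-bit little-endian integer."""

       def validate(self, value: int) -> None:
           if not 0 <= value <= 0xFFFF:
               raise ValidationError(f"{value} does not fit in 16 bits")

       def read(self, buffer: io.BufferedIOBase) -> tuple[int, int]:
           data = buffer.read(2)
           if len(data) < 2:
               raise ValueError(
                   f"Unexpected end of buffer: got {len(data)}, expected 2"
               )
           return self.unpack(data), len(data)

       def unpack(self, value: bytes) -> int:
           return struct.unpack("<H", value)[0]

       def pack(self, value: int) -> bytes:
           return struct.pack("<H", value)


   field = UInt16LE()
   buffer = io.BytesIO()
   written = field.write(513, buffer)  # 513 == 0x0201
   buffer.seek(0)
   value, consumed = field.read(buffer)

Only ``read()``, ``unpack()`` and ``pack()`` are implemented by the subclass;
``write()`` is inherited, which matches the tip above.
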
****************
 Helper Classes
****************

``ExplicitlySizedField[T]``
===========================

A specific form of ``Field`` base class that adds a ``size`` attribute. With a
fixed size as part of the field's configuration, this class provides a default
``read()`` implementation that reads exactly ``self.size`` bytes and passes the
result straight to the ``unpack()`` method.

.. _wrappedfield:

``WrappedField[T, D]``
======================

Subclassing an existing field can provide further customization, but the subclass
must still use the same native Python type, such as all the ``int`` fields above.
Sometimes you may want to use an existing field to interact with the data buffer
but interact with Python using a different type. One example used within Steel is
the ``Timestamp`` field, which stores data using an ``Integer`` field internally,
but presents a ``datetime`` object to external code.

``WrappedField`` expands on the existing ``Field`` base class to specify two
distinct data types. ``T`` works like any other field, specifying the data type
that consumers of this field will interact with. The extra ``D`` refers to the
type of the wrapped field. The actual interaction with the data buffer will be
handled by a field supplied as a ``wrapped_field`` class attribute.

In the timestamp example, ``T`` would be ``datetime``, while ``D`` would be
``int``. This handles the necessary type hinting, and an ``Integer`` field would
handle the interactions in code. All that's left is to convert between
``datetime`` and ``int``.

.. code:: python

   from datetime import datetime

   class Timestamp(WrappedField[datetime, int]):
       # The wrapped field handles all interaction with the data buffer
       wrapped_field = Integer(size=4)

       def wrap(self, value: int) -> datetime:
           # Convert the stored integer into the type consumers see
           return datetime.fromtimestamp(value)

       def unwrap(self, value: datetime) -> int:
           # Convert back to the integer that gets written out
           return int(value.timestamp())

.. warning::

   Don't use this ``Timestamp`` field. It's here as a useful demonstration, but
   the actual implementation has more features and a stable API.

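
To make the division of labor between ``wrap()``, ``unwrap()`` and the wrapped
field more concrete, the following self-contained sketch traces one round trip.
Everything in it is an assumption made for illustration: the ``Integer`` and
``WrappedField`` classes are simplified stand-ins rather than Steel's real
classes, the big-endian byte order is an arbitrary choice, and the way ``read()``
and ``write()`` delegate is a guess at one reasonable design.

.. code:: python

   import io
   from datetime import datetime, timezone


   class Integer:
       """Hypothetical stand-in for a fixed-size, big-endian integer field."""

       def __init__(self, size: int):
           self.size = size

       def read(self, buffer):
           data = buffer.read(self.size)
           return int.from_bytes(data, "big"), len(data)

       def write(self, value, buffer):
           return buffer.write(value.to_bytes(self.size, "big"))


   class WrappedField:
       """Hypothetical stand-in that delegates all I/O to ``wrapped_field``."""

       wrapped_field = None

       def read(self, buffer):
           raw, size = self.wrapped_field.read(buffer)
           return self.wrap(raw), size

       def write(self, value, buffer):
           return self.wrapped_field.write(self.unwrap(value), buffer)


   class Timestamp(WrappedField):
       wrapped_field = Integer(size=4)

       def wrap(self, value: int) -> datetime:
           # UTC is pinned so the round trip doesn't depend on the local timezone.
           return datetime.fromtimestamp(value, tz=timezone.utc)

       def unwrap(self, value: datetime) -> int:
           return int(value.timestamp())


   buffer = io.BytesIO()
   # One billion seconds after the Unix epoch
   moment = datetime(2001, 9, 9, 1, 46, 40, tzinfo=timezone.utc)
   written = Timestamp().write(moment, buffer)
   buffer.seek(0)
   restored, consumed = Timestamp().read(buffer)

In this sketch the ``Timestamp`` class never touches bytes directly; it only
converts between ``datetime`` and ``int``, and the integer stand-in decides how
those integers are laid out in the buffer.
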
*************
 Error Types
*************

Steel provides two main exception types for field operations:

``ConfigurationError``
======================

Raised during field initialization when the configuration is invalid. Use this
when:

- Invalid parameters are passed to field constructors
- Incompatible options are specified together

``ValidationError``
===================

Raised when a value fails validation. Use this when:

- Values are out of range
- Required format constraints aren't met
- Data cannot be properly serialized

****************
 Best practices
****************

#. **Only read what you need**: Be conservative when reading from the data buffer.
   Consuming more data than is required by the field can cause problems with other
   fields that need to continue reading after your field is finished. It's also
   important to minimize reads on potentially large files, to keep memory usage as
   low as possible.

#. **Account for partial reads**: When implementing ``read()``, account for the
   possibility that the data is incomplete. If a file gets truncated, or if certain
   structures are corrupted, a read may not return as much as you would expect.
   Some data, like strings, may be able to handle this gracefully, but most will
   have to raise an informative exception instead.

   .. code:: python

      data = buffer.read(expected_size)
      if len(data) < expected_size:
          raise ValueError(
              f"Unexpected end of buffer: got {len(data)}, expected {expected_size}"
          )

#. **Use configuration defaults sparingly**: It can be tempting to provide defaults
   for configuration options that seem to be the obvious choice, but in practice
   the choice may not be as obvious as it seems. Defaults can obscure those
   differences, leading users to accidentally depend on a configuration that's
   unsuitable for their needs. Here are some examples:

   - x86 systems use little-endian byte ordering internally, and many applications
     will simply copy data structures from memory to files, preserving that ordering.
     But there are plenty of other systems with other needs, so defaulting to
     little-endian could make it harder for users to realize they should be making
     a conscious choice here.

   - Strings in C are null-terminated, and that representation is often written
     directly to files as a matter of convenience. But data written for other
     systems that don't use C may use other formats, such as storing a string's
     length **before** the text, or may allocate a fixed number of bytes for
     strings, regardless of how many bytes are actually populated. Steel provides
     three different field types for these cases.
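
The two examples above can be seen directly in the bytes they produce. The
following sketch uses only the Python standard library, not Steel's fields, and
its specifics are assumptions chosen for illustration: a one-byte length prefix,
an eight-byte fixed width, and null bytes as padding.

.. code:: python

   import struct

   number = 258  # 0x0102

   # The same integer, stored with each byte order
   little = number.to_bytes(2, "little")
   big = number.to_bytes(2, "big")

   text = "steel".encode("ascii")

   # The same string, stored with each of the three layouts
   null_terminated = text + b"\x00"
   length_prefixed = struct.pack("B", len(text)) + text
   fixed_size = text.ljust(8, b"\x00")

None of these layouts can be told apart reliably from the bytes alone, which is
why an explicit configuration choice tends to be safer than a default.
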