Writing custom fields
Steel provides fields for interacting with a wide range of common data formats, but there will always be more in the wild than any framework could hope to cover. For those custom cases, or where you want to customize the behavior of the framework, you can create your own custom fields.
Field Structure
Fields are simply Python classes, with a few specific characteristics:
Subclass from a common ancestor
A set of common methods and properties
Field[T, D = None]
The base field class is steel.fields.base.Field[T, D]. As a generic type, it uses parameters to
describe the native Python types it handles.
T is the native type that outside code would use to interact with this field. A field to manage strings would subclass Field[str], while a field to store timestamps would use Field[datetime].
D is another native Python type representing an underlying data type that’s being wrapped by the field. See WrappedField[T, D] for more details.
Note
D defaults to None, allowing most fields to ignore it and access the base class as
Field[T]. This is how most of the details in this document will refer to Field, unless
D is relevant for a specific use case.
Existing fields already subclass the appropriate type, so subclassing any of them will share the same type hinting as its parent class.
Note
Python projects are well-known for their use of “duck typing”, where they can use any object
type, as long as it provides the right API. In order to more easily identify fields from other
attributes, and to make type hinting more consistent, Steel does in fact require the use of the
Field base class for all fields.
API Methods
There are five key methods that all fields are expected to provide. Their definitions here include the type information they’re expected to convey as well.
validate(value: T) -> None
Validates that a value is suitable for this field, according to its type and configuration. It must
return None when the provided value is valid for storage, and it should instead raise
ValidationError if there’s anything wrong with it. The message in the exception should describe
what’s wrong, so that it can be reported elsewhere.
This is a helper method, which is expected to be called by separate validation tooling or user interfaces. It is __not__ called at any point during reading or writing data. So if you want to use it to ensure data integrity, be sure to call it yourself separately.
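As a minimal sketch of this contract, the example below validates that an unsigned integer fits in a configured number of bytes. The ValidationError and BoundedInteger classes are local stand-ins, defined here only so the example runs on its own; a real field would subclass Field[int] and raise Steel’s own ValidationError.

```python
class ValidationError(Exception):
    """Stand-in for Steel's ValidationError, defined locally for this sketch."""


class BoundedInteger:
    """Hypothetical field-like class; a real field would subclass Field[int]."""

    def __init__(self, size: int) -> None:
        self.size = size  # number of bytes available for storage

    def validate(self, value: int) -> None:
        # An unsigned value must fit within `size` bytes.
        limit = 2 ** (8 * self.size)
        if not 0 <= value < limit:
            raise ValidationError(f"{value} does not fit in {self.size} byte(s)")
        # Falling off the end returns None, which signals a valid value.


field = BoundedInteger(size=1)
field.validate(200)  # valid, so nothing is raised

try:
    field.validate(300)
except ValidationError as error:
    message = str(error)  # describes the problem so it can be reported elsewhere
```

Because validation is never triggered by reading or writing, calling code has to invoke validate() explicitly, as the last few lines do.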
read(buffer: BufferedIOBase) -> tuple[T, int]
Reads data from a data buffer, such as a file, and returns a fully-unpacked value as well as the number of bytes consumed from the buffer.
Not all data structures have a size that can be known up front, so this method can adjust the read length as necessary, based on any combination of the field and the actual data that was found. Returning the number of bytes alongside the Python value provides visibility into this behavior.
This method is also responsible for the unpacking step below. Most implementations will read the
appropriate number of bytes and pass them along to the unpack() method to get a Python value to
return. But some fields may already be fully unpacked as part of the process of reading data from
the file. In these cases, there’s no need to call unpack() as a separate step, and read() can
simply return the appropriate value directly.
Important
This is defined as an abstract method, so unless you’re subclassing an existing discrete field, you will need to provide an implementation of this method.
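To illustrate a read() whose length is only known from the data itself, here is a sketch of a one-byte length prefix followed by UTF-8 text. LengthPrefixedText is a hypothetical stand-in rather than one of Steel’s fields; it only mirrors the read() signature described above, and a real implementation would subclass Field[str].

```python
from io import BytesIO, BufferedIOBase


class LengthPrefixedText:
    """Hypothetical field-like class; a real field would subclass Field[str]."""

    def read(self, buffer: BufferedIOBase) -> tuple[str, int]:
        # The first byte states how many bytes of text follow.
        prefix = buffer.read(1)
        if len(prefix) < 1:
            raise ValueError("Unexpected end of buffer while reading the length")
        length = prefix[0]

        data = buffer.read(length)
        if len(data) < length:
            raise ValueError(
                f"Unexpected end of buffer: got {len(data)}, expected {length}"
            )

        # Report the prefix byte plus the text bytes as consumed.
        return data.decode("utf-8"), 1 + length


buffer = BytesIO(b"\x05steelextra")
value, consumed = LengthPrefixedText().read(buffer)
remaining = buffer.read()  # bytes left for whichever field comes next
```

The returned byte count includes the prefix, so a caller can tell exactly how far the buffer advanced even though the size was not known up front.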
unpack(value: bytes) -> T
Converts a sequence of bytes into a native Python object. The bytes provided will have already been
read from the buffer; this method simply provides a convenient way to modify how those bytes are
interpreted. In most cases, this will simply perform the inverse of the pack() method.
Important
This is defined as an abstract method, so unless you’re subclassing an existing discrete field, you will need to provide an implementation of this method.
If your implementation of the read() method returns a native value without the need for an
unpacking step, this method must still exist, and can simply contain pass.
pack(value: T) -> bytes
Serializes a native Python object into a sequence of bytes, suitable for writing to the data buffer.
In most cases, this will simply perform the inverse of the unpack() method.
Important
This is defined as an abstract method, so unless you’re subclassing an existing discrete field, you will need to provide an implementation of this method.
If your implementation of the write() method can write out data without the need for a
packing step, this method must still exist, and can simply contain pass.
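The inverse relationship between pack() and unpack() can be sketched with the standard library’s struct module. UInt16LE is a hypothetical stand-in written for this example, not a Steel field; it assumes a 16-bit unsigned integer stored in little-endian order.

```python
import struct


class UInt16LE:
    """Hypothetical field-like class; a real field would subclass Field[int]."""

    def pack(self, value: int) -> bytes:
        # "<H" means little-endian, unsigned, 2 bytes.
        return struct.pack("<H", value)

    def unpack(self, value: bytes) -> int:
        # struct.unpack always returns a tuple, even for a single item.
        (number,) = struct.unpack("<H", value)
        return number


field = UInt16LE()
packed = field.pack(513)  # 513 is 0x0201
restored = field.unpack(packed)
```

Round-tripping a value through pack() and then unpack() should give back the original, which makes for a simple test of any custom pair.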
write(value: T, buffer: BufferedIOBase) -> int
Writes out a fully-packed byte sequence representing a native Python object. Like read(), this
returns the number of bytes that were written to the buffer.
In most cases, there’s no need to override this method with a custom implementation. The pack()
method already returns a sequence of bytes suitable for writing, and the number of bytes to write
can be determined by the length of that sequence.
Tip
This method is __not__ defined as abstract, and most fields can safely rely on the base
implementation, which simply writes all the bytes returned by the pack() method.
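Based on the description above, the default behavior can be approximated as follows. This is an assumption about how the base implementation works, not a copy of Steel’s source, and SingleByte is a hypothetical stand-in used only to make the sketch runnable.

```python
from io import BytesIO, BufferedIOBase


class SingleByte:
    """Hypothetical field-like class; a real field would subclass Field[int]."""

    def pack(self, value: int) -> bytes:
        return bytes([value])

    def write(self, value: int, buffer: BufferedIOBase) -> int:
        # Assumed default behavior: pack the value, write every byte,
        # and report how many bytes the packed sequence contained.
        data = self.pack(value)
        buffer.write(data)
        return len(data)


output = BytesIO()
written = SingleByte().write(65, output)
```

A field that only customizes pack() inherits this behavior for free, which is why overriding write() is rarely necessary.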
Helper Classes
ExplicitlySizedField[T]
A specific form of Field base class that adds a size attribute. With a fixed size as part of
the field’s configuration, this class provides a default read() implementation that reads
exactly self.size bytes and passes the result straight to the unpack() method.
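The default read() described here might look roughly like the sketch below. SizedIntegerSketch is a hypothetical stand-in that mimics the behavior rather than subclassing ExplicitlySizedField, and the big-endian unpack() is an arbitrary choice made for the example.

```python
from io import BytesIO, BufferedIOBase


class SizedIntegerSketch:
    """Hypothetical stand-in; a real field would subclass ExplicitlySizedField[int]."""

    def __init__(self, size: int) -> None:
        self.size = size  # fixed number of bytes, set as part of configuration

    def unpack(self, value: bytes) -> int:
        return int.from_bytes(value, "big")

    def read(self, buffer: BufferedIOBase) -> tuple[int, int]:
        # Read exactly `size` bytes and pass them straight to unpack().
        data = buffer.read(self.size)
        return self.unpack(data), len(data)


buffer = BytesIO(b"\x00\x00\x01\x00rest")
value, consumed = SizedIntegerSketch(size=4).read(buffer)
remaining = buffer.read()  # untouched bytes after the field
```

With the size fixed up front, a subclass only needs to supply pack() and unpack().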
WrappedField[T, D]
Subclassing an existing field can provide further customization, but the subclass must still use the
same native Python type, such as all the int fields above. Sometimes you may want to use an
existing field to interact with the data buffer but interact with Python using a different type. One
example used within Steel is the Timestamp field, which stores data using an Integer field
internally, but presents a datetime object to external code.
WrappedField expands on the existing Field base class to specify two distinct data types.
T works like any other field, specifying the data type that consumers of this field will
interact with. The extra D refers to the type of the wrapped field. The actual interaction with
the data buffer will be handled by a field supplied as a wrapped_field class attribute.
In the timestamp example, T would be datetime, while D would be int. This handles
the necessary type hinting, and an Integer field would handle the interactions in code. All
that’s left is to convert between datetime and int.
    class Timestamp(WrappedField[datetime, int]):
        wrapped_field = Integer(size=4)

        def wrap(self, value: int) -> datetime:
            return datetime.fromtimestamp(value)

        def unwrap(self, value: datetime) -> int:
            return int(value.timestamp())
Warning
Don’t use this Timestamp field. It’s here for a useful demonstration, but the actual
implementation has more features and has a stable API.
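The two conversions can be tried out on their own, without any field machinery. The sketch below uses plain functions named after the wrap() and unwrap() methods, and it pins the time zone to UTC so the result doesn’t depend on the machine running it; that choice is made for this example and isn’t something the demonstration above does.

```python
from datetime import datetime, timezone


def wrap(value: int) -> datetime:
    # Convert the stored integer into the type presented to outside code.
    return datetime.fromtimestamp(value, tz=timezone.utc)


def unwrap(value: datetime) -> int:
    # Convert back into the integer that the wrapped field knows how to store.
    return int(value.timestamp())


moment = wrap(86400)  # one day after the Unix epoch
stored = unwrap(moment)
```

As long as unwrap() reverses wrap(), the wrapped field can handle all reading and writing using its own type.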
Error Types
Steel provides two main exception types for field operations:
ConfigurationError
Raised during field initialization when configuration is invalid. Use this when:
Invalid parameters are passed to field constructors
Incompatible options are specified together
ValidationError
Raised when a value fails validation. Use this when:
Values are out of range
Required format constraints aren’t met
Data cannot be properly serialized
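The split between the two exceptions can be sketched like this: one is raised while the field is being configured, the other while a value is being checked. Both exception classes and the FixedText class are local stand-ins written for this example, not Steel’s own definitions.

```python
class ConfigurationError(Exception):
    """Stand-in for Steel's ConfigurationError, defined locally for this sketch."""


class ValidationError(Exception):
    """Stand-in for Steel's ValidationError, defined locally for this sketch."""


class FixedText:
    """Hypothetical field-like class; a real field would subclass Field[str]."""

    def __init__(self, size: int, encoding: str) -> None:
        # Problems with the field's own setup are configuration errors.
        if size <= 0:
            raise ConfigurationError("size must be a positive number of bytes")
        self.size = size
        self.encoding = encoding

    def validate(self, value: str) -> None:
        # Problems with a particular value are validation errors.
        encoded = value.encode(self.encoding)
        if len(encoded) > self.size:
            raise ValidationError(
                f"{len(encoded)} bytes will not fit in {self.size}"
            )


try:
    FixedText(size=0, encoding="ascii")
except ConfigurationError as error:
    config_message = str(error)

try:
    FixedText(size=4, encoding="ascii").validate("too long")
except ValidationError as error:
    validation_message = str(error)
```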
Best practices
Only read what you need: Be conservative when reading from the data buffer. Consuming more data than is required by the field can cause problems with other fields that need to continue reading after your field is finished. It’s also important to minimize reads on potentially large files, to keep memory usage as low as possible.
Account for partial reads: When implementing read(), account for the possibility that the data is incomplete. If a file gets truncated, or if certain structures are corrupted, a read may not return as much as you would expect. Some data, like strings, may be able to handle this gracefully, but most will have to raise an informative exception instead.

    data = buffer.read(expected_size)
    if len(data) < expected_size:
        raise ValueError(
            f"Unexpected end of buffer: got {len(data)}, expected {expected_size}"
        )
Use configuration defaults sparingly: It can be tempting to provide defaults for configuration options that seem to be the obvious choice, but in practice it may not be as obvious as it seems. Defaults can obscure those differences, leading users to accidentally depend on a configuration that’s unsuitable for their needs. Here are some examples:
x86 systems use little-endian byte ordering internally, and many applications will simply copy data structures from memory to files, preserving that ordering. But there are plenty of other systems with other needs, so defaulting to little-endian could make it harder for users to realize they should be making a conscious choice here.
Strings in C are null-terminated, which again is often written directly to files as a matter of convenience. But data written for other systems that don’t use C may use other formats, such as storing a string’s length __before__ the text, or may allocate a fixed number of bytes for strings, regardless of how many bytes are actually populated. Steel provides three different field types for these cases.
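The differences in those two examples are easy to see in the bytes themselves. The sketch below uses only built-in Python operations, not Steel fields. The one-byte length prefix and the four-byte fixed width are arbitrary choices for illustration.

```python
# The same integer stored with each byte ordering.
value = 258  # 0x0102
little = value.to_bytes(2, "little")  # b"\x02\x01"
big = value.to_bytes(2, "big")  # b"\x01\x02"

# The same text stored with three different string layouts.
text = "hi".encode("ascii")
null_terminated = text + b"\x00"  # b"hi\x00"
length_prefixed = bytes([len(text)]) + text  # b"\x02hi"
fixed_size = text.ljust(4, b"\x00")  # b"hi\x00\x00"
```

None of these is more correct than the others, so a field that silently picks one can mask a mismatch with the actual file format.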