Working with Schemas

Schemas collect fields for object serialization.

Defining Schemas

We already know how to define schemas: subclass lima.Schema (the shortcut for lima.schema.Schema) and add fields as class attributes.

But there’s more to schemas than this. First of all – schemas are composible:

from lima import Schema, fields

class PersonSchema(Schema):
    first_name = fields.String()
    last_name = fields.String()

class AccountSchema(Schema):
    login = fields.String()
    password_hash = fields.String()

class UserSchema(PersonSchema, AccountSchema):
    pass

list(UserSchema.__fields__)
# ['first_name', 'last_name', 'login', 'password_hash']

Secondly, it’s possible to remove fields from subclasses that are present in superclasses. This is done by setting a special class attribute __lima_args__ like so:

class UserProfileSchema(UserSchema):
    __lima_args__ = {'exclude': ['last_name', 'password_hash']}

list(UserProfileSchema.__fields__)
# ['first_name', 'login']

If there’s only one field to exclude, you don’t have to put its name inside a list - lima does that for you:

class NoLastNameSchema(UserSchema):
    __lima_args__ = {'exclude': 'last_name'}  # string instead of list

list(NoLastNameSchema.__fields__)
# ['first_name', 'login', 'password_hash']

If, on the other hand, there are lots of fields to exclude, you could provide __lima_args__['only'] (Note that "exclude" and "only" are mutually exclusive):

class JustNameSchema(UserSchema):
    __lima_args__ = {'only': ['first_name', 'last_name']}

list(JustNameSchema.__fields__)
# ['first_name', 'last_name']

Warning

Having to provide "only" on Schema definition hints at bad design - why would you add a lot of fields just to remove all but one of them afterwards? Have a look at Schema Objects for the preferred way to selectively remove fields.

And finally, we can’t just exclude fields, we can include them too. So here is a user schema with fields provided via __lima_args__:

class UserSchema(Schema):
    __lima_args__ = {
        'include': {
            'first_name': fields.String(),
            'last_name': fields.String(),
            'login': fields.String(),
            'password_hash': fields.String()
        }
    }

list(UserSchema.__fields__)
# ['password_hash', 'last_name', 'first_name', 'login']

Note

It’s possible to mix and match all those features to your heart’s content. lima tries to fail early if something doesn’t add up (remember, "exclude" and "only" are mutually exclusive).

Note

The inheritance and precedence rules for fields are intuitive, but should there ever arise the need for clarification, you can read about how a schema’s fields are determined in the documentation of lima.schema.SchemaMeta.

Schema Objects

Up until now we only ever needed a single instance of a schema class to marshal the fields defined in this class. But schema objects can do more.

Providing the keyword-only argument exclude, we may exclude certain fields from being serialized.

Keyword-only arguments

Keyword-only arguments can be recognized by their position in a method/function signature: Every argument coming after the varargs argument like *args (or after a single *) is a keyword-only argument.

A function that is defined as def foo(*, x, y): pass must be called like this: foo(x=1, y=2); calling foo(1, 2) will raise a TypeError.

It is the author’s opinion that enforcing keyword arguments in the right places makes the resulting code more readable.

For more information about keyword-only arguments, see PEP 3102

import datetime
from lima import Schema, fields

# again, our model
class Person:
    def __init__(self, first_name, last_name, birthday):
        self.first_name = first_name
        self.last_name = last_name
        self.birthday = birthday

# again, our schema
class PersonSchema(Schema):
    first_name = fields.String()
    last_name = fields.String()
    date_of_birth = fields.Date(attr='birthday')

# again, our person
person = Person('Ernest', 'Hemingway', datetime.date(1899, 7, 21))

# as before, for reference
person_schema = PersonSchema()
person_schema.dump(person)
# {'date_of_birth': '1899-07-21',
#  'first_name': 'Ernest',
#  'last_name': 'Hemingway'}

birthday_schema = PersonSchema(exclude=['first_name', 'last_name'])
birthday_schema.dump(person)
# {'date_of_birth': '1899-07-21'}

The same thing can be achieved via the only keyword-only argument:

birthday_schema = PersonSchema(only='date_of_birth')
birthday_schema.dump(person)
# {'date_of_birth': '1899-07-21'}

You may have already guessed: both exclude and only take lists of field names as well as simple strings for a single field name – just like __lima_args__['exclude'] and __lima_args__['only'].

For some use cases, exclude and only save the need to define lots of almost similar schema classes.

You could also include fields on schema object creation time:

getter = lambda o: '{}, {}'.format(o.last_name, o.first_name)

schema = PersonSchema(include={'sort_name': fields.String(get=getter)})

schema.dump(person)
# {'date_of_birth': '1899-07-21',
#  'first_name': 'Ernest',
#  'last_name': 'Hemingway',
#  'sort_name': 'Hemingway, Ernest'}

Warning

Having to provide include on Schema object creation hints at bad design - why not just include the fields in the Schema itself?

Field Order

Lima marshals objects to dictionaries. Field order doesn’t matter. Unless you want it to:

person_schema = PersonSchema(ordered=True)
person_schema.dump(person)
# OrderedDict([
#     ('first_name', 'Ernest'),
#     ('last_name', 'Hemingway'),
#     ('date_of_birth', '1899-07-21')])
# ])

Just provide the keyword-only argument ordered=True to a schema’s constructor, and the resulting instance will dump ordered dictionaries.

The order of the resulting key-value-pairs reflects the order in which the fields were defined at schema definition time.

If you use __lima_args__['include'], make sure to provide an instance of collections.OrderedDict if you care about the order of those fields as well.

Fields specified via __lima_args__['include'] are inserted at the position of the __lima_args__ class attribute in the Schema class. Here is a more complex example:

from collections import OrderedDict

class FooSchema(Schema):
    one = fields.String()
    two = fields.String()

class BarSchema(FooSchema):
    three = fields.String()
    __lima_args__ = {
        'include': OrderedDict([
            ('four', fields.String()),
            ('five', fields.String())
        ])
    }
    six = fields.String()

bar_schema = BarSchema(ordered=True)

bar_schema will dump ordered dictionaries with keys ordered from one to six.

Note

For the exact rules on how a complex schema’s fields are going to be ordered, see lima.schema.SchemaMeta or have a look at the source code.

Marshalling Collections

Consider this:

persons = [
    Person('Ernest', 'Hemingway', datetime.date(1899, 7, 21)),
    Person('Virginia', 'Woolf', datetime.date(1882, 1, 25)),
    Person('Stefan', 'Zweig', datetime.date(1881, 11, 28)),
]

Instead of looping over this collection ourselves, we can ask the schema object to do this for us by specifying many=True to the schema’s constructor):

many_persons_schema = PersonSchema(only='last_name', many=True)
many_persons_schema.dump(persons)
# [{'last_name': 'Hemingway'},
#  {'last_name': 'Woolf'},
#  {'last_name': 'Zweig'}]

Schema Recap

  • You now know how to compose bigger schemas from smaller ones (inheritance of schema classes).
  • You know how to exclude certain fields from schemas (__lima_args__['exclude']).
  • You know three different ways to add fields to schemas (class attributes, __lima_args__['include'] and inheriting from other schemas).
  • You can fine-tune what gets dumped by a schema object (only and exclude keyword-only arguments)
  • You can dump ordered dictionaries (ordered=True) and you can serialize collections of objects (many=True).