rmcomplexity/dataclasses-validation

Python Dataclasses Validation

Validation for dataclasses. No dependencies. Field-specific configuration.

Dataclasses are powerful, but incoming data still needs to be validated. Most validation libraries make you either subclass a third-party class or define a separate schema class. With dcv you can validate dataclass fields directly by using field-specific validation.

This is a work in progress.

Please check the project board to see pending tasks in case there isn't a proper release yet.

Example:

import logging
from dataclasses import dataclass, field, asdict
from typing import Optional
from enum import Enum
from dcv.fields import TextField, IntField


logging.basicConfig(level=logging.INFO)


@dataclass
class User:
    # trailing/leading blank spaces will be removed
    name: str = TextField(min_length=1, trim=True)

    # 'last_name' can be None or an empty string.
    # Optional fields have a default value of None.
    last_name: Optional[str] = TextField(min_length=1, trim=" ", optional=True, blank=True)

    # A user cannot be born before 1800. Time travelers are not considered here :(.
    year_of_birth: Optional[int] = IntField(gt=1800, optional=True)

    # 'opt_out' has a default value of "Yes", uses a regex
    # and it's not used in __init__
    opt_out: str = field(default=TextField(default="Yes", regex="(Yes|No)"), init=False)

# Instantiation without any issues
>>> user = User(name="Josué", last_name="Balandrano", year_of_birth=1985)
>>> logging.info(user)
... INFO:root:User(name='Josué', last_name='Balandrano', year_of_birth=1985, opt_out='Yes')

# We get a ValueError if we try to set an invalid value on a non-init attr.
>>> user.opt_out = "Maybe"
... ValueError: 'opt_out' does not match regex: (Yes|No) .

# We automatically have serialization with dataclasses
>>> asdict(user)
... {'name': 'Josué', 'last_name': 'Balandrano', 'year_of_birth': 1985, 'opt_out': 'Yes'}

# We get a ValueError if an invalid value is used on init
>>> User(name="", last_name="Balandrano", year_of_birth=1755)
... ValueError: 'name' cannot be blank.
>>> User(name="Josué", last_name="Balandrano", year_of_birth=1775)
... ValueError: 'year_of_birth' value '1775' must be greater than 1800.

Features of dcv

  • Works with dataclasses out of the box.
  • Validation is implemented in descriptors and not in the class.
  • Validation happens whenever a value is assigned to an attribute, whether in __init__ or afterwards.
  • Easily nest objects simply by using more dataclasses.
  • No need to subclass anything.
  • No need to create another class to define the schema.
  • Basic runtime type hint checking.

Rationale

Current validation libraries (like pydantic) modify classes to be aware of the data that is being stored on each instance. Other libraries (like marshmallow) make you use a schema (a specialized class) for validation and data storage.

Python descriptors give us the power to specify how data is looked up, stored, and deleted, and this is seamless to the main class. Python dataclasses are powerful classes tailored to hold data. dcv leverages descriptors and dataclasses to implement less obtrusive validation and to let you specify which fields will be validated, instead of offering an all-or-nothing solution.
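To make the descriptor idea concrete, here is a minimal sketch that uses only the standard library. It is not dcv's implementation: the `Positive` descriptor and the `Item` dataclass are hypothetical names made up for illustration. The descriptor raises `AttributeError` on class-level access, which is the documented way to tell `dataclasses` that the field has no default value.

```python
from dataclasses import dataclass


class Positive:
    """Hypothetical descriptor (not part of dcv) that rejects values <= 0."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created.
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access: signal "no default" to dataclasses.
            raise AttributeError(self.public_name)
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        # Runs on every assignment, including the one made by __init__.
        if value <= 0:
            raise ValueError(f"'{self.public_name}' must be greater than 0.")
        setattr(instance, self.private_name, value)


@dataclass
class Item:
    quantity: int = Positive()


item = Item(quantity=3)   # validated inside the generated __init__
item.quantity = 5         # validated again on plain assignment
```

The dataclass itself contains no validation code; assigning `0` to `item.quantity`, or calling `Item(quantity=0)`, raises `ValueError` from the descriptor.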

Runtime type hint checking

dcv checks type hints at two points.

First, when a dcv field is instantiated and assigned to a dataclass field. The type hint on the dataclass field is checked to make sure it matches the dcv field's supported TYPES.

Second, when a value is assigned to a dataclass attribute managed by a dcv field. This can happen in __init__ or afterwards.

A type hint matches a dcv field if the origin of the type hint is present in the Field.TYPES class variable or if the origin is a subclass of an object present in the Field.TYPES class variable. The origin is retrieved by using typing.get_origin.

If the origin cannot be retrieved, the type hint is treated as a generic container, e.g. Optional, Union, etc. In this case the arguments of the type hint are checked against the objects in the Field.TYPES tuple.

Examples

  • field_name: str - Will check if any object in Field.TYPES is str or a subclass of str.
  • field_name: Optional[str] - Optional will be discarded and str will be used to check values.
  • field_name: List[str] - list will be used to check values.
  • field_name: Optional[List[int]] - list will be used to check values.
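The results in the list above can be reproduced with a short standard-library sketch. This is an assumption about one way to do the resolution, not dcv's actual code path; `candidate_types` and `hint_matches` are hypothetical helper names. Note that in the standard library `typing.get_origin` returns `None` for a plain class such as `str`, `typing.Union` for `Optional[...]` and `Union[...]`, and `list` for `List[str]`.

```python
from typing import List, Optional, Union, get_args, get_origin


def candidate_types(hint):
    """Reduce a type hint to the concrete classes used for checking values."""
    origin = get_origin(hint)
    if origin is None:
        # Plain class such as str: use it directly.
        return (hint,)
    if origin is Union:
        # Optional[X] is Union[X, None]: drop NoneType, resolve the rest.
        resolved = []
        for arg in get_args(hint):
            if arg is not type(None):
                resolved.extend(candidate_types(arg))
        return tuple(resolved)
    # Parametrized container such as List[str]: the origin is list.
    return (origin,)


def hint_matches(hint, supported):
    """True if any resolved class is in `supported` or subclasses one of them."""
    return any(
        isinstance(candidate, type) and issubclass(candidate, supported)
        for candidate in candidate_types(hint)
    )


print(candidate_types(Optional[List[int]]))
print(hint_matches(Optional[str], (str, bytes)))
```

Here `supported` plays the role of a field's TYPES tuple; a hint such as `List[str]` would not match `(str, bytes)` because it resolves to `list`.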

Available Fields

| Name | Types Supported | Implemented | Parent Field |
| --- | --- | --- | --- |
| TextField | str, bytes | ✔️ Yes | Field |
| NumberField | int, float, complex, Decimal | ✔️ Yes | Field |
| IntField | int | ✔️ Yes | NumberField |
| FloatField | float | ✔️ Yes | NumberField |
| ComplexField | complex | ✔️ Yes | NumberField |
| DecimalField | Decimal | ✔️ Yes | NumberField |
| EnumField | Enum | ✔️ Yes | Field |
| BooleanField | bool | ✔️ Yes | Field |
| DateTimeBaseField | date, time, datetime, timedelta | ✔️ Yes | Field |
| DateField | date | ✔️ Yes | DateTimeBaseField |
| TimeField | time | ✔️ Yes | DateTimeBaseField |
| DateTimeField | datetime | ✔️ Yes | DateTimeBaseField |
| TimeDeltaField | timedelta | ✔️ Yes | DateTimeBaseField |
| ContainerField | collections.abc.Container | ❌ No | |
| SequenceField | collections.abc.Sequence | ❌ No | |
| SetField | collections.abc.Set | ❌ No | |
| MappingField | collections.abc.Mapping | ❌ No | |

Custom Fields

Subclassing existing field

Custom fields can be created by subclassing any of the existing ones. This is recommended when you want to have the same functionality but check for another specific value type.

For instance, you might want to validate a date field using another library instead of Python's datetime:

from dcv.fields import DateTimeField
from arrow import arrow

class ArrowDTField(DateTimeField):
    TYPES = (arrow.Arrow,)

Subclassing abstract Field

You can also subclass the Field abstract class, which already implements everything a field validation descriptor needs. The only method you are required to implement is validate, which accepts the value being set:

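from dcv.fields.abstract import Field
from app.models import User  # application model, not part of dcv

class UserField(Field):
    TYPES = (User,)

    def validate(self, value: User) -> None:
        # validate_user is an application-specific helper (not defined here);
        # presumably it raises an exception when the user is invalid.
        validate_user(value)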
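The division of labour described here (a base class that handles storage and type checks, and subclasses that supply only validate) can be sketched with the standard library alone. The names `SketchField`, `EvenField` and `Pair` are hypothetical, and this is only an illustration of the pattern; dcv's real Field class may be structured differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class SketchField(ABC):
    """Hypothetical stand-in for an abstract validating descriptor."""

    TYPES: tuple = ()

    def __set_name__(self, owner, name):
        self.name = name
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access: tell dataclasses there is no default.
            raise AttributeError(self.name)
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        # Type check first, then the subclass-specific validation.
        if not isinstance(value, self.TYPES):
            raise TypeError(
                f"'{self.name}' expects {self.TYPES}, got {type(value).__name__}."
            )
        self.validate(value)
        setattr(instance, self.storage, value)

    @abstractmethod
    def validate(self, value) -> None:
        """Raise an exception if `value` is not acceptable."""


class EvenField(SketchField):
    TYPES = (int,)

    def validate(self, value: int) -> None:
        if value % 2:
            raise ValueError(f"'{self.name}' must be even.")


@dataclass
class Pair:
    size: int = EvenField()


pair = Pair(size=4)
```

In this sketch a subclass only declares TYPES and implements validate; `Pair(size=3)` raises `ValueError` and `Pair(size="4")` raises `TypeError`.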

Future Work

Check the project board for in-flight and future work.

If you have a specific question or request, please create a GitHub issue.
