A Model Type encaptures the interface and schema of a concrete ML model. It acts as an adaptor between the raw ML model input/output Tensors and Spark / Pandas.
Here is the key code snippet of the sklearn classifier
model type (liga.sklearn.models.classifier
):
class Classifier(SklearnModelType):
"""Classification model type"""
def schema(self) -> str:
return "int"
def predict(self, *args: Any, **kwargs: Any) -> List[int]:
assert self.model is not None
assert len(args) == 1
return self.model.predict(args[0]).tolist()
A Flavor describes the framework upon which the model was built.
A Liga model flavor should provide:
- generate_udf
- to construct a Pandas UDF to run flavor-specific models. The special UDF `ML_PREDICT` will be translated into the generated pandas udf per flavor.
- load_model_from_uri
- to load models from filesystem URI for `FileSystemRegistry`. Because there are different ways to load a model from a filesystem URI for different ML frameworks. Model Registries like MLflow unify the way to load a model from the registry. That's why for those model registries, a URI (eg. `mlflow:///yolov5`) is sufficient.
Supported flavors:
- sklearn (provided by
liga-sklearn
) - pytorch (provided by
liga-pytorch
) - ...
A model registry specifies where and how to load a model.
Name | Pypi | URI | |
---|---|---|---|
DummyRegistry | liga |
A special registry without URI provided. How and where to load model is hard-coded in model types, eg. torchvision.models.resnet50() . |
|
FileSystemRegistry | liga |
http:/// ,file:/// ,s3:/// ,... |
|
MLflowRegistry | liga-mlflow |
mlflow:/// |
MLflowRegistry is the recommended production-ready model registry. |
Currently, only a in-memory model catalog is available in Liga. Via Model Catalog, ML enhanced-SQL users only needs focus on how to apply ML-enhanced SQL on datasets at scale. Models are carefully maintained by Data/ML Engineers or Data Scientists.
WARNING: Python API to customize the Model Catalog is not yet provided!