diff --git a/README.md b/README.md
index 4346ac26..aaba32b7 100644
--- a/README.md
+++ b/README.md
@@ -545,8 +545,8 @@ The official documentation of GNES is hosted on [doc.gnes.ai](https://doc.gnes.a
 > 🚧 Tutorial is still under construction. Stay tuned! Meanwhile, we sincerely welcome you to contribute your own learning experience / case study with GNES!
-- [How to write your GNES YAML config](tutorials/gnes-yaml-specifications.md)
-- How to write a component-wise YAML config
+- [How to write your GNES YAML config](tutorials/gnes-compose-yaml-spec.md)
+- [How to write a component-wise YAML config](tutorials/component-yaml-spec.md)
 - Understanding preprocessor, encoder, indexer and router
 - Index and query text data with GNES
 - Index and query image data with GNES
diff --git a/docs/conf.py b/docs/conf.py
index 242842b6..d4c48f5a 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -55,7 +55,6 @@
     'sphinxcontrib.apidoc',
     'sphinxarg.ext',
     'recommonmark',
-    'sphinx_markdown_tables',
 ]
diff --git a/docs/index.rst b/docs/index.rst
index 7a0408cc..bad5ca2f 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -83,9 +83,6 @@ Tutorials
 🚧 Tutorial is still under construction. Stay tuned! Meanwhile, we sincerely welcome you to contribute your own learning experience / case study with GNES!
-Miscs
------
-
 .. toctree::
    :maxdepth: 1
    :caption: Miscs
diff --git a/docs/requirements.txt b/docs/requirements.txt
index b9d3f9f0..1f36fde0 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,3 +1,2 @@
 sphinx-argparse
-sphinxcontrib-apidoc
-sphinx-markdown-tables
\ No newline at end of file
+sphinxcontrib-apidoc
\ No newline at end of file
diff --git a/tutorials/component-yaml-spec.md b/tutorials/component-yaml-spec.md
new file mode 100644
index 00000000..a6c6db74
--- /dev/null
+++ b/tutorials/component-yaml-spec.md
@@ -0,0 +1,290 @@
+# How to write a component-wise YAML config
+
+YAML is everywhere. This is pretty much your impression when first trying GNES.
Understanding the YAML config is therefore essential to using GNES.
+
+Essentially, GNES requires two types of YAML config:
+- [GNES-compose YAML](gnes-compose-yaml-spec.md)
+- Component-wise YAML
+
+![](./img/mermaid-diagram-20190726180826.svg)
+
+All other YAML files, including the docker-compose YAML config and the Kubernetes config generated from the [GNES Board](https://board.gnes.ai) or the `gnes compose` command, are not part of this tutorial. Interested readers are welcome to read their respective [YAML specification](https://docs.docker.com/compose/compose-file/).
+
+
+## Table of Contents
+
+* [Component-wise YAML specification](#component-wise-yaml-specification)
+* [`!CLS` specification](#--cls--specification)
+* [`parameter` specification](#-parameter--specification)
+    - [Use `args` and `kwargs` to simplify the constructor](#use--args--and--kwargs--to-simplify-the-constructor)
+* [`gnes_config` specification](#-gnes-config--specification)
+* [Every component can be described with YAML in GNES](#every-component-can-be-described-with-yaml-in-gnes)
+* [Stack multiple encoders into a `PipelineEncoder`](#stack-multiple-encoders-into-a--pipelineencoder-)
+* [What's Next?](#what-s-next-)
+
+
+
+## Component-wise YAML specification
+
+Preprocessor, encoder, indexer and router are fundamental components of GNES. They share the same YAML specification. The component-wise YAML defines how a component behaves.
At the highest level, it contains three fields:
+
+|Argument| Type | Description|
+|---|---|---|
+| `!CLS` | str | choose from all class names registered in GNES |
+| `parameter` | map/dict | a list of key-value pairs that `CLS.__init__()` accepts|
+| `gnes_config`| map/dict | a list of key-value pairs for GNES |
+
+Let's take a look at an example:
+
+```yaml
+!BasePytorchEncoder
+parameter:
+  model_dir: ${VGG_MODEL}
+  model_name: vgg16
+  layers:
+    - features
+    - avgpool
+    - x.view(x.size(0), -1)
+    - classifier[0]
+gnes_config:
+  is_trained: true
+  name: my-awesome-vgg
+```
+
+In this example, we define a `BasePytorchEncoder` that loads a pretrained VGG16 model from the path `${VGG_MODEL}`. We then label this component as trained via `is_trained: true` and set its name to `my-awesome-vgg`.
+
+## `!CLS` specification
+
+`!CLS` is a name tag chosen from all class names registered in GNES. Currently, the following names are available:
+
+|`!CLS`| Component Type |
+|---|---|
+|`!BasePreprocessor`|Preprocessor|
+|`!TextPreprocessor`|Preprocessor|
+|`!BaseImagePreprocessor`|Preprocessor|
+|`!BaseTextPreprocessor`|Preprocessor|
+|`!BaseSlidingPreprocessor`|Preprocessor|
+|`!VanillaSlidingPreprocessor`|Preprocessor|
+|`!WeightedSlidingPreprocessor`|Preprocessor|
+|`!SegmentPreprocessor`|Preprocessor|
+|`!BaseUnaryPreprocessor`|Preprocessor|
+|`!BaseVideoPreprocessor`|Preprocessor|
+|`!FFmpegPreprocessor`|Preprocessor|
+|`!ShotDetectPreprocessor`|Preprocessor|
+|`!BertEncoder`|Encoder|
+|`!BertEncoderWithServer`|Encoder|
+|`!BertEncoderServer`|Encoder|
+|`!ElmoEncoder`|Encoder|
+|`!FlairEncoder`|Encoder|
+|`!GPTEncoder`|Encoder|
+|`!GPT2Encoder`|Encoder|
+|`!PCALocalEncoder`|Encoder|
+|`!PQEncoder`|Encoder|
+|`!TFPQEncoder`|Encoder|
+|`!Word2VecEncoder`|Encoder|
+|`!BaseEncoder`|Encoder|
+|`!BaseBinaryEncoder`|Encoder|
+|`!BaseTextEncoder`|Encoder|
+|`!BaseNumericEncoder`|Encoder|
+|`!CompositionalEncoder`|Encoder|
+|`!PipelineEncoder`|Encoder|
+|`!HashEncoder`|Encoder|
+|`!BasePytorchEncoder`|Encoder|
+|`!TFInceptionEncoder`|Encoder|
+|`!CVAEEncoder`|Encoder|
+|`!FaissIndexer`|Indexer|
+|`!LVDBIndexer`|Indexer|
+|`!AsyncLVDBIndexer`|Indexer|
+|`!NumpyIndexer`|Indexer|
+|`!BIndexer`|Indexer|
+|`!HBIndexer`|Indexer|
+|`!JointIndexer`|Indexer|
+|`!BaseIndexer`|Indexer|
+|`!BaseTextIndexer`|Indexer|
+|`!AnnoyIndexer`|Indexer|
+|`!BaseRouter`|Router|
+|`!BaseMapRouter`|Router|
+|`!BaseReduceRouter`|Router|
+|`!ChunkReduceRouter`|Router|
+|`!DocReduceRouter`|Router|
+|`!ConcatEmbedRouter`|Router|
+|`!PublishRouter`|Router|
+|`!DocBatchRouter`|Router|
+
+## `parameter` specification
+
+The key-value pairs defined in `parameter` map to the arguments defined in the constructor of `!CLS`. Let's look at the constructor signature of `BasePytorchEncoder` as an example:
+
+<table>
+<tr>
+<th>__init__()</th><th>YAML config</th>
+</tr>
+<tr>
+<td>
+<pre>
+def __init__(self, model_name: str,
+             layers: List[str],
+             model_dir: str,
+             batch_size: int = 64,
+             use_cuda: bool = False,
+             *args, **kwargs):
+    # do model init...
+    # ...
+</pre>
+</td>
+<td>
+<pre>
+!BasePytorchEncoder
+parameter:
+  model_dir: ${VGG_MODEL}
+  model_name: vgg16
+  layers:
+    - features
+    - avgpool
+    - x.view(x.size(0), -1)
+    - classifier[0]
+</pre>
+</td>
+</tr>
+</table>
+
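As a minimal sketch of the mapping in the table above, the snippet below passes a `parameter` map to a constructor as keyword arguments. `ToyEncoder` is a hypothetical stand-in that only copies the constructor signature shown above; it is not the GNES class, and the model path is a made-up value standing in for `${VGG_MODEL}`. The `parameter` map is written as a plain dict so the example needs no YAML parser.

```python
from typing import List


class ToyEncoder:
    """Hypothetical stand-in reusing the constructor signature from the table."""

    def __init__(self, model_name: str,
                 layers: List[str],
                 model_dir: str,
                 batch_size: int = 64,
                 use_cuda: bool = False,
                 *args, **kwargs):
        self.model_name = model_name
        self.layers = layers
        self.model_dir = model_dir
        self.batch_size = batch_size
        self.use_cuda = use_cuda


# The `parameter` map from the YAML config, written out as a dict.
# '/tmp/vgg16' is an assumed value standing in for ${VGG_MODEL}.
parameter = {
    'model_dir': '/tmp/vgg16',
    'model_name': 'vgg16',
    'layers': ['features', 'avgpool', 'x.view(x.size(0), -1)', 'classifier[0]'],
}

# Every key in `parameter` becomes a keyword argument of the constructor.
encoder = ToyEncoder(**parameter)
```

Arguments that the YAML leaves out, here `batch_size` and `use_cuda`, simply keep the defaults from the signature.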
+<table>
+<tr>
+<th>__init__()</th><th>YAML config</th>
+</tr>
+<tr>
+<td>
+<pre>
+class BertEncoder(BaseTextEncoder):
+    store_args_kwargs = True
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.bert_client = BertClient(*args, **kwargs)
+</pre>
+</td>
+<td>
+<pre>
+!BertEncoder
+parameter:
+  kwargs:
+    port: $BERT_CI_PORT
+    port_out: $BERT_CI_PORT_OUT
+    ignore_all_checks: true
+gnes_config:
+  is_trained: true
+</pre>
+</td>
+</tr>
+</table>
+
+ + + +
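The `kwargs` entry in the table above can be illustrated with a short sketch. It assumes that a loader unpacks `parameter.args` as positional arguments and `parameter.kwargs` as keyword arguments before calling the constructor; that behavior is inferred from the YAML shown here, not taken from the GNES source. `ToyClient`, `ToyEncoder` and `build` are hypothetical names, and the port numbers are made-up values standing in for `$BERT_CI_PORT` and `$BERT_CI_PORT_OUT`.

```python
class ToyClient:
    """Hypothetical stand-in for the wrapped client (BertClient in the table)."""

    def __init__(self, port: int = 5555, port_out: int = 5556,
                 ignore_all_checks: bool = False):
        self.port = port
        self.port_out = port_out
        self.ignore_all_checks = ignore_all_checks


class ToyEncoder:
    """Forwards whatever it receives to the client, as in the table."""

    def __init__(self, *args, **kwargs):
        self.client = ToyClient(*args, **kwargs)


def build(cls, parameter: dict):
    """Assumed loader behavior: `args` is unpacked positionally, `kwargs`
    and any remaining keys are passed as keyword arguments."""
    remaining = dict(parameter)
    args = remaining.pop('args', ())
    kwargs = remaining.pop('kwargs', {})
    return cls(*args, **kwargs, **remaining)


# The `parameter` map from the YAML config, written out as a dict.
parameter = {
    'kwargs': {'port': 7125, 'port_out': 7126, 'ignore_all_checks': True},
}

encoder = build(ToyEncoder, parameter)
```

With this pattern the encoder does not have to repeat the client's argument list in its own signature; the YAML decides which client options are set.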
+
+## `gRPCFrontend` and `Router`, why are they in my graph?
+
+Careful readers may notice that `gRPCFrontend` and `Router` components may be added to the workflow graph, even though they are not defined in the YAML file. Here is the explanation:
+- `gRPCFrontend` serves as **the only interface** between GNES and the outside. All data must be sent to it and all results are returned from it, much like a hole in a black box. Its data-flow pattern and the role it plays in GNES are *so deterministic* that we don't want to bother users with defining it.
+- Put simply, `Router` forwards messages. It is often required when `replicas` > 1. However, the behavior of a router depends on the topology and the runtime (i.e. training, indexing and querying). Sometimes it serves as a mapper, at other times as a reducer or an aggregator, and sometimes it is not required at all. In general, choosing the right router may not be straightforward for beginners. Fortunately, the type of the router can often be determined by the two consecutive layers, which is exactly what GNES Board (`gnes compose`) does.
 ## What's Next?
-The GNES-compose YAML describes a high-level picture of the GNES topology, the detailed specification of each component is defined in `yaml_path` respectively, namely the *component-wise YAML config*. In the next tutorial, you will learn how to write a component-wise YAML config.
+The GNES-compose YAML describes a high-level picture of the GNES topology. On its own, it is not enough. The detailed specification of each component is defined in its respective `yaml_path`, namely the *component-wise YAML config*. In the next tutorial, you will learn how to write a component-wise YAML config.
diff --git a/tutorials/img/mermaid-diagram-20190726180826.svg b/tutorials/img/mermaid-diagram-20190726180826.svg
new file mode 100644
index 00000000..de5d3ec7
--- /dev/null
+++ b/tutorials/img/mermaid-diagram-20190726180826.svg
@@ -0,0 +1,394 @@
+
\ No newline at end of file
diff --git a/tutorials/img/mermaid-diagram-20190726183010.svg b/tutorials/img/mermaid-diagram-20190726183010.svg
new file mode 100644
index 00000000..867bac94
--- /dev/null
+++ b/tutorials/img/mermaid-diagram-20190726183010.svg
@@ -0,0 +1,394 @@
+
\ No newline at end of file
diff --git a/tutorials/img/mermaid-diagram-20190726183216.svg b/tutorials/img/mermaid-diagram-20190726183216.svg
new file mode 100644
index 00000000..c506b4e7
--- /dev/null
+++ b/tutorials/img/mermaid-diagram-20190726183216.svg
@@ -0,0 +1,394 @@
+
\ No newline at end of file