[GIE Dev Test] Refine GIE Dev and Test (#2767)

## What do these changes do?
Add the following content to the `dev_and_test.md`:
1. add steps to test GIE with vineyard store on a local machine
2. add steps to manually start the GIE services

## Related issue number
Fixes

---------

Co-authored-by: longbinlai <[email protected]>
Co-authored-by: Longbin Lai <[email protected]>
alibaba · May 30, 2023 · e294377
1 parent bf8d7b0
Showing 9 changed files with 352 additions and 136 deletions.
diff --git a/charts/gie-standalone/config/v6d_modern_loader.json b/charts/gie-standalone/config/v6d_modern_loader.json
@@ -3,7 +3,7 @@
         {
             "data_path": "$STORE_DATA_PATH/modern_graph/person.csv",
             "label": "person",
-            "options": "header_row=true&delimiter=|"
+            "options": "header_row=true&delimiter=|&schema=0,1,2&column_types=,,int"
         },
         {
             "data_path": "$STORE_DATA_PATH/modern_graph/software.csv",

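The new `options` value packs several loader settings into one `&`-separated string. The one-liner below is only an illustrative sketch that splits the string into individual `key=value` pairs so they are easier to read. Our reading of the two new keys is an assumption that this commit does not document: `schema=0,1,2` would select the first three columns, and `column_types=,,int` would force the third column to `int` while the other columns keep their inferred types.

```bash
#!/usr/bin/env bash
# Illustrative only: list each loader setting on its own line by splitting on '&'.
options='header_row=true&delimiter=|&schema=0,1,2&column_types=,,int'
printf '%s\n' "$options" | tr '&' '\n'
```

This prints four lines: `header_row=true`, `delimiter=|`, `schema=0,1,2` and `column_types=,,int`.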
diff --git a/docs/interactive_engine/deployment.md b/docs/interactive_engine/deployment.md
@@ -1,26 +1,19 @@
 # Standalone Deployment for GIE
 
-We have demonstrated [how to execute interactive queries](./getting_started.md) easily by installing Graphscope via `pip` on a local machine. However, in real-life applications, graphs are often too large to fit on a single machine. In such cases, Graphscope can be deployed on a cluster, such as a [self-managed k8s cluster](../deploy_graphscope_on_self_managed_k8s.md), for processing large-scale graphs. But you may wonder, "what if I only need the GIE engine and not the whole package that includes GAE and GLE?" This tutorial will walk you through the process of standalone deployment of GIE on a self-managed k8s cluster.
+We have demonstrated [how to execute interactive queries](./getting_started.md) easily by installing GraphScope via `pip` on a local machine. However, in real-life applications, graphs are often too large to fit on a single machine. In such cases, GraphScope can be deployed on a cluster, such as a [self-managed k8s cluster](../deploy_graphscope_on_self_managed_k8s.md), for processing large-scale graphs. But you may wonder, "what if I only need the GIE engine and not the whole package of GraphScope?" This tutorial will walk you through the process of standalone deployment of GIE on a self-managed k8s cluster.
 
 Throughout the tutorial, we assume all machines are running Linux system.
 We do not guarantee that it works as smoothly as Linux on the other platform.
 For your reference, we've tested the tutorial on Ubuntu 20.04.
 
-## The K8s Cluster
-If you do not have a K8s cluster to work on, don't worry. We have three simple ways for you to create one and get started with the deployment:
+## Prerequisites
 
-- Use a K8s cluster from Cloud Providers like [ACK](https://www.aliyun.com/product/kubernetes) from Alibaba Cloud.
-- Create a K8s cluster using [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/).
-- Create a local K8s cluster using [minikube](https://minikube.sigs.k8s.io/docs/start/):
-  ```Bash
-  # Install `minikube` on your platform
-  # Recommend using `none` driver on a Linux machine to free from loading image to control plane.
-  # Check https://minikube.sigs.k8s.io/docs/handbook/pushing/ for details.
-  minikube start --driver=none
-  ```
-- Use a local k8s cluster in [docker desktop](https://docs.docker.com/desktop/kubernetes/).
+- Kubernetes Cluster
+- Python >= 3.9
 
-To learn more about the creation of a k8s cluster, please refer to the [official guide](https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/create-cluster/).
+To get started, you need a Kubernetes cluster.
+
+In case you don't have one, refer to the instructions on how to [create a Kubernetes cluster](../deployment/deploy_graphscope_on_self_managed_k8s.md#prepare-a-kubernetes-cluster).
 
 
 ## Deploy Your First GIE Service
@@ -106,9 +99,11 @@ deployment and management of applications. To deploy GIE standalone using Helm,
 
    Download Gremlin console and unpack to your local directory.
    ```bash
-   curl -LO https://dlcdn.apache.org/tinkerpop/3.6.2/apache-tinkerpop-gremlin-console-3.6.2-bin.zip && \
-   unzip apache-tinkerpop-gremlin-console-3.6.2-bin.zip && \
-   cd apache-tinkerpop-gremlin-console-3.6.2
+   # if the given version (3.6.4) is not found, try to access https://dlcdn.apache.org to
+   # download an available version.
+   curl -LO https://dlcdn.apache.org/tinkerpop/3.6.4/apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
+   unzip apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
+   cd apache-tinkerpop-gremlin-console-3.6.4
    ```
 
    Modify the `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend Service endpoint.
@@ -134,7 +129,6 @@ deployment and management of applications. To deploy GIE standalone using Helm,
    helm uninstall [YOUR_RELEASE_NAME]
 ```
 
-
 ## Using Your Own Data
 Currently, a single instance of GIE can only handle one set of graph data. This means that you must
 indicate which raw data should be uploaded into GIE's graph store, and all subsequent queries made

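The Prerequisites list in the `deployment.md` change above asks for Python >= 3.9. The following is a minimal sketch of our own, not part of the GIE tooling, for checking that requirement. The helper name `version_ge` is hypothetical, and version ordering relies on GNU `sort -V`, which is available on Ubuntu 20.04, the platform the tutorial was tested on. For the cluster itself, `kubectl cluster-info` reports whether the configured cluster is reachable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# GNU `sort -V` orders version strings, so 3.9 sorts before 3.10.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the local python3 against the documented minimum (3.9).
if command -v python3 >/dev/null 2>&1; then
  py_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if version_ge "$py_version" 3.9; then
    echo "Python $py_version satisfies >= 3.9"
  else
    echo "Python $py_version is older than 3.9" >&2
  fi
else
  echo "python3 not found" >&2
fi
```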
diff --git a/docs/interactive_engine/dev_and_test.md b/docs/interactive_engine/dev_and_test.md
@@ -12,34 +12,124 @@ docker run --name dev -it --shm-size=4096m registry.cn-hongkong.aliyuncs.com/gra
 
 Please refer to [Dev Environment](../development/dev_guide.md#dev-environment) to find more options to get a dev environment.
 
-## Build Interactive Engine
+## Build GIE with Vineyard Store on Local
+In [GIE standalone deployment](./deployment.md), we described how to deploy GIE in a Kubernetes cluster with the vineyard store. Here, we show how to develop and test GIE with the vineyard store on a local machine.
 
-With `gs` command-line utility, you can build interactive engine of GraphScope with a single command.
+Clone the `graphscope` repo if you do not have it.
+```bash
+git clone https://github.com/alibaba/graphscope
+cd graphscope
+```
+
+Now you are ready to build the GIE engine (on vineyard store) with the following command:
+```bash
+./gs make interactive --storage-type=vineyard
+```
+You can find the built artifacts in `interactive_engine/assembly/target/graphscope`.
 
+## Test GIE with Vineyard Store on Local
+You can test the GIE engine on the vineyard store with the following command:
 ```bash
-# Clone a repo if needed
-# git clone https://github.com/alibaba/graphscope
-# cd graphscope
-./gs make interactive --experimental
+./gs test interactive --local --storage-type=vineyard
 ```
 
-You may want to grab a cup of coffee cause this compiling will take a while, which
-includes compiling the java code of GIE compiler, and the rust code of GIE engine.
-You may found the built artifacts in `interactive_engine/assembly/target/graphscope.tar.gz`.
+This runs end-to-end tests, from compiling Gremlin queries to obtaining and verifying the results from the computing engine. The tests include:
+  - [TinkerPop's gremlin test](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/compiler/src/main/java/com/alibaba/graphscope/gremlin/integration/suite/standard): We replicate TinkerPop's official test suite, which is mostly based on TinkerPop's [modern](https://tinkerpop.apache.org/docs/3.6.2/tutorials/getting-started/)
+  graph.
+  - [IR pattern test](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/compiler/src/main/java/com/alibaba/graphscope/gremlin/integration/suite/pattern): In addition to TinkerPop's official tests of `match` steps, we offer extra pattern queries on the modern graph.
+  - [LDBC test](https://github.com/alibaba/GraphScope/blob/main/interactive_engine/compiler/src/main/java/com/alibaba/graphscope/gremlin/integration/suite/ldbc): We further test GIE against the LDBC complex workloads on the LDBC social network with the scale factor (sf) 1.
+   Please refer to the [tutorial](./tutorial_ldbc_gremlin.md) for more information.
+
+## Manually Start the GIE Services
+A minimum set of GIE services includes a `frontend` that accepts Gremlin queries and an `executor` (with vineyard) that executes them. The following instructions show how to start the `frontend` and the `executor` individually, which allows a more in-depth exploration of the engine.
 
-## How to Test
+1. First, make sure that a vineyard service is already running and that a graph has been loaded into it. Once the graph is loaded into vineyard, you will obtain a `<v6d_object_id>`
+for accessing the graph data.
 
-You could easily test with the new artifacts with a single command:
+````{hint}
+If you are unsure how to start a vineyard store, the following commands create one and load the
+[modern graph](https://tinkerpop.apache.org/docs/3.6.2/tutorials/getting-started/) into it.
+
+```bash
+export VINEYARD_IPC_SOCKET=/tmp/vineyard.sock
+vineyardd --socket=${VINEYARD_IPC_SOCKET} --meta=local &
+# load modern graph
+export STORE_DATA_PATH=charts/gie-standalone/data  # relative to graphscope repo
+vineyard-graph-loader --config charts/gie-standalone/config/v6d_modern_loader.json
+```
+````
 
-Here we set the working directory to local repo.
+2. Set the `GIE_TEST_HOME` environment variable:
 ```bash
-export GRAPHSCOPE_HOME=`pwd`
-# Here the `pwd` is the root path of GraphScope repository
+export GIE_TEST_HOME=interactive_engine/assembly/target/graphscope
 ```
-See more about `GRAPHSCOPE_HOME` in [run tests](../development/how_to_test.md#run-tests)
 
+3. Configure the `$GIE_TEST_HOME/conf/executor.vineyard.properties` file:
 ```bash
-./gs test interactive
+graph.name = GRAPH_NAME
+# RPC port that executor will listen on
+rpc.port = 1234
+
+# Server ID
+server.id = 0
+
+# Total server size
+server.size = 1
+
+# ip:port separated by ','
+# e.g., 1.2.3.4:1234,1.2.3.5:1234
+network.servers = 127.0.0.1:11234
+
+# This worker refers to the number of threads
+pegasus.worker.num = 1
+
+graph.type = VINEYARD
+
+# Please replace with the actual object ID of your graph
+graph.vineyard.object.id = <v6d_object_id>
 ```
 
-It would download the test dataset to the `/tmp/gstest` (if not exists) and run multiple algorithms against various graphs, and compare the result with the ground truth.
+4. Start the `gaia_executor`:
+```bash
+$GIE_TEST_HOME/bin/gaia_executor $GIE_TEST_HOME/conf/log4rs.yml $GIE_TEST_HOME/conf/executor.vineyard.properties &
+```
+
+5. Configure the `$GIE_TEST_HOME/conf/frontend.vineyard.properties` file:
+```bash
+## Pegasus service config
+# a.k.a. thread num
+pegasus.worker.num = 1
+pegasus.timeout = 240000
+pegasus.batch.size = 1024
+pegasus.output.capacity = 16
+
+# executor config
+# ip:port separated by ','
+# e.g., 1.2.3.4:1234,1.2.3.5:1234
+pegasus.hosts = localhost:1234
+
+# graph schema path
+graph.schema = /tmp/<v6d_object_id>.json
+
+## Frontend Config
+frontend.service.port = 8182
+
+# disable authentication if username or password is not set
+# auth.username = default
+# auth.password = default
+```
+
+6. Start the `frontend`:
+```bash
+java -cp ".:$GIE_TEST_HOME/lib/*" -Djna.library.path=$GIE_TEST_HOME/lib com.alibaba.graphscope.frontend.Frontend $GIE_TEST_HOME/conf/frontend.vineyard.properties &
+```
+
+Once the frontend service is running, you can open the Gremlin console and set its endpoint to
+`localhost:8182`, as described [here](./deployment.md#deploy-your-first-gie-service).
+
+7. Stop the `vineyardd`, `gaia_executor` and `frontend` services:
+```bash
+pkill -f vineyardd
+pkill -f gaia_executor
+pkill -f Frontend
+```
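Steps 3 and 5 of the `dev_and_test.md` change above both require replacing the `<v6d_object_id>` placeholder with the ID obtained when the graph is loaded into vineyard. The following is a minimal sketch that does this with `sed`. It assumes both properties files already exist under `$GIE_TEST_HOME/conf` and still contain the literal placeholder. The function name `fill_object_id` is hypothetical and is not part of the GraphScope scripts. `sed -i` without a suffix argument is GNU syntax, which matches the Ubuntu setup the docs were tested on.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the literal "<v6d_object_id>" placeholder in the
# executor and frontend properties files found in the given conf directory.
fill_object_id() {
  local conf_dir="$1" object_id="$2"
  if [ -z "$conf_dir" ] || [ -z "$object_id" ]; then
    echo "usage: fill_object_id <conf_dir> <v6d_object_id>" >&2
    return 1
  fi
  local f
  for f in executor.vineyard.properties frontend.vineyard.properties; do
    # GNU sed in-place edit; '|' is the delimiter so '/' in values is harmless.
    sed -i "s|<v6d_object_id>|${object_id}|g" "${conf_dir}/${f}" || return 1
  done
}
```

For example, `fill_object_id "$GIE_TEST_HOME/conf" "$OBJECT_ID"`, where `OBJECT_ID` is a hypothetical variable holding the ID reported by the loader. This updates both `graph.vineyard.object.id` in the executor config and the `graph.schema` path in the frontend config in one pass.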
diff --git a/docs/interactive_engine/faq.md b/docs/interactive_engine/faq.md
@@ -1,23 +1,26 @@
-# Frequently Asked Questions (FAQs) for GIE Gremlin Usage
+# FAQs for GIE Gremlin Usage
 
 ## What's the difference between Inner ID and Property ID ?
 
-The main difference between Inner ID and Property ID is that Inner ID is a system-assigned identifier used internally by the graph engine for efficient data storage and retrieval, while Property ID is a user-defined property within a specific entity type. 
+The main difference between Inner ID and Property ID is that Inner ID is a system-assigned identifier used internally by the graph engine for efficient data storage and retrieval, while Property ID is a user-defined property within a specific entity type.
 
 For example, in the LDBC (Linked Data Benchmark Council) schema, we have an entity type called 'PERSON', which has its own list of properties, consisting of 'id', 'name' and 'birthday'. In the actual storage, we maintain key-value pairs for each instance of entity type 'PERSON', and internally maintain a unique ID to differentiate each such instance. The unique ID in this context is referred to as the Inner ID, and the 'id' in the attribute list is the Property ID.
 
 GIE Gremlin provides different approaches to query a vertex instance by its Inner ID or Property ID, similar to:
 ```scss
 // by its inner id
-g.V(1)
-g.V().hasId(1)
+g.V(123456)
+g.V().hasId(123456)
 
-// by its property id 
+// by its property id
 g.V().has('id', 1)
 ```
 
-For edges, we do not currently provide any approaches to query based on Inner ID, for two reasons: 
-- Firstly, Inner ID is internally maintained by the system and should not be exposed to users by default. 
+In the example above, a vertex whose property `id` has the value 1 may be mapped internally to the globally
+unique inner ID `123456`.
+
+For edges, we do not currently provide any approaches to query based on Inner ID, for two reasons:
+- Firstly, Inner ID is internally maintained by the system and should not be exposed to users by default.
 - Secondly, a single edge instance may not be uniquely identified by Inner ID alone, as it typically requires a triplet such as \<src, dst, edge\>.
 
 ## How to use path expand in GIE Gremlin ?
@@ -63,14 +66,14 @@ g.V().hasLabel('PERSON').groupCount().by('name', 'age')
 ```
 which is equivalent to:
 ```scss
-SELECT 
-  PERSON.name, 
-  PERSON.age, 
-  COUNT(*) 
-FROM 
-  PERSON 
-GROUP BY 
-  PERSON.name, 
+SELECT
+  PERSON.name,
+  PERSON.age,
+  COUNT(*)
+FROM
+  PERSON
+GROUP BY
+  PERSON.name,
   PERSON.age
 ```
 ### group by multiple values:
@@ -83,13 +86,13 @@ g.V()
 ```
 which is equivalent to :
 ```scss
-SELECT 
-  PERSON.name, 
-  COUNT(age) AS age_cnt, 
-  SUM(age) AS age_sum 
-FROM 
-  PERSON 
-GROUP BY 
+SELECT
+  PERSON.name,
+  COUNT(age) AS age_cnt,
+  SUM(age) AS age_sum
+FROM
+  PERSON
+GROUP BY
   name
 ```
 Please refer to [Aggregate](https://github.com/alibaba/GraphScope/blob/main/docs/interactive_engine/supported_gremlin_steps.md#aggregate-group) for more usage.
@@ -114,4 +117,4 @@ Therefore, You can only perform subgraph operations after edge-output operators
 ```scss
 g.V().outE().limit(10).subgraph('sub_graph').count()
 ```
-Please refer to [Subgraph](https://github.com/alibaba/GraphScope/blob/main/docs/interactive_engine/supported_gremlin_steps.md#subgraph) for more usage.
+Please refer to [Subgraph](https://github.com/alibaba/GraphScope/blob/main/docs/interactive_engine/supported_gremlin_steps.md#subgraph) for more usage.