[GIE Dev Test] Refine GIE Dev and Test (#2767)

## What do these changes do?
Add the following content to the `dev_and_test.md`:
1. add steps to test GIE with vineyard store on a local machine
2. add steps to manually start the GIE services


## Related issue number


Fixes

---------

Co-authored-by: longbinlai <[email protected]>
Co-authored-by: Longbin Lai <[email protected]>
3 people authored May 30, 2023
1 parent bf8d7b0 commit e294377
Showing 9 changed files with 352 additions and 136 deletions.
2 changes: 1 addition & 1 deletion charts/gie-standalone/config/v6d_modern_loader.json
@@ -3,7 +3,7 @@
{
"data_path": "$STORE_DATA_PATH/modern_graph/person.csv",
"label": "person",
"options": "header_row=true&delimiter=|"
"options": "header_row=true&delimiter=|&schema=0,1,2&column_types=,,int"
},
{
"data_path": "$STORE_DATA_PATH/modern_graph/software.csv",
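The new `options` value packs several loader settings into one `&`-separated string. As a reading aid, the hypothetical helper below (not part of GraphScope or vineyard) splits such a string into key/value pairs. It assumes `&` separates options and the first `=` splits a key from its value, which matches how the string above is laid out. Reading `schema=0,1,2` as "use columns 0 to 2" and `column_types=,,int` as "type only the third column as `int`" is our assumption; the vineyard loader documentation is authoritative.

```bash
#!/bin/sh
# Hypothetical helper: print each key/value pair of a loader "options"
# string, one per line. Assumes '&' separates options and the first '='
# separates a key from its value.
split_loader_options() {
  printf '%s\n' "$1" | tr '&' '\n' | while IFS= read -r pair; do
    printf '%s -> %s\n' "${pair%%=*}" "${pair#*=}"
  done
}

split_loader_options 'header_row=true&delimiter=|&schema=0,1,2&column_types=,,int'
# header_row -> true
# delimiter -> |
# schema -> 0,1,2
# column_types -> ,,int
```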
30 changes: 12 additions & 18 deletions docs/interactive_engine/deployment.md
@@ -1,26 +1,19 @@
# Standalone Deployment for GIE

We have demonstrated [how to execute interactive queries](./getting_started.md) easily by installing Graphscope via `pip` on a local machine. However, in real-life applications, graphs are often too large to fit on a single machine. In such cases, Graphscope can be deployed on a cluster, such as a [self-managed k8s cluster](../deploy_graphscope_on_self_managed_k8s.md), for processing large-scale graphs. But you may wonder, "what if I only need the GIE engine and not the whole package that includes GAE and GLE?" This tutorial will walk you through the process of standalone deployment of GIE on a self-managed k8s cluster.
We have demonstrated [how to execute interactive queries](./getting_started.md) easily by installing Graphscope via `pip` on a local machine. However, in real-life applications, graphs are often too large to fit on a single machine. In such cases, Graphscope can be deployed on a cluster, such as a [self-managed k8s cluster](../deploy_graphscope_on_self_managed_k8s.md), for processing large-scale graphs. But you may wonder, "what if I only need the GIE engine and not the whole package of GraphScope?" This tutorial will walk you through the process of standalone deployment of GIE on a self-managed k8s cluster.

Throughout the tutorial, we assume all machines are running a Linux system.
We do not guarantee that it works as smoothly on other platforms.
For your reference, we have tested the tutorial on Ubuntu 20.04.

## The K8s Cluster
If you do not have a K8s cluster to work on, don't worry. Here are four simple ways for you to create one and get started with the deployment:
## Prerequisites

- Use a K8s cluster from Cloud Providers like [ACK](https://www.aliyun.com/product/kubernetes) from Alibaba Cloud.
- Create a K8s cluster using [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/).
- Create a local K8s cluster using [minikube](https://minikube.sigs.k8s.io/docs/start/):
```Bash
# Install `minikube` on your platform
# We recommend the `none` driver on a Linux machine, which avoids loading images into the control plane.
# Check https://minikube.sigs.k8s.io/docs/handbook/pushing/ for details.
minikube start --driver=none
```
- Use a local k8s cluster in [docker desktop](https://docs.docker.com/desktop/kubernetes/).
- Kubernetes Cluster
- Python >= 3.9

To learn more about the creation of a k8s cluster, please refer to the [official guide](https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/create-cluster/).
To get started, you need a Kubernetes cluster.

In case you don't have one, refer to the instructions on how to [create a Kubernetes cluster](../deployment/deploy_graphscope_on_self_managed_k8s.md#prepare-a-kubernetes-cluster).
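The two prerequisites can be checked before going further. The sketch below is a hypothetical pre-flight helper, not a GraphScope tool; it assumes `python3` and `kubectl` are on `PATH` and that `kubectl` is already configured for the target cluster. Versions are compared numerically on MAJOR.MINOR, so 3.10 correctly counts as newer than 3.9.

```bash
#!/bin/sh
# Succeed when version $1 >= $2, comparing MAJOR.MINOR numerically.
version_ge() {
  have_major=${1%%.*}; have_minor=$(echo "$1" | cut -d. -f2)
  need_major=${2%%.*}; need_minor=$(echo "$2" | cut -d. -f2)
  [ "$have_major" -gt "$need_major" ] ||
    { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Hypothetical pre-flight check for the prerequisites listed above;
# stops at the first missing one.
check_prereqs() {
  py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null) ||
    { echo "python3 not found" >&2; return 1; }
  version_ge "$py" 3.9 || { echo "Python $py is older than 3.9" >&2; return 1; }
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found" >&2; return 1; }
  kubectl cluster-info >/dev/null 2>&1 ||
    { echo "no reachable Kubernetes cluster" >&2; return 1; }
  echo "prerequisites OK"
}
```

After sourcing the snippet, run `check_prereqs`.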


## Deploy Your First GIE Service
@@ -106,9 +99,11 @@ deployment and management of applications. To deploy GIE standalone using Helm,

Download Gremlin console and unpack to your local directory.
```bash
curl -LO https://dlcdn.apache.org/tinkerpop/3.6.2/apache-tinkerpop-gremlin-console-3.6.2-bin.zip && \
unzip apache-tinkerpop-gremlin-console-3.6.2-bin.zip && \
cd apache-tinkerpop-gremlin-console-3.6.2
# if the given version (3.6.4) is not found, try to access https://dlcdn.apache.org to
# download an available version.
curl -LO https://dlcdn.apache.org/tinkerpop/3.6.4/apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
unzip apache-tinkerpop-gremlin-console-3.6.4-bin.zip && \
cd apache-tinkerpop-gremlin-console-3.6.4
```
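As the comment above notes, a pinned version may not be available on `dlcdn.apache.org`. A parameterized variant of the download step is sketched below; the function names are hypothetical and the URL layout is copied from the commands above. Using `curl -f` makes a missing version fail with a non-zero exit code instead of saving an HTML error page under the `.zip` name, which `unzip` would then reject with a less obvious error.

```bash
#!/bin/sh
# Hypothetical helpers: build the console URL for a given version, using the
# same dlcdn.apache.org layout as above, then fetch and unpack it.
gremlin_console_url() {
  printf 'https://dlcdn.apache.org/tinkerpop/%s/apache-tinkerpop-gremlin-console-%s-bin.zip\n' "$1" "$1"
}

fetch_gremlin_console() {
  version=${1:-3.6.4}
  # -f: fail on HTTP errors (e.g. 404) instead of writing the error page to disk
  curl -fLO "$(gremlin_console_url "$version")" &&
    unzip -q "apache-tinkerpop-gremlin-console-${version}-bin.zip"
}
```

For example, `fetch_gremlin_console 3.6.4 && cd apache-tinkerpop-gremlin-console-3.6.4`.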

Modify the `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend Service endpoint.
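One way to make that edit non-interactively is an in-place `sed` on the two keys. This sketch assumes the file still contains the stock top-level `hosts:` and `port:` lines shipped with the console; the function name is hypothetical, and the host and port you pass are your own frontend service endpoint.

```bash
#!/bin/sh
# Hypothetical helper: point conf/remote.yaml at the GIE frontend.
# Rewrites the top-level "hosts:" and "port:" lines and keeps a .bak copy.
# HOST must not contain '/', which is the sed delimiter used here.
set_gremlin_endpoint() {
  file=$1 host=$2 port=$3
  sed -i.bak -e "s/^hosts:.*/hosts: [${host}]/" -e "s/^port:.*/port: ${port}/" "$file"
}
```

For example, `set_gremlin_endpoint conf/remote.yaml 10.0.0.5 8182`. Then, inside the console, `:remote connect tinkerpop.server conf/remote.yaml` followed by `:remote console` sends subsequent queries to the remote server.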
@@ -134,7 +129,6 @@ deployment and management of applications. To deploy GIE standalone using Helm,
helm uninstall [YOUR_RELEASE_NAME]
```


## Using Your Own Data
Currently, a single instance of GIE can only handle one set of graph data. This means that you must
indicate which raw data should be uploaded into GIE's graph store, and all subsequent queries made
124 changes: 107 additions & 17 deletions docs/interactive_engine/dev_and_test.md
@@ -12,34 +12,124 @@ docker run --name dev -it --shm-size=4096m registry.cn-hongkong.aliyuncs.com/gra

Please refer to [Dev Environment](../development/dev_guide.md#dev-environment) to find more options to get a dev environment.

## Build Interactive Engine
## Build GIE with Vineyard Store on Local
In [GIE standalone deployment](./deployment.md), we have shown how to deploy GIE in a Kubernetes cluster with Vineyard store. Here, we show how to develop and test GIE with vineyard store on a local machine.

With `gs` command-line utility, you can build interactive engine of GraphScope with a single command.
Clone the `graphscope` repo if you do not have it.
```bash
git clone https://github.com/alibaba/graphscope
cd graphscope
```

Now you are ready to build the GIE engine (on vineyard store) with the following command:
```bash
./gs make interactive --storage-type=vineyard
```
You can find the built artifacts in `interactive_engine/assembly/target/graphscope`.
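Before starting services by hand, it can help to confirm that the pieces the later steps rely on are present. The hypothetical check below only looks for the paths this guide references (`bin/gaia_executor`, `conf/log4rs.yml`, the two `*.vineyard.properties` files and `lib/`), assuming they ship with the build; the actual output may contain more.

```bash
#!/bin/sh
# Hypothetical sanity check: report any path used later in this guide that
# is missing under the given build directory; returns non-zero if any is.
check_gie_artifacts() {
  home=${1:-interactive_engine/assembly/target/graphscope}
  status=0
  for f in bin/gaia_executor conf/log4rs.yml conf/executor.vineyard.properties \
           conf/frontend.vineyard.properties lib; do
    if [ ! -e "$home/$f" ]; then
      echo "missing: $home/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```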

## Test GIE with Vineyard Store on Local
You could test the GIE engine on vineyard store with the following command:
```bash
# Clone a repo if needed
# git clone https://github.com/alibaba/graphscope
# cd graphscope
./gs make interactive --experimental
./gs test interactive --local --storage-type=vineyard
```

You may want to grab a cup of coffee, because the compilation will take a while; it
includes compiling the Java code of the GIE compiler and the Rust code of the GIE engine.
You can find the built artifacts in `interactive_engine/assembly/target/graphscope.tar.gz`.
This runs end-to-end tests, from compiling Gremlin queries to obtaining and verifying the results from the engine. The tests include:
- [Tinkerpop's gremlin test](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/compiler/src/main/java/com/alibaba/graphscope/gremlin/integration/suite/standard): We replicate Tinkerpop's official test suite, which is mostly based on Tinkerpop's [modern](https://tinkerpop.apache.org/docs/3.6.2/tutorials/getting-started/) graph.
- [IR pattern test](https://github.com/alibaba/GraphScope/tree/main/interactive_engine/compiler/src/main/java/com/alibaba/graphscope/gremlin/integration/suite/pattern): In addition to Tinkerpop's official tests of `match` steps, we offer extra pattern queries on the modern graph.
- [LDBC test](https://github.com/alibaba/GraphScope/blob/main/interactive_engine/compiler/src/main/java/com/alibaba/graphscope/gremlin/integration/suite/ldbc): We further test GIE against the LDBC complex workloads on the LDBC social network with the scale factor (sf) 1.
Please refer to the [tutorial](./tutorial_ldbc_gremlin.md) for more information.

## Manually Start the GIE Services
A minimum set of GIE services includes a `frontend` to send Gremlin queries, and an `executor` (with vineyard) to execute those queries. The subsequent instructions outline the process of individually starting the `frontend` and `executor` to facilitate a more in-depth exploration of the engine.

## How to Test
1. First, make sure that a vineyard service is already running and a graph has been successfully loaded. Once the graph is loaded into vineyard, you will obtain a `<v6d_object_id>`
for accessing the graph data.

You could easily test with the new artifacts with a single command:
````{hint}
If you are unsure about how to initiate a vineyard store, the subsequent instructions can assist you in creating a
vineyard store with a [modern graph](https://tinkerpop.apache.org/docs/3.6.2/tutorials/getting-started/).
```bash
export VINEYARD_IPC_SOCKET=/tmp/vineyard.sock
vineyardd --socket=${VINEYARD_IPC_SOCKET} --meta=local &
# load modern graph
export STORE_DATA_PATH=charts/gie-standalone/data # relative to graphscope repo
vineyard-graph-loader --config charts/gie-standalone/config/v6d_modern_loader.json
```
````
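Since `vineyardd` is started in the background, the loader may run before the socket exists. A small polling helper such as the hypothetical one below waits for the socket path to appear as a UNIX socket, up to a timeout in seconds. A stale socket file left by an earlier run would satisfy the check immediately, so this is a convenience rather than a readiness guarantee.

```bash
#!/bin/sh
# Hypothetical helper: wait up to TIMEOUT seconds (default 30) for PATH to
# exist as a UNIX socket. Returns non-zero on timeout.
wait_for_socket() {
  sock=$1 timeout=${2:-30} waited=0
  while [ ! -S "$sock" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $sock" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example (commented out; requires vineyard to be installed):
# vineyardd --socket="${VINEYARD_IPC_SOCKET}" --meta=local &
# wait_for_socket "${VINEYARD_IPC_SOCKET}" 30 &&
#   vineyard-graph-loader --config charts/gie-standalone/config/v6d_modern_loader.json
```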

Here we set the working directory to the local repo.
2. Set the `GIE_TEST_HOME` environment variable:
```bash
export GRAPHSCOPE_HOME=`pwd`
# Here the `pwd` is the root path of GraphScope repository
export GIE_TEST_HOME=interactive_engine/assembly/target/graphscope
```
See more about `GRAPHSCOPE_HOME` in [run tests](../development/how_to_test.md#run-tests)

3. Configure the `$GIE_TEST_HOME/conf/executor.vineyard.properties` file:
```bash
./gs test interactive
graph.name = GRAPH_NAME
# RPC port that executor will listen on
rpc.port = 1234

# Server ID
server.id = 0

# Total server size
server.size = 1

# ip:port separated by ','
# e.g., 1.2.3.4:1234,1.2.3.5:1234
network.servers = 127.0.0.1:11234

# This worker refers to the number of threads
pegasus.worker.num = 1

graph.type = VINEYARD

# Please replace with the actual object ID of your graph
graph.vineyard.object.id = <v6d_object_id>
```
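The `<v6d_object_id>` placeholder has to be replaced with the id obtained in step 1. The hypothetical `set_prop` helper below sets one key in a properties file, replacing an existing line written with `=` or `:`, or appending the key if it is absent. It assumes keys contain only letters, digits, dots and underscores, and that values contain no `|` or `&` (both special in the `sed` replacement).

```bash
#!/bin/sh
# Hypothetical helper: set KEY to VALUE in a .properties file, replacing an
# existing "KEY = ..." or "KEY: ..." line, or appending one if absent.
# Assumes KEY contains only [A-Za-z0-9._] and VALUE contains no '|' or '&'.
set_prop() {
  file=$1 key=$2 value=$3
  re=$(printf '%s' "$key" | sed 's/\./\\./g')  # escape dots for the regex
  if grep -q "^[[:space:]]*${re}[[:space:]]*[=:]" "$file"; then
    sed -i.bak "s|^[[:space:]]*${re}[[:space:]]*[=:].*|${key} = ${value}|" "$file" &&
      rm -f "$file.bak"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, with the id stored in a shell variable of your choosing (`OBJECT_ID` here is illustrative): `set_prop "$GIE_TEST_HOME/conf/executor.vineyard.properties" graph.vineyard.object.id "$OBJECT_ID"`.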

It downloads the test dataset to `/tmp/gstest` (if it does not already exist), runs multiple algorithms against various graphs, and compares the results with the ground truth.
4. Start the `gaia_executor`:
```bash
$GIE_TEST_HOME/bin/gaia_executor $GIE_TEST_HOME/conf/log4rs.yml $GIE_TEST_HOME/conf/executor.vineyard.properties &
```
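Because the executor is started in the background, the frontend may come up before the executor's RPC port (`rpc.port = 1234` in step 3) is listening. The hypothetical helper below polls a TCP port until it accepts a connection. It relies on bash's `/dev/tcp` redirection, so it must run under bash rather than a plain POSIX `sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait up to TIMEOUT seconds (default 30) for HOST:PORT
# to accept a TCP connection. Uses bash's /dev/tcp pseudo-device.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} waited=0
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example: wait for the executor before launching the frontend.
# wait_for_port 127.0.0.1 1234 60 && echo "gaia_executor is listening"
```

The same call with port `8182` can be used after starting the frontend, before opening the Gremlin console.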

5. Configure the `$GIE_TEST_HOME/conf/frontend.vineyard.properties` file:
```bash
## Pegasus service config
# a.k.a. thread num
pegasus.worker.num = 1
pegasus.timeout = 240000
pegasus.batch.size = 1024
pegasus.output.capacity = 16

# executor config
# ip:port separated by ','
# e.g., 1.2.3.4:1234,1.2.3.5:1234
pegasus.hosts = localhost:1234

# graph schema path
graph.schema = /tmp/<v6d_object_id>.json

## Frontend Config
frontend.service.port = 8182

# disable authentication if username or password is not set
# auth.username = default
# auth.password = default
```
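The frontend's `pegasus.hosts` must point at the executor's `rpc.port` (1234 in both sample files). The hypothetical check below reads both files and reports a mismatch. It accepts `key = value` or `key: value` lines and only compares ports, not host names.

```bash
#!/bin/sh
# Hypothetical helper: print the last value of KEY in a .properties file,
# accepting either "KEY = value" or "KEY: value".
get_prop() {
  re=$(printf '%s' "$2" | sed 's/\./\\./g')  # escape dots for the regex
  sed -n "/^[[:space:]]*${re}[[:space:]]*[=:]/{s/^[^=:]*[=:][[:space:]]*//;s/[[:space:]]*\$//;p;}" "$1" |
    tail -n 1
}

# Check that the frontend's pegasus.hosts targets the executor's rpc.port.
# Usage: check_ports_match EXECUTOR_CONF FRONTEND_CONF
check_ports_match() {
  rpc=$(get_prop "$1" rpc.port)
  hosts=$(get_prop "$2" pegasus.hosts)
  [ -n "$rpc" ] || { echo "rpc.port not set in $1" >&2; return 1; }
  case ",${hosts}," in
    *":${rpc},"*) echo "ok: frontend targets executor port ${rpc}" ;;
    *) echo "mismatch: rpc.port=${rpc}, pegasus.hosts=${hosts}" >&2; return 1 ;;
  esac
}
```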

6. Start the `frontend`:
```bash
java -cp ".:$GIE_TEST_HOME/lib/*" -Djna.library.path=$GIE_TEST_HOME/lib com.alibaba.graphscope.frontend.Frontend $GIE_TEST_HOME/conf/frontend.vineyard.properties &
```

Once the frontend service is running, you can open the Gremlin console and set the endpoint to
`localhost:8182`, as described [here](./deployment.md#deploy-your-first-gie-service).

7. Kill the services of `vineyardd`, `gaia_executor` and `frontend`:
```bash
pkill -f vineyardd
pkill -f gaia_executor
pkill -f Frontend
```
49 changes: 26 additions & 23 deletions docs/interactive_engine/faq.md
@@ -1,23 +1,26 @@
# Frequently Asked Questions (FAQs) for GIE Gremlin Usage
# FAQs for GIE Gremlin Usage

## What's the difference between Inner ID and Property ID ?

The main difference between Inner ID and Property ID is that Inner ID is a system-assigned identifier used internally by the graph engine for efficient data storage and retrieval, while Property ID is a user-defined property within a specific entity type.
The main difference between Inner ID and Property ID is that Inner ID is a system-assigned identifier used internally by the graph engine for efficient data storage and retrieval, while Property ID is a user-defined property within a specific entity type.

For example, in the LDBC (Linked Data Benchmark Council) schema, we have an entity type called 'PERSON', which has its own list of properties, consisting of 'id', 'name' and 'birthday'. In the actual storage, we maintain key-value pairs for each instance of entity type 'PERSON', and internally maintain a unique ID to differentiate each such instance. The unique ID in this context is referred to as the Inner ID, and the 'id' in the attribute list is the Property ID.

GIE Gremlin provides different approaches to query a vertex instance by its Inner ID or Property ID, similar to:
```groovy
// by its inner id
g.V(1)
g.V().hasId(1)
g.V(123456)
g.V().hasId(123456)

// by its property id
// by its property id
g.V().has('id', 1)
```

For edges, we do not currently provide any approaches to query based on Inner ID, for two reasons:
- Firstly, Inner ID is internally maintained by the system and should not be exposed to users by default.
In the above case, the vertex may have a property `id` with value 1, which is mapped to a globally
unique inner id `123456`.

For edges, we do not currently provide any approaches to query based on Inner ID, for two reasons:
- Firstly, Inner ID is internally maintained by the system and should not be exposed to users by default.
- Secondly, a single edge instance may not be uniquely identified by Inner ID alone, as it typically requires a triplet such as \<src, dst, edge\>.

## How to use path expand in GIE Gremlin ?
@@ -63,14 +66,14 @@ g.V().hasLabel('PERSON').groupCount().by('name', 'age')
```
which is equivalent to:
```sql
SELECT
PERSON.name,
PERSON.age,
COUNT(*)
FROM
PERSON
GROUP BY
PERSON.name,
SELECT
PERSON.name,
PERSON.age,
COUNT(*)
FROM
PERSON
GROUP BY
PERSON.name,
PERSON.age
```
### group by multiple values:
@@ -83,13 +86,13 @@ g.V()
```
which is equivalent to:
```sql
SELECT
PERSON.name,
COUNT(age) AS age_cnt,
SUM(age) AS age_sum
FROM
PERSON
GROUP BY
SELECT
PERSON.name,
COUNT(age) AS age_cnt,
SUM(age) AS age_sum
FROM
PERSON
GROUP BY
name
```
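For intuition only, the two SQL queries above can be mimicked with `awk` over a flat file. This sketch assumes a `|`-delimited file with a header row, like the modern-graph CSVs referenced by the loader config; the column positions are parameters because the exact layout of `person.csv` is not shown here. It says nothing about how GIE itself evaluates `groupCount()` or `group()`.

```bash
#!/bin/sh
# groupCount().by('name', 'age')  ~  SELECT name, age, COUNT(*) ... GROUP BY name, age
# Usage: group_count FILE NAME_COL AGE_COL  (1-based columns, '|' delimiter, header skipped)
group_count() {
  awk -F'|' -v n="$2" -v a="$3" 'NR > 1 { c[$n "|" $a]++ }
    END { for (k in c) print k "|" c[k] }' "$1" | sort
}

# SELECT name, COUNT(age), SUM(age) ... GROUP BY name
# Empty ages are not counted; an all-empty group prints 0|0 (SQL would give 0|NULL).
age_stats_by_name() {
  awk -F'|' -v n="$2" -v a="$3" '
    NR > 1 { seen[$n] = 1; if ($a != "") { cnt[$n]++; sum[$n] += $a } }
    END { for (k in seen) print k "|" (cnt[k] + 0) "|" (sum[k] + 0) }' "$1" | sort
}
```

Each output line is `name|age|count` for the first function and `name|count|sum` for the second, sorted for stable output.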
Please refer to [Aggregate](https://github.com/alibaba/GraphScope/blob/main/docs/interactive_engine/supported_gremlin_steps.md#aggregate-group) for more usage.
@@ -114,4 +117,4 @@ Therefore, You can only perform subgraph operations after edge-output operators
```groovy
g.V().outE().limit(10).subgraph('sub_graph').count()
```
Please refer to [Subgraph](https://github.com/alibaba/GraphScope/blob/main/docs/interactive_engine/supported_gremlin_steps.md#subgraph) for more usage.
Please refer to [Subgraph](https://github.com/alibaba/GraphScope/blob/main/docs/interactive_engine/supported_gremlin_steps.md#subgraph) for more usage.