[GIE Compiler] Introduce cypher service to accept queries from neo4j ecosystem (#2848)

<!--
Thanks for your contribution! please review
https://github.com/alibaba/GraphScope/blob/main/CONTRIBUTING.md before
opening an issue.
-->

## What do these changes do?
1. introduce `GraphServer`, which wraps `IrGremlinServer` (gremlin
service) and `CommunityBootstrapper` (cypher service)
2. remove cypher service from gremlin stack
3. add documentation for the neo4j ecosystem

<!-- Please give a short brief about these changes. -->

## Related issue number

<!-- Are there any issues opened that will be resolved by merging this
change? -->

#2598

---------

Co-authored-by: siyuan0322 <[email protected]>
Co-authored-by: Longbin Lai <[email protected]>
Co-authored-by: longbinlai <[email protected]>
4 people authored Jun 20, 2023
1 parent 7637aaf commit c1007cf
Showing 52 changed files with 2,029 additions and 861 deletions.
2 changes: 1 addition & 1 deletion charts/gie-standalone/templates/frontend/statefulset.yaml
@@ -96,7 +96,7 @@ spec:
cd /opt/graphscope/interactive_engine/compiler && ./set_properties.sh
java -cp ".:./target/libs/*:./target/compiler-0.0.1-SNAPSHOT.jar" \
-Djna.library.path=../executor/ir/target/release \
-Dgraph.schema=/etc/groot/config/$GRAPH_SCHEMA com.alibaba.graphscope.gremlin.service.GraphServiceMain
-Dgraph.schema=/etc/groot/config/$GRAPH_SCHEMA com.alibaba.graphscope.GraphServer
{{- end }}
env:
- name: GAIA_RPC_PORT
2 changes: 2 additions & 0 deletions charts/graphscope-store/templates/configmap.yaml
@@ -52,6 +52,8 @@ data:
## Frontend Config
gremlin.server.port=12312
## disable neo4j when launching groot server by default
neo4j.bolt.server.disabled=true
executor.worker.per.process={{ .Values.executorWorkerPerProcess }}
executor.query.thread.count={{ .Values.executorQueryThreadCount }}
1 change: 1 addition & 0 deletions docs/index.rst
@@ -64,6 +64,7 @@ and the vineyard store that offers efficient in-memory data transfers.
interactive_engine/getting_started
interactive_engine/deployment
interactive_engine/tinkerpop_eco
interactive_engine/neo4j_eco
.. interactive_engine/guide_and_examples
interactive_engine/design_of_gie
.. interactive_engine/supported_gremlin_steps
52 changes: 52 additions & 0 deletions docs/interactive_engine/neo4j/cypher_sdk.md
@@ -0,0 +1,52 @@
# GIE for Cypher
This document will provide you with step-by-step guidance on how to connect your Cypher applications to the GIE's
FrontEnd service, which offers functionalities similar to the official Tinkerpop service.

Your first step is to obtain the Bolt Connector of GIE Frontend service:
- Follow the [instruction](./dev_and_test.md#manually-start-the-gie-services) while starting GIE on a local machine.

## Connecting via Python Driver

GIE makes it easy to connect to a loaded graph with Neo4j's [Python Driver](https://pypi.org/project/neo4j/).

First, install the dependency:
```bash
pip3 install neo4j
```

Then connect to the service and run queries:

```Python
from neo4j import GraphDatabase, RoutingControl

URI = "neo4j://localhost:7687" # the bolt connector you've obtained
AUTH = ("", "") # We have not implemented authentication yet

def print_top_10(driver):
    records, _, _ = driver.execute_query(
        "MATCH (n) RETURN n Limit 10",
        routing_=RoutingControl.READ,
    )
    for record in records:
        print(record["n"])


with GraphDatabase.driver(URI, auth=AUTH) as driver:
    print_top_10(driver)
```
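
Older releases of the Neo4j Python driver may not provide `Driver.execute_query`. In that case the driver's session API is an alternative. The sketch below is illustrative only: the helper name `run_read_query` is hypothetical, and it assumes GIE's Bolt endpoint handles session-based queries the same way as `execute_query`, which this document does not confirm.

```python
def run_read_query(driver, query):
    """Run a read-only query through a session and return the rows as dicts.

    `driver` is expected to be a neo4j driver object (or anything exposing
    the same `session()` / `run()` / `data()` interface).
    """
    with driver.session() as session:
        result = session.run(query)
        # Record.data() converts each record into a plain dictionary.
        return [record.data() for record in result]


# Usage, assuming a GIE Frontend is listening on the Bolt connector:
#
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver("neo4j://localhost:7687", auth=("", "")) as driver:
#       for row in run_read_query(driver, "MATCH (n) RETURN n LIMIT 10"):
#           print(row["n"])
```

Returning plain dictionaries keeps the calling code independent of the driver's record type.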


## Connecting via Cypher-Shell
1. Download and extract `cypher-shell`
```bash
wget https://dist.neo4j.org/cypher-shell/cypher-shell-4.4.19.zip
unzip cypher-shell-4.4.19.zip && cd cypher-shell
```
2. Connect to the Bolt Connector
```bash
./cypher-shell -a neo4j://localhost:7687
```
3. Run Queries
```bash
@neo4j> Match (n) Return n Limit 10;
```
123 changes: 123 additions & 0 deletions docs/interactive_engine/neo4j/supported_cypher.md
@@ -0,0 +1,123 @@
# Cypher Support
This document outlines the current capabilities of GIE in supporting Neo4j's Cypher queries and
compares them to the [syntax](https://neo4j.com/docs/cypher-manual/current/syntax/) specified in Neo4j.
While our goal is to comply with Neo4j's syntax, GIE currently has some limitations.
One major constraint is that we support only the **read** path in Cypher.
Therefore, functionalities associated with writing, such as adding vertices/edges or modifying their properties, are **not supported**.
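
Because only the read path is supported, a client may want to reject write queries before sending them. The following is a minimal heuristic sketch, not part of GIE: the function name `find_write_clauses` and the keyword list are assumptions based on standard Cypher's write clauses, and the check ignores edge cases such as backtick-quoted identifiers.

```python
import re

# Write clauses of standard Cypher. This list is an assumption made for
# illustration; it is not taken from GIE's source code.
WRITE_KEYWORDS = ("CREATE", "MERGE", "SET", "DELETE", "REMOVE")

# Single- or double-quoted string literals, with backslash escapes.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'" + "|" + r'"(?:[^"\\]|\\.)*"')

# A keyword counts only when it is not part of an identifier or a property
# access such as `n.set`.
_WRITE_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE
)


def find_write_clauses(query):
    """Return the write keywords found in `query`, upper-cased, in order."""
    # Blank out string literals so that a value such as 'create' is not flagged.
    stripped = _STRING_LITERAL.sub("''", query)
    return [m.group(1).upper() for m in _WRITE_PATTERN.finditer(stripped)]
```

An empty result suggests the query stays on the read path; a non-empty result lists the clauses that would need to be removed.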

We provide in-depth details regarding Cypher's support in GIE, mainly including data types, operators and clauses.
We further highlight planned features that we intend to offer in the near future.
While all terminologies, including data types, operators, and keywords in clauses, are case-insensitive in this document, we use capital and lowercase letters for the terminologies of Neo4j and GIE, respectively, to ensure clarity.

## Data Types
Like [Neo4j](https://neo4j.com/docs/cypher-manual/current/values-and-types), GIE supports
data values in the **property**, **structural** and **constructed** categories.
However, the specific data types that we support differ slightly from those in Cypher to ensure compatibility with our storage system. The details are given below.

### Property Types
Property types are the data types stored in vertices (the equivalent of nodes in Cypher) and edges (the equivalent of relationships in Cypher). They fall into the categories Boolean, Integer, Float, String, Bytes, Placeholder and Temporal. Because they appear both in queries and as parameters, they are the most commonly used data types.

| Category | Cypher Type | GIE Type | Supported | Todo |
|:---|:---|:---|:---:|:---|
| Boolean | BOOLEAN | bool | <input type="checkbox" disabled checked /> | |
| Integer | INTEGER | int32/uint32/int64/uint64 | <input type="checkbox" disabled checked /> | |
| Float | FLOAT | float/double | <input type="checkbox" disabled checked /> | |
| String | STRING | string | <input type="checkbox" disabled checked /> | |
| Bytes| BYTE_ARRAY | bytes | <input type="checkbox" disabled checked /> | |
| Placeholder | NULL | none | <input type="checkbox" disabled /> | Planned |
| Temporal | DATE | date | <input type="checkbox" disabled /> | Planned |
| Temporal | DATETIME (ZONED) | datetime (Zoned) | <input type="checkbox" disabled /> | Planned |
| Temporal | TIME (ZONED) | time (Zoned) | <input type="checkbox" disabled /> | Planned |

### Structural Types
In a graph, Structural Types are the first-class citizens and comprise the following:
- Vertex: It encodes the information of a particular vertex in the graph. The information includes the id, label, and a map of properties. Note that multiple labels on a vertex are currently unsupported in GIE.
- Edge: It encodes the information of a particular edge in the graph. The information comprises the id, edge label, a map of properties, and a pair of vertex ids that refer to the source/destination vertices.
- Path: It encodes the alternating sequence of vertices and (possibly) edges visited while traversing the graph.

|Category | Cypher Type | GIE Type | Supported | Todo |
|:---|:---|:---|:---:|:---|
|Graph | NODE | vertex | <input type="checkbox" disabled checked /> | |
|Graph | RELATIONSHIP | edge | <input type="checkbox" disabled checked /> | |
|Graph | PATH | path | <input type="checkbox" disabled checked /> | |

### Constructed Types
Constructed types mainly include the categories of Array and Map.

| Category | Cypher Type | GIE Type | Supported | Todo |
|:---|:---|:---|:---:|:---|
| Array | LIST<INNER_TYPE> | int32/int64/double/string/pair Array | <input type="checkbox" disabled checked /> | |
| Map | MAP | N/A | <input type="checkbox" disabled />| only used in Vertex/Edge |

## Operators
We list GIE's support of the operators in the categories of Aggregation, Property, Mathematical,
Comparison, String and Boolean. Examples and functionalities of these operators are the same
as in [Neo4j](https://neo4j.com/docs/cypher-manual/current/syntax/operators/).
Note that some of the aggregate operators listed here, such as `max()`, are implemented in Neo4j as
[functions](https://neo4j.com/docs/cypher-manual/current/functions/). GIE has not introduced functions yet.


| Category | Description | Cypher Operation | GIE Operation | Supported | Todo |
|:---|:----|:---|:----|:---:|:---|
| Aggregate | Average value | AVG() | avg() | <input type="checkbox" disabled checked /> | |
| Aggregate | Minimum value | MIN() | min() | <input type="checkbox" disabled checked /> | |
| Aggregate | Maximum value |MAX() | max() | <input type="checkbox" disabled checked /> | |
| Aggregate | Count the elements |COUNT() | count() | <input type="checkbox" disabled checked /> | |
| Aggregate | Count the distinct elements | COUNT(DISTINCT) | count(distinct) | <input type="checkbox" disabled checked /> | |
| Aggregate | Summarize the value | SUM() | sum() | <input type="checkbox" disabled checked /> | |
| Aggregate | Collect into a list | COLLECT() | collect() | <input type="checkbox" disabled checked /> | |
| Aggregate | Collect into a set | COLLECT(DISTINCT) | collect(distinct) | <input type="checkbox" disabled checked /> | |
| Property | Get property of a vertex/edge | [N\|R]."KEY" | [v\|e]."key" | <input type="checkbox" disabled checked /> | |
| Mathematical | Addition | + | + | <input type="checkbox" disabled checked /> | |
| Mathematical | Subtraction | - | - | <input type="checkbox" disabled checked /> | |
| Mathematical | Multiplication | * | * | <input type="checkbox" disabled checked /> | |
| Mathematical | Division | / | / | <input type="checkbox" disabled checked /> | |
| Mathematical | Modulo division | % | % | <input type="checkbox" disabled checked /> | |
| Mathematical | Exponentiation | ^ | ^^ | <input type="checkbox" disabled checked /> | |
| Comparison | Equality | = | = | <input type="checkbox" disabled checked /> | |
| Comparison | Inequality| <> | <> | <input type="checkbox" disabled checked /> | |
| Comparison | Less than | < | < | <input type="checkbox" disabled checked /> | |
| Comparison | Less than or equal | <= | <= | <input type="checkbox" disabled checked /> | |
| Comparison | Greater than | > | > | <input type="checkbox" disabled checked /> | |
| Comparison | Greater than or equal | >= | >= | <input type="checkbox" disabled checked /> | |
| Comparison | Verify as `NULL`| IS NULL | is null | <input type="checkbox" disabled /> | planned |
| Comparison | Verify as `NOT NULL`| IS NOT NULL | is not null | <input type="checkbox" disabled /> | planned |
| Comparison | String starts with | STARTS WITH | starts with | <input type="checkbox" disabled />| planned |
| Comparison | String ends with | ENDS WITH | ends with | <input type="checkbox" disabled />| planned |
| Comparison | String contains | CONTAINS | contains | <input type="checkbox" disabled />| planned |
| Boolean | Conjunction | AND | and | <input type="checkbox" disabled checked /> | |
| Boolean | Disjunction | OR | or | <input type="checkbox" disabled checked /> | |
| Boolean | Exclusive Disjunction | XOR | xor | <input type="checkbox" disabled /> | planned |
| Boolean | Negation | NOT | not | <input type="checkbox" disabled /> | planned |
| BitOpr | Bit and | via function | & | <input type="checkbox" disabled checked /> | |
| BitOpr | Bit or | via function | \| | <input type="checkbox" disabled checked /> | |
| BitOpr | Bit xor | via function | ^ | <input type="checkbox" disabled checked /> | |
| BitOpr | Bit reverse | via function | ~ | <input type="checkbox" disabled checked /> | |
| BitOpr | Bit left shift | via function | << | <input type="checkbox" disabled />| planned |
| BitOpr | Bit right shift | via function | >> | <input type="checkbox" disabled />| planned |
| Branch | Use with `Project` and `Return` | CASE WHEN | CASE WHEN | <input type="checkbox" disabled />| planned |



## Clause
A notable limitation for now is that we do not
allow specifying multiple `MATCH` clauses in **one** query. For example,
the following code will not compile:
```Cypher
MATCH (a) -[]-> (b)
WITH a, b
MATCH (a) -[]-> () -[]-> (b) // second MATCH clause
RETURN a, b;
```
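
A client-side check can catch this limitation before a query reaches the compiler. The sketch below is a heuristic under stated assumptions: the helper `count_match_clauses` is hypothetical, it counts plain `MATCH` keywords only (skipping `OPTIONAL MATCH`), and it ignores string literals and `//` line comments.

```python
import re

# Single- or double-quoted string literals, with backslash escapes.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'" + "|" + r'"(?:[^"\\]|\\.)*"')
# Cypher line comments start with `//`.
_LINE_COMMENT = re.compile(r"//[^\n]*")
# MATCH as a standalone keyword, optionally preceded by OPTIONAL.
_MATCH_KEYWORD = re.compile(r"(?<![\w.])(OPTIONAL\s+)?MATCH\b", re.IGNORECASE)


def count_match_clauses(query):
    """Count plain MATCH clauses in `query`; OPTIONAL MATCH is not counted."""
    text = _STRING_LITERAL.sub("''", query)  # drop string contents first
    text = _LINE_COMMENT.sub("", text)       # then drop line comments
    return sum(1 for m in _MATCH_KEYWORD.finditer(text) if m.group(1) is None)
```

A result greater than 1 indicates a query that the current compiler is expected to reject.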

| Keyword | Comments | Supported | Todo |
|:---|---|:---:|:---|
| MATCH | only one MATCH clause is allowed | <input type="checkbox" disabled checked /> | |
| OPTIONAL MATCH | implemented as left outer join | <input type="checkbox" disabled /> | planned |
| RETURN .. [AS] | | <input type="checkbox" disabled checked /> | |
| WITH .. [AS] | project, aggregate, distinct | <input type="checkbox" disabled checked /> | |
| WHERE | | <input type="checkbox" disabled checked /> | |
| NOT EXIST (an edge/path) | implemented as anti join | <input type="checkbox" disabled /> | |
| ORDER BY | | <input type="checkbox" disabled checked /> | |
| LIMIT | | <input type="checkbox" disabled checked /> | |

19 changes: 19 additions & 0 deletions docs/interactive_engine/neo4j_eco.md
@@ -0,0 +1,19 @@
# Neo4j Ecosystem

[Neo4j](https://neo4j.com/) is a graph database management system that stores and processes data natively as a graph.
Unlike traditional relational databases that rely on relational schemas, Neo4j leverages the power of interconnected nodes and relationships,
forming a highly flexible and expressive data model. GIE implements Neo4j's HTTP and TCP protocols so that the system can
seamlessly interact with the Neo4j ecosystem, including development tools such as [cypher-shell](https://dist.neo4j.org/cypher-shell/cypher-shell-4.4.19.zip)
and [drivers](https://neo4j.com/developer/language-guides/).

The following documents will guide you through empowering the Neo4j ecosystem
with GIE's distributed capability for large-scale graphs.

```{toctree} arguments
---
caption: GIE For Neo4j Ecosystem
maxdepth: 2
---
neo4j/cypher_sdk
neo4j/supported_cypher
```
File renamed without changes.
@@ -17,9 +17,9 @@
3. [Aggregate(Group)](#aggregate-group)
4. [Limitations](#limitations)
## Introduction
This documentation guides you how to work with the [gremlin](https://tinkerpop.apache.org/docs/current/reference) graph traversal language in GraphScope. On the one hand we retain the original syntax of most steps from the standard gremlin, on the other hand the usages of some steps are further extended to denote more complex situations in real-world scenarios.
This documentation guides you how to work with the [Gremlin](https://tinkerpop.apache.org/docs/current/reference) graph traversal language in GraphScope. On the one hand we retain the original syntax of most steps from the standard Gremlin, on the other hand the usages of some steps are further extended to denote more complex situations in real-world scenarios.
## Standard Steps
We retain the original syntax of the following steps from the standard gremlin.
We retain the original syntax of the following steps from the standard Gremlin.
### Source
#### [V()](https://tinkerpop.apache.org/docs/current/reference/#v-step)
The V()-step is meant to iterate over all vertices from the graph. Moreover, `vertexIds` can be injected into the traversal to select a subset of vertices.
@@ -308,7 +308,7 @@ g.V().valueMap("name")
g.V().valueMap("name", "age")
```
#### [values()](https://tinkerpop.apache.org/docs/current/reference/#values-step)
The values()-step is meant to map the graph element to the values of the associated properties given the provide property keys. Here we just allow only one property key as the argument to the `values()` to implement the step as a map instead of a flat-map, which may be a little different from the standard gremlin.
The values()-step is meant to map the graph element to the values of the associated properties given the provide property keys. Here we just allow only one property key as the argument to the `values()` to implement the step as a map instead of a flat-map, which may be a little different from the standard Gremlin.
Parameters: </br>
propertyKey - the property to retrieve its value from.
@@ -504,7 +504,7 @@ g.V().union(out(), out().out())
The match()-step provides a declarative form of graph patterns to match with. With match(), the user provides a collection of "sentences," called patterns, that have variables defined that must hold true throughout the duration of the match(). For most of the complex graph patterns, it is usually much easier to express via match() than with single-path traversals.
Parameters: </br>
matchSentences - define a collection of patterns. Each pattern consists of a start tag, a serials of gremlin steps (binders) and an end tag.
matchSentences - define a collection of patterns. Each pattern consists of a start tag, a serials of Gremlin steps (binders) and an end tag.
Supported binders within a pattern: </br>
* Expand: in()/out()/both(), inE()/outE()/bothE(), inV()/outV()/otherV/bothV
@@ -709,7 +709,7 @@ gremlin> g.V().select(expr("@.name"))
==>peter
```
### Aggregate (Group)
The group()-step in standard gremlin has limited capabilities (i.e. grouping can only be performed based on a single key, and only one aggregate calculation can be applied in each group), which cannot be applied to the requirements of performing group calculations on multiple keys or values; Therefore, we further extend the capabilities of the group()-step, allowing multiple variables to be set and different aliases to be configured in key by()-step and value by()-step respectively.
The group()-step in standard Gremlin has limited capabilities (i.e. grouping can only be performed based on a single key, and only one aggregate calculation can be applied in each group), which cannot be applied to the requirements of performing group calculations on multiple keys or values; Therefore, we further extend the capabilities of the group()-step, allowing multiple variables to be set and different aliases to be configured in key by()-step and value by()-step respectively.
Usages of the key by()-step:
```bash
@@ -1,15 +1,22 @@
# GIE For Gremlin
# GIE for Gremlin
This document will provide you with step-by-step guidance on how to connect your gremlin applications to the GIE's
FrontEnd service, which offers functionalities similar to the official Tinkerpop service.

Your first step is to obtain the endpoint of GIE Frontend service:
- Follow the [instruction](./deployment.md#deploy-your-first-gie-service) while deploying GIE in a K8s cluster,
- Follow the [instruction](./dev_and_test.md#manually-start-the-gie-services) while starting GIE on a local machine.

## Connecting Gremlin within Python
## Connecting via Python SDK

GIE makes it easy to connect to a loaded graph with Tinkerpop's [Gremlin-Python](https://pypi.org/project/gremlinpython/).

You first install the dependency:
```bash
pip3 install gremlinpython
```

Then connect to the service and run queries:

```Python
import sys
from gremlin_python import statics
@@ -61,7 +68,7 @@ resultIterationBatchSize: 64

```

## Connecting Gremlin within Java
## Connecting via Java SDK
See [Gremlin-Java](https://tinkerpop.apache.org/docs/current/reference/#gremlin-java) for connecting Gremlin
within the Java language.

@@ -81,7 +88,7 @@ client.close();
cluster.close();
```

## Gremlin Console
## Connecting via Gremlin-Console
1. Download Gremlin console and unpack to your local directory.
```bash
# if the given version (3.6.4) is not found, try to access https://dlcdn.apache.org to
@@ -91,7 +98,7 @@ cluster.close();
cd apache-tinkerpop-gremlin-console-3.6.4
```

2. In the directory of gremlin console, modify the `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend Service endpoint, as
2. In the directory of Gremlin console, modify the `hosts` and `port` in `conf/remote.yaml` to the GIE Frontend Service endpoint, as
```bash
hosts: [your_endpoint_address]
port: [your_endpoint_port]