Commit

feat(interactive): Adapt to new data type definition (#3749)
Support parsing the data type from the new unified type definition.

```yaml
GS_DATA_TYPE:  # choose one
  # optional value: DT_SIGNED_INT32, DT_UNSIGNED_INT32, DT_SIGNED_INT64
  #                 DT_UNSIGNED_INT64, DT_BOOL, DT_FLOAT, DT_DOUBLE
  primitive_type:
  string:  # choose one
    long_text: # string with unlimited length
    var_char:  # string with variable length, bounded by max_length
      max_length: <uint32>
  temporal: # choose one
    date32:
    timestamp:
```

Also compatible with previous schema files.
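The backward compatibility mentioned above can be pictured with a small sketch. This is only an illustration in Python, not the parser touched by this commit (that one is part of the C++/Java code base). The helper name `normalize_property_type` and the `LEGACY_PRIMITIVES` table are hypothetical, and the `DT_DATE32` to `temporal.date32` mapping is an assumption inferred from the documentation changes in this commit.

```python
import copy

# Hypothetical mapping from legacy spellings to the unified layout.
# An empty YAML value such as `long_text:` parses to None, so None is used here.
LEGACY_PRIMITIVES = {
    "DT_STRING": {"string": {"long_text": None}},
    "DT_DATE32": {"temporal": {"date32": None}},  # assumed mapping
}


def normalize_property_type(property_type):
    """Return a property_type mapping in the unified layout.

    Legacy entries such as {"primitive_type": "DT_STRING"} are upgraded;
    anything else is returned unchanged.
    """
    primitive = property_type.get("primitive_type")
    if primitive in LEGACY_PRIMITIVES:
        # Deep-copy so callers cannot mutate the shared table.
        return copy.deepcopy(LEGACY_PRIMITIVES[primitive])
    return property_type
```

With such a step in front of the type parser, old and new schema files would follow the same code path afterwards.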

migrate schema to new unified schema def

fix format and add default_value and nullable check

minor

fix

todo: fix-ci

[GIE Compiler] support new format of flex schema in compiler

minor fix

minor fix

minor

revert unnecessary changes

fix
zhanglei1949 committed Apr 25, 2024
1 parent b709d54 commit 88f0b5b
Showing 34 changed files with 1,394 additions and 468 deletions.
10 changes: 10 additions & 0 deletions .github/workflows/flex.yml
@@ -156,6 +156,16 @@ jobs:
GLOG_v=10 ./bin/bulk_loader -g ${SCHEMA_FILE} -l ${BULK_LOAD_FILE} -d /tmp/csr-data-dir/
GLOG_v=10 ./tests/rt_mutable_graph/string_edge_property_test ${SCHEMA_FILE} /tmp/csr-data-dir/
- name: Test schema parsing and loading on modern graph
env:
FLEX_DATA_DIR: ${{ github.workspace }}/flex/interactive/examples/modern_graph/
run: |
rm -rf /tmp/csr-data-dir/
cd ${GITHUB_WORKSPACE}/flex/build/
SCHEMA_FILE=../tests/rt_mutable_graph/modern_graph_unified_schema.yaml
BULK_LOAD_FILE=../interactive/examples/modern_graph/bulk_load.yaml
GLOG_v=10 ./bin/bulk_loader -g ${SCHEMA_FILE} -l ${BULK_LOAD_FILE} -d /tmp/csr-data-dir/
- name: Test build empty graph
run: |
rm -rf /tmp/csr-data-dir/
9 changes: 9 additions & 0 deletions .github/workflows/hqps-db-ci.yml
@@ -200,11 +200,20 @@ jobs:
--procedure_name=get_person_name \
--graph_schema_path=../interactive/examples/modern_graph/graph.yaml
./load_plan_and_gen.sh -e=hqps -i=../interactive/examples/modern_graph/count_vertex_num.cypher -w=/tmp/codegen \
--ir_conf=${GITHUB_WORKSPACE}/flex/tests/hqps/engine_config_test.yaml -o=${PLUGIN_DIR} \
--procedure_name=count_vertex_num \
--graph_schema_path=../interactive/examples/modern_graph/graph.yaml
cd ${GITHUB_WORKSPACE}/flex/tests/interactive/
bash test_plugin_loading.sh ./modern_graph_schema_v0_0.yaml \
../../interactive/examples/modern_graph/bulk_load.yaml \
${GITHUB_WORKSPACE}/flex/tests/hqps/engine_config_test.yaml
bash test_plugin_loading.sh ./modern_graph_schema_v0_1.yaml \
../../interactive/examples/modern_graph/bulk_load.yaml \
${GITHUB_WORKSPACE}/flex/tests/hqps/engine_config_test.yaml
- name: Run End-to-End cypher adhoc ldbc query test
env:
GS_TEST_DIR: ${{ github.workspace }}/gstest
11 changes: 6 additions & 5 deletions docs/flex/interactive/custom_graph_data.md
@@ -56,10 +56,12 @@ schema:
primitive_type: DT_SIGNED_INT32
- property_name: tagline
property_type:
primitive_type: DT_STRING
string:
long_text:
- property_name: title
property_type:
primitive_type: DT_STRING
string:
long_text:
primary_keys:
- id
- type_name: Person
@@ -72,7 +74,8 @@ schema:
primitive_type: DT_SIGNED_INT32
- property_name: name
property_type:
primitive_type: DT_STRING
string:
long_text:
primary_keys:
- id
edge_types:
@@ -120,8 +123,6 @@ Supported primitive data types for properties include:
- DT_BOOL
- DT_FLOAT
- DT_DOUBLE
- DT_STRING
- DT_DATE32
For a comprehensive list of supported types, please refer to the [data model](./data_model) page.
72 changes: 66 additions & 6 deletions docs/flex/interactive/data_model.md
@@ -21,7 +21,8 @@ Within the `graph.yaml` file, vertices are delineated under the `vertex_types` s
primitive_type: DT_SIGNED_INT64
- property_name: name
property_type:
primitive_type: DT_STRING
string:
long_text:
primary_keys: # these must also be listed in the properties
- id
```
@@ -66,20 +67,79 @@ Entity data pertains to the properties associated with vertices and edges. In Gr
- DT_BOOL
- DT_FLOAT
- DT_DOUBLE
- DT_STRING
- DT_DATE32

In the `graph.yaml`, a primitive type, such as `DT_STRING`, can be written as:
In the `graph.yaml`, a primitive type, such as `DT_DOUBLE`, can be written as:
```yaml
property_type:
primitive_type: DT_STRING
primitive_type: DT_DOUBLE
```

### String Types


String types fall into three subtypes:
- `long_text`: A string type with no length limitation, allowing for unlimited text content.
- `char`: A string type with a fixed length. The field defines an attribute fixed_length, which specifies the desired length of the string. The string will be restricted to the specified length.
- `var_char`: A string type with variable length, bounded by a maximum length. The field defines an attribute max_length, which sets the maximum length of the string. The string will be limited to the specified maximum length.

These three string types can be written in YAML as:

```yaml
string:
  long_text: # string with unlimited length
  char: # string with fixed length
    fixed_length: <uint32>
  var_char: # string with variable length, bounded by max_length
    max_length: <uint32>
```

Users can choose the appropriate string type based on their requirements. The long_text type is suitable for handling text with unlimited length. The char type is useful for scenarios that require fixed-length strings. The var_char type is ideal for situations where the string length needs to be restricted and a maximum length is specified.

Note: fixed-length char is currently not supported.
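
As a rough illustration of the rules above, the following Python sketch classifies the body of a `string:` entry after the YAML has been parsed into a dictionary. The function name `parse_string_type` and its error messages are hypothetical and are not part of the Interactive code base.

```python
def parse_string_type(string_spec):
    """Return (kind, length) for the parsed body of a `string:` entry.

    Example input: {"var_char": {"max_length": 64}}.
    """
    # Exactly one subtype may be chosen.
    if not isinstance(string_spec, dict) or len(string_spec) != 1:
        raise ValueError("exactly one string subtype must be given")
    kind, attrs = next(iter(string_spec.items()))
    if kind == "long_text":
        return ("long_text", None)
    if kind == "var_char":
        max_length = (attrs or {}).get("max_length")
        is_uint32 = (
            isinstance(max_length, int)
            and not isinstance(max_length, bool)
            and 0 <= max_length < 2**32
        )
        if not is_uint32:
            raise ValueError("var_char requires a uint32 max_length")
        return ("var_char", max_length)
    if kind == "char":
        # Mirrors the note above: fixed-length char is not supported yet.
        raise NotImplementedError("fixed-length char is not supported yet")
    raise ValueError(f"unknown string subtype: {kind}")
```

Rejecting `char` explicitly, instead of treating it as an unknown key, keeps the error message aligned with the documented limitation.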


### Temporal types

Temporal types can be defined in the following ways:

```yaml
temporal:
  date:
    # optional value: DF_YYYY_MM_DD, means ISO format: 2019-01-01
    date_format: <string>
  time:
    # optional value: TF_HH_MM_SS_SSS, means ISO format: 00:00:00.000
    time_format: <string>
    # optional value: TZF_UTC, TZF_OFFSET
    time_zone_format: <string>
  date_time:
    # optional value: DTF_YYYY_MM_DD_HH_MM_SS_SSS,
    # means ISO format: 2019-01-01 00:00:00.000
    date_time_format: <string>
    time_zone_format: <string> # optional value: TZF_UTC, TZF_OFFSET
  date32: # int32 days since 1970-01-01
  time32: # int32 milliseconds past midnight
  timestamp: # int64 milliseconds since 1970-01-01 00:00:00.000000
```

Here is an explanation of each temporal type:
- `date`: Denotes a date value. Optionally, the field date_format can be specified to define the format of the date. The date_format attribute could take a value like DF_YYYY_MM_DD, indicating the ISO format: "2019-01-01".
- `time`: Represents a time value. Optionally, the field time_format can be used to specify the format of the time. The time_format attribute could take a value like TF_HH_MM_SS_SSS, indicating the ISO format: "00:00:00.000". The time_zone_format attribute can also be included to define the format of the time zone, with optional values of TZF_UTC or TZF_OFFSET.
- `date_time`: Signifies a combination of date and time. The field date_time_format is optional and can be used to specify the format of the date and time. For example, a date_time_format value of DTF_YYYY_MM_DD_HH_MM_SS_SSS would indicate the ISO format: "2019-01-01 00:00:00.000". The time_zone_format attribute can additionally be specified to define the format of the time zone, with optional values of TZF_UTC or TZF_OFFSET.
- `date32`: Represents the date as an integer, with the value being the number of days since January 1, 1970.
- `time32`: Represents the time as an integer, representing the number of milliseconds past midnight.
- `timestamp`: Denotes a timestamp as an integer, representing the number of milliseconds since January 1, 1970 at 00:00:00.000.

This YAML structure allows users to select the appropriate data type and format for handling temporal data, such as dates, times, and timestamps. The optional attributes provide flexibility to define the desired format or timezone representation.

Note:
- Currently we only support `date` and `timestamp`. The other types will be supported in the near future.
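
To make the integer encodings concrete, here is a minimal Python sketch that decodes a `date32` value (days since 1970-01-01) and a `timestamp` value (milliseconds since 1970-01-01 00:00:00.000). The function names are hypothetical, and interpreting the timestamp epoch as UTC is an assumption; the text above does not name a time zone.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
# Assumption: the timestamp epoch is taken to be midnight UTC.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date32_to_date(days):
    """Decode a date32 value: int32 days since 1970-01-01."""
    return EPOCH_DATE + timedelta(days=days)


def timestamp_to_datetime(millis):
    """Decode a timestamp value: int64 milliseconds since the epoch."""
    return EPOCH_UTC + timedelta(milliseconds=millis)


print(date32_to_date(17897))                 # 2019-01-01
print(timestamp_to_datetime(1546300800000))  # 2019-01-01 00:00:00+00:00
```

Both example values decode to the start of 2019, matching the sample date used in the format comments above.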

### Array Types

Array types are not currently supported but are planned for the near future.
Once supported, every element within an array will have to be one of the previously mentioned primitive types.
All elements within a single array must share the same type. In `graph.yaml`, a property can be declared as an array of the `DT_STRING` type as follows:
All elements within a single array must share the same type. In `graph.yaml`, a property can be declared as an array of the `DT_UNSIGNED_INT64` type as follows:

```yaml
property_type:
48 changes: 36 additions & 12 deletions docs/flex/interactive/development/admin_service.md
@@ -74,14 +74,18 @@ curl -X GET -H "Content-Type: application/json" "http://[host]/v1/graph"
"property_id": "1",
"property_name": "name",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
},
{
"property_id": "2",
"property_name": "age",
"property_type": {
"primitive_type": "DT_SIGNED_INT32"
"string":{
"long_text": {}
}
}
}
],
@@ -104,14 +108,18 @@ curl -X GET -H "Content-Type: application/json" "http://[host]/v1/graph"
"property_id": "1",
"property_name": "name",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
},
{
"property_id": "2",
"property_name": "lang",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
}
],
@@ -202,14 +210,18 @@ This API create a new graph according to the specified schema in request body.
"property_id": 1,
"property_name": "name",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
},
{
"property_id": 2,
"property_name": "age",
"property_type": {
"primitive_type": "DT_SIGNED_INT32"
"string":{
"long_text": {}
}
}
}
],
@@ -232,14 +244,18 @@ This API create a new graph according to the specified schema in request body.
"property_id": 1,
"property_name": "name",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
},
{
"property_id": 2,
"property_name": "lang",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
}
],
@@ -380,14 +396,18 @@ curl -X GET -H "Content-Type: application/json" "http://[host]/v1/graph/{graph_
"property_id": 1,
"property_name": "name",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
},
{
"property_id": 2,
"property_name": "age",
"property_type": {
"primitive_type": "DT_SIGNED_INT32"
"string":{
"long_text": {}
}
}
}
],
@@ -410,14 +430,18 @@ curl -X GET -H "Content-Type: application/json" "http://[host]/v1/graph/{graph_
"property_id": 1,
"property_name": "name",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
},
{
"property_id": 2,
"property_name": "lang",
"property_type": {
"primitive_type": "DT_STRING"
"string":{
"long_text": {}
}
}
}
],
14 changes: 8 additions & 6 deletions flex/engines/graph_db/database/graph_db.cc
@@ -108,17 +108,19 @@ Result<bool> GraphDB::Open(const GraphDBConfig& config) {
// is not serialized and deserialized.
auto& mutable_schema = graph_.mutable_schema();
mutable_schema.SetPluginDir(schema.GetPluginDir());
std::vector<std::string> plugin_paths;
std::vector<std::pair<std::string, std::string>> plugin_name_paths;
const auto& plugins = schema.GetPlugins();
for (auto plugin_pair : plugins) {
plugin_paths.emplace_back(plugin_pair.first);
plugin_name_paths.emplace_back(
std::make_pair(plugin_pair.first, plugin_pair.second.first));
}

std::sort(plugin_paths.begin(), plugin_paths.end(),
[&](const std::string& a, const std::string& b) {
return plugins.at(a).second < plugins.at(b).second;
std::sort(plugin_name_paths.begin(), plugin_name_paths.end(),
[&](const std::pair<std::string, std::string>& a,
const std::pair<std::string, std::string>& b) {
return plugins.at(a.first).second < plugins.at(b.first).second;
});
mutable_schema.EmplacePlugins(plugin_paths);
mutable_schema.EmplacePlugins(plugin_name_paths);

last_compaction_ts_ = 0;
MemoryStrategy allocator_strategy = MemoryStrategy::kMemoryOnly;