feat(interactive): Support VarChar as yet another property type #3400

Open
zhanglei1949 opened this issue Dec 5, 2023 · 3 comments

@zhanglei1949
Collaborator

Currently, Interactive only supports string as a vertex property type. We also need to support string as an edge property type.

To support string edge properties in CSR storage, we need to know the maximum length of the string so that enough space can be reserved.
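To illustrate why a known maximum length matters for CSR storage, here is a minimal sketch, assuming each edge stores its VarChar value in a fixed-width slot (a 2-byte length prefix plus `max_length` bytes). `FixedVarCharColumn` is a hypothetical name, not part of Interactive. Fixed-width slots keep the offset of value `i` computable as `i * slot_size`, so edge properties stay randomly accessible alongside the CSR neighbor arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical sketch: a column of fixed-width VarChar slots. Each slot
// reserves a uint16_t length prefix plus max_length bytes, so slot i starts
// at offset i * slot_size_ and can be read without scanning earlier slots.
class FixedVarCharColumn {
 public:
  explicit FixedVarCharColumn(uint16_t max_length)
      : max_length_(max_length), slot_size_(sizeof(uint16_t) + max_length) {}

  // Appends a value; strings longer than max_length are rejected before
  // any space is reserved.
  void push_back(std::string_view value) {
    if (value.size() > max_length_) {
      throw std::length_error("string exceeds VarChar max_length");
    }
    size_t offset = buffer_.size();
    buffer_.resize(offset + slot_size_, '\0');
    uint16_t len = static_cast<uint16_t>(value.size());
    std::memcpy(&buffer_[offset], &len, sizeof(len));
    if (!value.empty()) {
      std::memcpy(&buffer_[offset + sizeof(len)], value.data(), value.size());
    }
  }

  // O(1) random access to the i-th value.
  std::string_view get(size_t i) const {
    assert(i < size());
    const char* slot = buffer_.data() + i * slot_size_;
    uint16_t len;
    std::memcpy(&len, slot, sizeof(len));
    return std::string_view(slot + sizeof(len), len);
  }

  size_t size() const { return buffer_.size() / slot_size_; }

 private:
  uint16_t max_length_;
  size_t slot_size_;
  std::vector<char> buffer_;
};
```

The trade-off is space: every slot costs `max_length + 2` bytes even for short strings, which is why a tight `max_length` matters.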

A more detailed design and explanation are needed.

zhanglei1949 self-assigned this Dec 5, 2023
@zhanglei1949
Collaborator Author

zhanglei1949 commented Dec 5, 2023

Should the previous String type be deprecated, or can it be viewed as a VarChar with a fixed default maximum length?

@zhanglei1949
Collaborator Author

zhanglei1949 commented Dec 5, 2023

Two possible representations of VARCHAR in the schema:

```yaml
# Option 1
property_type:
  varchar:
    max_length: 256
# Option 2
property_type:
  primitive_type: VARCHAR(256)
```
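If the `VARCHAR(256)` spelling shown above were chosen, the schema loader would need to extract `max_length` from the type string. A minimal sketch follows; `ParseVarCharMaxLength` is a hypothetical helper, and it assumes `max_length` is stored as `uint16_t` (as in the later commit), so larger values are rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parses "VARCHAR(<n>)" and returns n as max_length,
// or std::nullopt if the string is malformed or n does not fit in uint16_t.
std::optional<uint16_t> ParseVarCharMaxLength(std::string_view s) {
  constexpr std::string_view kPrefix = "VARCHAR(";
  // Need the prefix, at least one digit, and the closing parenthesis.
  if (s.size() <= kPrefix.size() + 1 || s.substr(0, kPrefix.size()) != kPrefix ||
      s.back() != ')') {
    return std::nullopt;
  }
  std::string_view digits =
      s.substr(kPrefix.size(), s.size() - kPrefix.size() - 1);
  uint32_t value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + static_cast<uint32_t>(c - '0');
    if (value > UINT16_MAX) return std::nullopt;  // stop before overflow
  }
  return static_cast<uint16_t>(value);
}
```

The nested form (`varchar: max_length: 256`) avoids this string parsing entirely, since `max_length` is already a separate YAML field.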

@zhanglei1949
Collaborator Author

zhanglei1949 commented Dec 5, 2023

About data importing.

  • Arrow has no DataType that directly corresponds to VARCHAR. Candidates:
    • FixedSizeBinaryType: fixed size, supports random access.
    • LargeStringType: variable-length string.
  • If a string's length exceeds max_length, throw an error when building the graph.
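The length check from the last bullet could look roughly like the following sketch. `CheckVarCharColumn` is a hypothetical helper; it assumes `max_length` counts bytes (Arrow strings are UTF-8 byte sequences) and models the imported column as `std::string_view` values rather than an actual Arrow array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical check run while building the graph: every imported value of a
// VarChar property must fit in max_length bytes. The first offending row is
// reported so the user can locate it in the input file.
void CheckVarCharColumn(const std::vector<std::string_view>& values,
                        uint16_t max_length, const std::string& property_name) {
  for (size_t row = 0; row < values.size(); ++row) {
    if (values[row].size() > max_length) {
      throw std::runtime_error(
          "property '" + property_name + "' row " + std::to_string(row) +
          ": length " + std::to_string(values[row].size()) +
          " exceeds VarChar max_length " + std::to_string(max_length));
    }
  }
}
```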

zhanglei1949 added a commit that referenced this issue Dec 14, 2023
Introduce a new data type `VARCHAR` into GraphScope Interactive.
#3400 
#3405 

0. Use `VARCHAR` in the schema. `VARCHAR` is not a primitive type.

The following representation may be revised later.
```yaml
property_type:
  var_char:
    max_length: 128
```



1. Refactor `PropertyType`

Previously, `PropertyType` was just an enum.
```c++
enum class PropertyType {
  kEmpty,
  kBool,
  kUInt8,
  kUInt16,
  // ...
};
```

To represent `VARCHAR`, an enum alone is not enough, because we also
need to store its maximum length, `max_length`.

```c++
#include <cassert>
#include <cstdint>

namespace impl {

enum class PropertyTypeImpl {
  kEmpty,
  kBool,
  kUInt8,
  kUInt16,
  // ...
  kVarChar,
};

// Stores additional type information for PropertyTypeImpl
union AdditionalTypeInfo {
  uint16_t max_length;  // for varchar
};
}  // namespace impl

struct PropertyType {
  impl::PropertyTypeImpl type_enum;
  impl::AdditionalTypeInfo additional_type_info;

  PropertyType()
      : type_enum(impl::PropertyTypeImpl::kEmpty), additional_type_info() {}
  PropertyType(impl::PropertyTypeImpl type)
      : type_enum(type), additional_type_info() {}
  PropertyType(impl::PropertyTypeImpl type, uint16_t max_length)
      : type_enum(type), additional_type_info({.max_length = max_length}) {
    assert(type == impl::PropertyTypeImpl::kVarChar);
  }

  // Factory functions returning a type object, similar to arrow::boolean()
  // (except that Arrow returns a std::shared_ptr<arrow::DataType>).
  static PropertyType empty();
  static PropertyType bool_();
  static PropertyType uint8();
  static PropertyType uint16();

  // Unlike the other factories, var_char() takes the maximum length.
  static PropertyType var_char(uint16_t max_length);
  // ...

  static const PropertyType kEmpty;
  static const PropertyType kBool;
  static const PropertyType kUInt8;
  static const PropertyType kUInt16;
  // ...

  bool operator==(const PropertyType& other) const;
  bool operator!=(const PropertyType& other) const;
};
```
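The commit declares `operator==` without showing its body. Below is a minimal, self-contained sketch of one plausible semantics, assuming two VarChar types are equal only when their `max_length` values also match. The reduced enum and the inline definitions are for illustration only and are not the actual Interactive implementation.

```cpp
#include <cassert>
#include <cstdint>

namespace impl {
// Reduced enum for illustration; the real one has more members.
enum class PropertyTypeImpl { kEmpty, kBool, kUInt8, kUInt16, kVarChar };

union AdditionalTypeInfo {
  uint16_t max_length;  // only meaningful for kVarChar
};
}  // namespace impl

struct PropertyType {
  impl::PropertyTypeImpl type_enum;
  impl::AdditionalTypeInfo additional_type_info;

  PropertyType()
      : type_enum(impl::PropertyTypeImpl::kEmpty), additional_type_info{} {}
  PropertyType(impl::PropertyTypeImpl type)
      : type_enum(type), additional_type_info{} {}
  PropertyType(impl::PropertyTypeImpl type, uint16_t max_length)
      : type_enum(type), additional_type_info{max_length} {
    assert(type == impl::PropertyTypeImpl::kVarChar);
  }

  static PropertyType var_char(uint16_t max_length) {
    return PropertyType(impl::PropertyTypeImpl::kVarChar, max_length);
  }

  // Assumed semantics: the type enums must match, and for VarChar the
  // max_length stored in the union must match as well.
  bool operator==(const PropertyType& other) const {
    if (type_enum != other.type_enum) return false;
    if (type_enum == impl::PropertyTypeImpl::kVarChar) {
      return additional_type_info.max_length ==
             other.additional_type_info.max_length;
    }
    return true;
  }
  bool operator!=(const PropertyType& other) const { return !(*this == other); }
};
```

Gating the `max_length` comparison on `kVarChar` keeps equality correct if other members are later added to `AdditionalTypeInfo`, since only the member that matches `type_enum` carries meaning.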

VarChar is implemented but not yet fully exposed by Interactive. Users will be able to use the VarChar type once it is supported by the compiler.

---------

Co-authored-by: liulx20 <[email protected]>