feat: Add Preliminary Vector Indexing Support #11318
# Conflicts: # Cargo.lock # src/query/ast/src/ast/statements/mod.rs # src/query/service/src/interpreters/mod.rs # src/query/sql/src/executor/physical_plan_builder.rs # src/query/sql/src/planner/binder/table.rs # src/query/sql/src/planner/optimizer/heuristic/decorrelate.rs # src/query/sql/src/planner/optimizer/rule/factory.rs # src/query/sql/src/planner/optimizer/rule/rewrite/mod.rs # src/query/sql/src/planner/optimizer/rule/rule.rs # src/query/sql/src/planner/plans/ddl/index.rs # src/query/sql/src/planner/plans/plan.rs # src/query/sql/src/planner/plans/scan.rs
# Conflicts: # src/meta/proto-conv/src/lib.rs # src/query/ast/src/parser/statement.rs # src/query/service/Cargo.toml # src/query/sql/Cargo.toml # src/query/sql/src/planner/plans/plan.rs # src/query/storages/common/table-meta/src/meta/mod.rs # src/query/storages/fuse/src/operations/fuse_source.rs # src/query/storages/fuse/src/operations/read_data.rs
Refer to src/meta/README.md:
How to add new meta data types to store in meta-service
Databend meta-service stores raw bytes and does not understand what the bytes are.
Databend-query uses Rust types at runtime; these types, such as TableMeta, must be serialized to be stored in meta-service.
The serialization is implemented with protobuf, and a protobuf message provides backward compatibility, i.e., a newer version (version-B) protobuf message can be deserialized from an older version (version-A) of serialized bytes, and the version-B protobuf message can be converted to version-B Rust types.
- Rust types are defined in src/meta/app/src/, such as TableMeta, which is defined in src/meta/app/src/schema/table.rs.
- The corresponding protobuf message is defined in src/meta/protos/proto/, such as src/meta/protos/proto/table.proto.
- The conversion between the protobuf message and the Rust type is defined in src/meta/proto-conv/, such as src/meta/proto-conv/src/table_from_to_protobuf_impl.rs, by implementing a FromToProto trait.
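As a rough illustration of the third point, the sketch below pairs a Rust type with a hand-written stand-in for a protobuf message and converts between them through a trait. The trait shape, `FromToProtoSketch`, `TableMetaPb`, `Incompatible`, the field set, and the `VER` constant are all simplified assumptions made for this example; the real `FromToProto` trait and the generated protobuf types in src/meta/proto-conv/ may look different.

```rust
/// Hypothetical error for data this reader cannot handle.
#[derive(Debug, PartialEq)]
pub struct Incompatible {
    pub reason: String,
}

/// Assumed newest version this build can read and write (illustrative value).
pub const VER: u64 = 39;

/// Simplified stand-in for the conversion trait; not the actual Databend definition.
pub trait FromToProtoSketch: Sized {
    type Pb;
    fn from_pb(p: Self::Pb) -> Result<Self, Incompatible>;
    fn to_pb(&self) -> Self::Pb;
}

/// Runtime Rust type (illustrative subset of fields).
#[derive(Debug, Clone, PartialEq)]
pub struct TableMeta {
    pub engine: String,
    pub comment: String,
}

/// Hand-written stand-in for a generated protobuf message.
#[derive(Debug, Clone, PartialEq)]
pub struct TableMetaPb {
    pub ver: u64,
    pub engine: String,
    /// Field assumed to be added in a later version; absent in older data.
    pub comment: Option<String>,
}

impl FromToProtoSketch for TableMeta {
    type Pb = TableMetaPb;

    fn from_pb(p: TableMetaPb) -> Result<Self, Incompatible> {
        // Reject data written by a newer version than this reader knows.
        if p.ver > VER {
            return Err(Incompatible {
                reason: format!("message version {} is newer than reader version {}", p.ver, VER),
            });
        }
        Ok(TableMeta {
            engine: p.engine,
            // Older messages lack the field, so fall back to the default.
            comment: p.comment.unwrap_or_default(),
        })
    }

    fn to_pb(&self) -> TableMetaPb {
        TableMetaPb {
            ver: VER,
            engine: self.engine.clone(),
            comment: Some(self.comment.clone()),
        }
    }
}
```

In this sketch an older message without `comment` still converts to the current Rust type, which is the backward-compatibility property described above.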
To add a new feature (add a new type or update a type), the developer should:
- Add the Rust types in a module under src/meta/app/src/;
- Add a new version in src/meta/proto-conv/src/util.rs. The versions track change history and will be checked when converting protobuf messages to Rust types:

    const META_CHANGE_LOG: &[(u64, &str)] = &[
        //
        ( 1, "----------: Initial", ),
        ( 2, "2022-07-13: Add: share.proto", ),
        ( 3, "2022-07-29: Add: user.proto/UserOption::default_role", ),
        ...
        (37, "2023-05-05: Add: index.proto", ),
        (38, "2023-05-19: Rename: table.proto/TableCopiedFileLock to EmptyProto", ),
        (39, "2023-05-22: Add: data_mask.proto", ),
    ];

  Note: only add new versions at the bottom, and only remove old versions from the top.
- Add the conversion implementation to src/meta/proto-conv/src/; refer to other files in this crate.
- Add a compatibility test to ensure that compatibility will always be kept in the future; a good example is src/meta/proto-conv/tests/it/v039_data_mask.rs.
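To make the "append at the bottom, remove from the top" rule concrete, here is a minimal sketch of checks one could run over such a log. The helper names (`current_version`, `min_version`, `is_ordered`, `is_readable`) are hypothetical, and the excerpt contains only the last three entries quoted above; the actual checks in src/meta/proto-conv/src/util.rs may be implemented differently.

```rust
/// Illustrative excerpt of the change log: only the last three entries quoted above.
pub const META_CHANGE_LOG: &[(u64, &str)] = &[
    (37, "2023-05-05: Add: index.proto"),
    (38, "2023-05-19: Rename: table.proto/TableCopiedFileLock to EmptyProto"),
    (39, "2023-05-22: Add: data_mask.proto"),
];

/// Newest version: the last entry, since new versions are appended at the bottom.
pub fn current_version(log: &[(u64, &str)]) -> Option<u64> {
    log.last().map(|(v, _)| *v)
}

/// Oldest version still listed: the first entry, since old versions are removed from the top.
pub fn min_version(log: &[(u64, &str)]) -> Option<u64> {
    log.first().map(|(v, _)| *v)
}

/// True when versions strictly increase from top to bottom.
pub fn is_ordered(log: &[(u64, &str)]) -> bool {
    log.windows(2).all(|w| w[0].0 < w[1].0)
}

/// Hypothetical compatibility check: a message is readable when its version
/// lies between the oldest and newest listed versions, inclusive.
pub fn is_readable(log: &[(u64, &str)], ver: u64) -> bool {
    match (min_version(log), current_version(log)) {
        (Some(min), Some(cur)) => min <= ver && ver <= cur,
        _ => false,
    }
}
```

Under these assumptions, adding an entry anywhere but the bottom makes `is_ordered` fail, which is one way a test could guard the rule.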
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
This PR introduces the IVF index, which currently supports only cosine_distance as a similarity measure.
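For readers unfamiliar with the measure, cosine distance is commonly defined as one minus the cosine similarity of two vectors. The function below is a minimal standalone sketch of that definition, not the implementation used in this PR; returning `None` for mismatched lengths or zero-norm vectors is a choice made for this example.

```rust
/// Cosine distance: 1 - (a . b) / (|a| * |b|).
/// Returns None when the vectors are empty, differ in length,
/// or either has zero norm (the distance is undefined there).
pub fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

Identical directions give a distance of 0, orthogonal vectors give 1, and opposite directions give 2.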
Syntax
The syntax design is similar to pgvector.
create index:
drop index:
set search param:
A column and a similarity metric uniquely identify an index. Therefore, when setting parameters, you need to specify the column name and similarity measurement type.
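The identification rule can be pictured as a map keyed by the (column, metric) pair. The sketch below is purely illustrative: `IndexRegistry`, `IndexKey`, `Metric`, `SearchParams`, and the `nprobe` parameter are hypothetical names chosen for this example and are not taken from the PR.

```rust
use std::collections::HashMap;

/// Similarity metrics; only cosine distance is supported for now.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Metric {
    CosineDistance,
}

/// Hypothetical key: a column name plus a metric identifies one index.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct IndexKey {
    pub column: String,
    pub metric: Metric,
}

/// Hypothetical search parameters; `nprobe` is an assumed name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SearchParams {
    pub nprobe: usize,
}

#[derive(Default)]
pub struct IndexRegistry {
    params: HashMap<IndexKey, SearchParams>,
}

fn key(column: &str, metric: Metric) -> IndexKey {
    IndexKey { column: column.to_string(), metric }
}

impl IndexRegistry {
    /// Registers an index; returns false if the (column, metric) pair already exists.
    pub fn create_index(&mut self, column: &str, metric: Metric) -> bool {
        let k = key(column, metric);
        if self.params.contains_key(&k) {
            return false;
        }
        self.params.insert(k, SearchParams { nprobe: 1 });
        true
    }

    /// Sets a search parameter; both column and metric are needed to find the index.
    pub fn set_search_param(&mut self, column: &str, metric: Metric, nprobe: usize) -> Result<(), String> {
        match self.params.get_mut(&key(column, metric)) {
            Some(p) => {
                p.nprobe = nprobe;
                Ok(())
            }
            None => Err(format!("no index on column `{}` for metric {:?}", column, metric)),
        }
    }

    /// Looks up the current parameter for an index, if it exists.
    pub fn search_param(&self, column: &str, metric: Metric) -> Option<usize> {
        self.params.get(&key(column, metric)).map(|p| p.nprobe)
    }
}
```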
Implementation
Build an index: build an index for each block of the table, and store it after compression.
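As a hedged sketch of what a per-block IVF (inverted file) index involves, the code below builds one index for a single block and searches it. It is not the PR's implementation: `BlockIvfIndex`, `nlist`, and `nprobe` are hypothetical names, the centroids are simply the first `nlist` rows instead of being trained (for example with k-means), and the compression step is omitted.

```rust
/// Cosine distance; this sketch assumes equal-length, non-zero vectors.
fn cos_dist(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

/// Centroid ids ordered from nearest to farthest from `q`.
fn nearest_centroids(centroids: &[Vec<f32>], q: &[f32]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..centroids.len()).collect();
    order.sort_by(|&i, &j| cos_dist(&centroids[i], q).total_cmp(&cos_dist(&centroids[j], q)));
    order
}

/// Hypothetical per-block index: centroids plus one inverted list of row ids per centroid.
pub struct BlockIvfIndex {
    centroids: Vec<Vec<f32>>,
    lists: Vec<Vec<usize>>,
}

impl BlockIvfIndex {
    /// Builds the index for one block. Simplification: the first `nlist`
    /// rows serve as centroids; a real index would train them.
    pub fn build(block: &[Vec<f32>], nlist: usize) -> Self {
        let nlist = nlist.max(1).min(block.len());
        let centroids: Vec<Vec<f32>> = block[..nlist].to_vec();
        let mut lists: Vec<Vec<usize>> = vec![Vec::new(); nlist];
        for (row, v) in block.iter().enumerate() {
            // Assign each row to the inverted list of its nearest centroid.
            let c = nearest_centroids(&centroids, v)[0];
            lists[c].push(row);
        }
        BlockIvfIndex { centroids, lists }
    }

    /// Returns up to `n` row ids, scanning only the `nprobe` nearest lists.
    pub fn search(&self, block: &[Vec<f32>], q: &[f32], nprobe: usize, n: usize) -> Vec<usize> {
        let mut candidates: Vec<usize> = nearest_centroids(&self.centroids, q)
            .into_iter()
            .take(nprobe)
            .flat_map(|c| self.lists[c].iter().copied())
            .collect();
        candidates.sort_by(|&i, &j| cos_dist(&block[i], q).total_cmp(&cos_dist(&block[j], q)));
        candidates.truncate(n);
        candidates
    }
}
```

The trade-off this structure illustrates is that a smaller `nprobe` scans fewer rows but may miss true nearest neighbors that were assigned to a list that is not probed.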
ANN query: queries of the form order by cosine_distance(column_name, target) limit n are rewritten in the RBO stage, and the execution stage is divided into the following steps:
Benchmark
run benchmark:
The benchmark results are written to /benchmark/vector_index_benchmark/result.csv.
Next step
Closes #11054 #9699