StarRocks Roadmap 2024 #39686
Comments
Hello, |
Welcome! You can check #13300 first; we'll add more good-first-issues later in 2024, particularly for external catalogs and connectors. |
How about incremental refresh of materialized views on external tables such as Iceberg or Hudi? I think this feature could reduce the cost of refreshing MVs. |
Yes, we are considering it; there is a |
We prefer Iceberg, for its better interface design and fewer bugs. Incremental snapshot refresh is useful for non-partitioned tables. |
At yesterday's 2024 roadmap meeting, it was mentioned that tags on BEs will be supported in shared-nothing mode. Is this like the multi-warehouse mechanism in shared-data mode, so that BEs could be split into a data-loading warehouse, an ad hoc query warehouse, an ETL warehouse, and so on? And when will it be released? |
#38833 is already finished and will be published in the next version. |
I think these are different issues. Multi-warehouse lets different users see different machines, which gives resource isolation at the machine level. #38833 is about replica location. In a share-nothing deployment the data is all on HDFS/S3; we don't have replicas, but we still need multi-warehouse capability to assign different machines to different resource groups. |
Thanks. The following are the features and improvements we would like, based on my tests of StarRocks and my company's business scenarios:
|
We hope deletion queries on Parquet-format Iceberg tables will be supported. |
We plan to implement it in v3.3. |
Thanks a lot for the extensive feedback:
|
OK, I hope we can discuss this in detail later. Happy Spring Festival! ^_^ |
Our top priorities for being able to migrate from Greenplum to StarRocks are these two issues:
Plus native Time Travel in StarRocks, without using an external data lake. The 2024 roadmap looks great. We hope to get migrated so we can contribute to the journey. |
This roadmap is exciting. Let's collaborate on supporting Databricks Unity Catalog as well as MAP & STRUCT types. Thanks a million. |
Regarding "Connector: Directly reading data files instead of reading from BE": do you have a more detailed plan or an issue for this feature? Thanks. |
Please let materialized views support Kafka sources and database CDC, to reduce dependencies on external components such as Kafka Connect and Flink CDC. The point of using StarRocks as a lakehouse is to build a lightweight data platform. If you rely on Flink to import data in real time, Flink has to run on YARN or Kubernetes, and the platform is no longer lightweight. We have many overseas customers on AWS; there, a user who needs to write from Kafka into Redshift uses a materialized view, e.g. https://aws.amazon.com/cn/blogs/china/new-for-amazon-redshift-general-availability-of-streaming-ingestion-for-kinesis-data-streams-and-managed-streaming-for-apache-kafka/ |
Is there any plan to support the Arrow Flight SQL protocol for better data transport? As data engineers and data scientists, we would like less overhead when converting results from the SQL (currently MySQL) protocol into pandas/pyarrow tables. |
Thanks for the explanation, @Dshadowzh. I have just one question about this part:
I think the documentation is confusing in this regard. On one hand, the limitations section of the Query rewrite with materialized views page says:
On the other hand, the Set extra sort keys section of the Synchronous materialized view page says:
If I understood it correctly, |
@chulucninh09 there's an open issue tracking support for Apache Arrow Flight SQL, but it would be great to get it on the 2024 roadmap. Interestingly, Apache Doris has supported this since 2.1, released on March 8th, 2024. |
We'll add this to the roadmap and launch the project soon. Thank you for your attention. |
Is development of the high-speed data transfer link based on Arrow Flight SQL not planned for this year? |
@gaigaikuaipao Yes, we want to release it in 3.4. You can follow PR #50199. |
Shared-data & StarOS
Performance
Easy to use
files table function
Data lake analytics
Data warehousing(batch and streaming)
Batch processing & ETL improvement
Streaming processing & real-time update
All-in-one scenarios
Release