19 Jul 03:31

wangsimo0

2b87854

3.3.1

Release date: July 18, 2024

New Features

[Preview] Supports temporary tables.
[Preview] JDBC Catalog supports Oracle and SQL Server.
[Preview] Unified Catalog supports Kudu.
Loading data into Primary Key tables with INSERT INTO supports partial updates in column mode.
User-defined variables support the ARRAY type. #42631
Stream Load supports converting JSON-type data and loading it into columns of STRUCT/MAP/ARRAY types. #45406
Supports global dictionary cache.
Supports deleting partitions in batch. #44744
Supports queries on Iceberg views. #46273
Supports managing column-level permissions in Apache Ranger. (Column-level permissions for materialized views and views must be set under the table object.) #47702
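The ARRAY support for user-defined variables (#42631) can be sketched as follows. This is a minimal illustration that assumes v3.3.1 or later and the usual SET @var = expr form of user-defined variables. The variable name @arr is hypothetical.

```
-- Assign an ARRAY literal to a user-defined variable (hypothetical name @arr).
SET @arr = [1, 2, 3];

-- Read the variable back and pass it to an array function.
SELECT @arr, array_length(@arr);
```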

Improvements

Optimized the IdChain hashcode implementation to reduce the FE restart time. #47599
Improved error messages for the csv.trim_space parameter in the FILES() function. The parameter is now checked for illegal characters, and clearer prompts are returned. #44740
Stream Load supports using \t and \n as column and row delimiters. Users no longer need to convert them to their hexadecimal ASCII codes. #47302

Bug Fixes

Fixed the following issues:

Schema Change failures due to file location changes caused by Tablet migration during the Schema Change process. #45517
Cross-cluster Data Migration Tool fails to create tables in the target cluster due to control characters such as \ and \r in the default values of fields. #47861
Persistent bRPC failures after BE restarts. #40229
The user_admin role can change the root password using the ALTER USER command. #47801
Primary key index write failures cause data write errors. #48045

Behavior Changes

Intermediate result spilling is enabled by default when sinking data to Hive and Iceberg. #47118
Changed the default value of the BE configuration item max_cumulative_compaction_num_singleton_deltas to 500. #47621
When users create a partitioned table without specifying the bucket number and the number of partitions exceeds 5, the bucket count is now set to max(2 × the number of BEs or CNs, the bucket number calculated based on the data volume of the largest historical partition). Previously, the bucket number was calculated based only on the data volume of the largest historical partition. #47949
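Here is a worked illustration of the new bucket-count rule. Let N be the number of BEs or CNs, and let B_hist be the bucket number derived from the largest historical partition. The values N = 3 and B_hist = 4 are invented for this example. These notes do not describe how B_hist itself is computed.

$$
B = \max\left(2N,\ B_{\text{hist}}\right) = \max\left(2 \times 3,\ 4\right) = 6
$$

Under the previous rule, the same table would have received B = B_hist = 4 buckets. The new rule guarantees at least 2N buckets once a table has more than 5 partitions.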

Downgrade notes

To downgrade a cluster from v3.3.1 or later to v3.2, users must clean all temporary tables in the cluster by following these steps:

Prevent users from creating new temporary tables:

```
ADMIN SET FRONTEND CONFIG("enable_experimental_temporary_table"="false");
```

Check if there are any temporary tables in the cluster:
```
SELECT * FROM information_schema.temp_tables;
```
If there are temporary tables in the system, clean them up using the following command, which requires the SYSTEM-level OPERATE privilege. Replace 'session' with the ID of the session that owns the temporary tables:
```
CLEAN TEMPORARY TABLE ON SESSION 'session';
```


11 Jul 12:23

yingtingdong

3.2.9

63ce1bd

3.2.9

New Features

Paimon tables now support deletion vectors. #45866
Supports column-level access control through Apache Ranger. #47702
Stream Load can automatically convert JSON strings into STRUCT/MAP/ARRAY types during loading. #45406
JDBC Catalog now supports Oracle and SQL Server. #35691

Improvements

Improved privilege management by restricting user_admin role users from resetting the password of the root user. #47801
Stream Load now supports using \t and \n as column and row delimiters. Users no longer need to convert them to their hexadecimal ASCII codes. #47302
Optimized memory usage during data loading. #47047
Supports masking authentication information for the FILES() function in audit logs. #46893
Hive tables now support the skip.header.line.count property. #47001
JDBC Catalog supports more data types. #47618

Bug Fixes

Fixed the following issues:

BE crash caused by ALTER TABLE ADD COLUMN after upgrading a shared-data cluster from v3.2.x to v3.3.0 and then rolling it back. #47826
Tasks initiated through SUBMIT TASK showed a Running status indefinitely in the QueryDetail interface. #47619
Forwarding queries to the FE Leader node caused a null pointer exception. #47559
SHOW MATERIALIZED VIEWS with WHERE conditions caused a null pointer exception. #47811
Vertical Compaction fails for Primary Key tables in shared-data clusters. #47192
Improper handling of I/O errors when sinking data to Hive or Iceberg tables. #46979
Table properties do not take effect when whitespace is added to their values. #47119
BE crash caused by concurrent migration and Index Compaction operations on Primary Key tables. #46675


21 Jun 12:11

Dshadowzh

3.3.0

19a3f66

3.3.0

New Features and Improvements

Shared-data Cluster

Optimized the performance of Schema Evolution in shared-data clusters, reducing the time consumption of DDL changes to a sub-second level. For more information, see Schema Evolution.
To satisfy the requirement for data migration from shared-nothing clusters to shared-data clusters, the community officially released the StarRocks Data Migration Tool. It can also be used for data synchronization and disaster recovery between shared-nothing clusters.
[Preview] AWS Express One Zone Storage can be used as storage volumes, significantly improving read and write performance. For more information, see CREATE STORAGE VOLUME.
Optimized the garbage collection (GC) mechanism in shared-data clusters. Supports manual compaction for data in object storage. For more information, see Manual Compaction.
Optimized the Publish execution of Compaction transactions for Primary Key tables in shared-data clusters, reducing I/O and memory overhead by avoiding reading primary key indexes.

Data Lake Analytics

Data Cache enhancements
- Added the Data Cache Warmup command CACHE SELECT to fetch hotspot data from data lakes, which speeds up queries and minimizes resource usage. CACHE SELECT can work with SUBMIT TASK to achieve periodic cache warmup. This feature supports both tables in external catalogs and internal tables in shared-data clusters.
- Added metrics and monitoring methods to enhance the observability of Data Cache.
Parquet reader performance enhancements
- Optimized Page Index, significantly reducing the data scan size.
- Reduced the occurrence of reading unnecessary pages when Page Index is used.
- Uses SIMD to accelerate the computation to determine whether data rows are empty.
ORC reader performance enhancements
- Uses column ID for predicate pushdown to read ORC files after Schema Change.
- Optimized the processing logic for ORC tiny stripes.
Iceberg table format enhancements
- Significantly improved the metadata access performance of the Iceberg Catalog by refactoring the parallel Scan logic. Resolved the single-threaded I/O bottleneck in the native Iceberg SDK when handling large volumes of metadata files. As a result, queries with metadata bottlenecks now experience more than a 10-fold performance increase.
- Queries on Parquet-formatted Iceberg v2 tables support equality deletes.
[Experimental] Paimon Catalog enhancements
- Materialized views created based on Paimon external tables now support automatic query rewriting.
- Optimized Scan Range scheduling for queries against the Paimon Catalog, improving I/O concurrency.
- Supports querying Paimon system tables.
- Paimon external tables now support deletion vectors, improving query efficiency in update and delete scenarios.
Enhancements in collecting external table statistics
- ANALYZE TABLE can be used to collect histograms of external tables, which helps address data skew.
- Supports collecting statistics of STRUCT subfields.
Table sink enhancements
- The Sink operator delivers twice the performance of Trino's Sink operator.
- Data can be sunk to Textfile- and ORC-formatted tables in Hive catalogs and storage systems such as HDFS and cloud storage like AWS S3.
[Preview] Supports Alibaba Cloud MaxCompute catalogs, with which you can query data from MaxCompute without ingestion and directly transform and load the data from MaxCompute by using INSERT INTO.
[Experimental] Supports ClickHouse Catalog.
[Experimental] Supports Kudu Catalog.
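Data Cache warmup with CACHE SELECT, optionally scheduled through SUBMIT TASK, can be sketched as follows. The catalog, database, table, column, and task names (hive_catalog, sales_db, orders, order_date, warmup_orders) are hypothetical. The scheduling clause assumes the SUBMIT TASK ... SCHEDULE EVERY(INTERVAL ...) form, so check it against the SUBMIT TASK reference for your version.

```
-- Warm up the cache with all columns of an external table (hypothetical names).
CACHE SELECT * FROM hive_catalog.sales_db.orders;

-- Warm up only hot data by selecting one column and adding a predicate.
CACHE SELECT order_date FROM hive_catalog.sales_db.orders
WHERE order_date >= '2024-06-01';

-- Re-run the warmup once a day as an asynchronous task.
SUBMIT TASK warmup_orders
SCHEDULE EVERY(INTERVAL 1 DAY)
AS CACHE SELECT * FROM hive_catalog.sales_db.orders;
```

Scoping the warmup with a predicate keeps the cached set limited to the data that queries actually touch.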

Performance Improvement and Query Optimization

Optimized performance on ARM.
Significantly optimized performance for ARM architecture instruction sets. Performance tests on AWS Graviton instances showed that ARM was 11% faster than x86 in the SSB 100G test, 39% faster in the Clickbench test, 13% faster in the TPC-H 100G test, and 35% faster in the TPC-DS 100G test.
Spill to Disk is in GA. Optimized the memory usage of complex queries and improved spill scheduling, allowing large queries to run stably without OOM.
[Preview] Supports spilling intermediate results to object storage.
Supports more indexes.
- [Preview] Supports full-text inverted index to accelerate full-text searches.
- [Preview] Supports N-Gram bloom filter index to speed up LIKE queries and the computation speed of ngram_search and ngram_search_case_insensitive functions.
Improved the performance and memory usage of Bitmap functions. Added the capability to export Bitmap data to Hive by using Hive Bitmap UDFs.
[Preview] Supports Flat JSON. This feature automatically detects JSON data during data loading, extracts common fields from the JSON data, and stores these fields in a columnar manner. This improves JSON query performance, comparable to querying STRUCT data.
[Preview] Optimized global dictionary. Provides a dictionary object that stores the key-value mappings from a dictionary table in BE memory. The new dictionary_get() function queries the dictionary object directly in BE memory, which is faster than querying the dictionary table with the dict_mapping() function. The dictionary object can also serve as a dimension table. Dimension values can be obtained by querying it directly with dictionary_get(), which is faster than the original method of performing JOIN operations on the dimension table.
[Preview] Supports Colocate Group Execution, which significantly reduces memory usage for executing Join and Agg operators on colocated tables and ensures that large queries can be executed more stably.
Optimized the performance of CodeGen. JIT is enabled by default and achieves a 5x performance improvement for complex expression calculations.
Supports using vectorization technology to implement regular expression matching, which reduces the CPU consumption of the regexp_replace function.
Optimized Broadcast Join so that the Broadcast Join operation can be terminated in advance when the right table is empty.
Optimized Shuffle Join in scenarios of data skew to prevent OOM.
When an aggregate query contains LIMIT, multiple Pipeline threads can share the LIMIT condition to avoid unnecessary compute resource consumption.

Storage Optimization and Cluster Management

Enhanced flexibility of range partitioning. Three time functions can be used as partitioning columns. These functions convert timestamps or strings in the partitioning columns into date values and then the data can be partitioned based on the converted date values.
FE memory observability. Provides detailed memory usage metrics for each module within the FE to better manage resources.
Optimized metadata locks in FE. Provides a lock manager for centralized management of metadata locks in FE. For example, it can refine the granularity of metadata locks from the database level to the table level, which improves load and query concurrency. In a scenario of 100 concurrent load jobs, the load time can be reduced by 35%.
Supports adding labels on BEs. Supports adding labels on BEs based on information such as the racks and data centers where BEs are located. It ensures even data distribution among racks and data centers, and facilitates disaster recovery in case of power failures in certain racks or faults in data centers.
Optimized the sort key. Duplicate Key tables, Aggregate tables, and Unique Key tables all support specifying sort keys through the ORDER BY clause.
[Experimental] Optimized the storage efficiency of non-string scalar data. This type of data supports dictionary encoding, reducing storage space usage by 12%.
Supports size-tiered compaction for Primary Key tables. Reduces write I/O and memory overhead during compaction. This improvement is supported in both shared-data and shared-nothing clusters.
Optimized read I/O for persistent indexes in Primary Key tables. Supports reading persistent indexes by a smaller granularity (page) and improves the persistent index's bloom filter. This improvem...


26 Jun 04:50

jaogoy

3.1.13

d9d3ed7

Release notes 3.1.13

Release date: June 26, 2024

Improvements

The Broker process supports access to Tencent Cloud COS Posix buckets. Users can load data from COS Posix buckets using Broker Load or unload data to COS Posix buckets using the SELECT INTO OUTFILE statement. #46597
Supports viewing comments of Hive tables in Hive Catalogs using SHOW CREATE TABLE. #37686
Optimized the evaluation time of conjuncts in WHERE clauses, such as multiple LIKE clauses on the same column or CASE WHEN expressions. #46914

Bug Fixes

Fixed the following issues:

DELETE statements fail in shared-data clusters if an excessive number of partitions are to be deleted. #46229


21 Jun 01:50

jaogoy

2.5.22

5dffd65

Release notes 2.5.22

Release date: June 20, 2024

Improvements

Optimized a partition check logic used for building query execution plan, significantly reducing the time consumption of complex queries that involve multiple tables. #46781

Bug Fixes

Fixed the following issues:

Function Call does not handle child errors correctly. #42590
The internal data statistics were not cleaned up regularly, causing inaccurate estimates and therefore inefficient query plans. This could degrade query performance and cause a surge in memory usage. #45839
Using a stale column histogram may lead to the Division by Zero exception. #45614


07 Jun 09:12

yingtingdong

3.2.8

759cc78

Release notes 3.2.8

Release date: June 7, 2024

New Features

Supports adding labels on BEs: Supports adding labels on BEs based on information such as the racks and data centers where BEs are located. It ensures even data distribution among racks and data centers, and facilitates disaster recovery in case of power failures in certain racks or faults in data centers. #38833

Bug Fixes

Fixed the following issues:

An error is returned when users DELETE data rows from tables that use the expression partitioning method with str2date. #45939
BEs in the destination cluster crash when the StarRocks Cross-cluster Data Migration Tool fails to retrieve the Schema information from the source cluster. #46068
The error Multiple entries with same key is returned to queries with non-deterministic functions. #46602


30 May 13:29

jaogoy

3.1.12

fc2b9c3

Release notes 3.1.12

Release date: May 30, 2024

New Features

Flink connector supports reading complex data types ARRAY, MAP, and STRUCT from StarRocks. #42932 #347

Improvements

When using the Broker process, Broker Load supports loading data from COS posix buckets, and SELECT ... FROM ... INTO OUTFILE supports unloading data to COS posix buckets. The format of the path parameter is cosn://some_bucket/xxx. #46090
Previously, when BE failed to communicate with FE via RPC, FE would return a generic error message: call frontend service failed reason=xxx, making it unclear what the specific issue was. The error messages are now optimized to include specific reasons, such as timeout or server busy. #44153
Improved error messages to indicate specific issues during data loading, such as the number of error data rows exceeding limits, mismatched column numbers, invalid column names, and no data in any partition.

Security

Upgraded Kafka client dependency to v3.4.0 to fix the CVE-2023-25194 security issue. #45382

Bug Fixes

Fixed the following issues:

If a materialized view definition includes multiple self-joins of the same table and incremental refreshes by partitions based on that table, incorrect results would occur due to wrong partition selection. #45936
FEs crash when a Bitmap index is created on a materialized view in shared-data clusters. #45665
BEs crash due to null pointer issues when FE follower is connected via ODBC and CREATE TABLE is executed. #45043
Querying information_schema.task_runs fails frequently when many asynchronous tasks exist. #45520
When a SQL statement contains multiple COUNT DISTINCT and includes LIMIT, LIMIT is wrongly processed, resulting in inconsistent data returned each time the statement is executed. #44749
Queries with ORDER BY LIMIT clauses on Duplicate Key tables and Aggregate tables produce incorrect results. #45037


25 May 09:38

yingtingdong

3.2.7

44058fe

Release notes 3.2.7

Release date: May 25, 2024

New Features

Stream Load supports data compression during transmission, reducing network bandwidth overhead. Users can specify the compression algorithm using the parameters compression and Content-Encoding. Supported compression algorithms include GZIP, BZIP2, LZ4_FRAME, DEFLATE, and ZSTD. #43732
Optimized the garbage collection (GC) mechanism in shared-data clusters. Supports manual compaction for tables or partitions stored in object storage. #39532
Flink connector supports reading complex data types ARRAY, MAP, and STRUCT from StarRocks. #42932 #347
Supports populating Data Cache asynchronously during queries, reducing the impact of populating cache on query performance. #40489
ANALYZE TABLE supports collecting histograms for external tables, effectively addressing data skews. For more information, see CBO statistics. #42693
Lateral Join with UNNEST supports LEFT JOIN. #43973
Query Pool supports configuring the memory usage threshold that triggers spilling via the BE static parameter query_pool_spill_mem_limit_threshold. Once the threshold is reached, intermediate query results are spilled to disk to reduce memory usage and avoid OOM. #44063
Supports creating asynchronous materialized views based on Hive views. #45085

Improvements

Optimized the error message returned for Broker Load tasks when there is no data under the specified HDFS paths. #43839
Optimized the error message returned when the Files function is used to read data from AWS S3 without Access Key and Secret Key specified. #42450
Optimized the error message returned for Broker Load tasks that load no data to any partitions. #44292
Optimized the error message returned for INSERT INTO SELECT tasks when the column count of the destination table does not match that in the SELECT statement. #44331

Bug Fixes

Fixed the following issues:

Concurrent read or write of the BITMAP-type data may cause BE to crash. #44167
Primary key indexes may cause BE to crash. #43793 #43569 #44034
Under high query concurrency scenarios, the str_to_map function may cause BE to crash. #43901
When the Masking policy of Apache Ranger is used, an error is returned when table aliases are specified in queries. #44445
In shared-data clusters, query execution cannot be routed to a backup node when the current node encounters exceptions. The corresponding error message is optimized for this issue. #43489
Memory information is incorrect in the container environment. #43225
An exception is thrown when INSERT tasks are canceled. #44239
Expression-based dynamic partitions cannot be automatically created. #44163
Creating partitions may cause FE deadlock. #44974


15 May 09:38

jaogoy

2.5.21

dc2bcdb

Release notes 2.5.21

Release date: May 15, 2024

Parameter changes

Decreased the default value of BE parameter update_compaction_size_threshold from 256 MB to 64 MB to increase the trigger frequency for Primary Key table compaction.

Improvements

Optimized the usage of database locks for materialized view refresh to prevent deadlock. #42801
Both s3a:// and s3:// can be used to access data in AWS S3. #42460

Bug Fixes

Fixed the following issues:

Schema change may cause issues in prefix index sorting, leading to incorrect results for queries based on prefix indexes. #44941
After a Routine Load task is paused due to Kafka cluster abnormalities, the background still attempts to connect to this abnormal Kafka cluster, which prevents other Routine Load tasks in this StarRocks cluster from consuming normal Kafka messages. #45029
When querying views in information_schema, the database lock is held for an unexpectedly long time, which prolongs the overall query time. #45392
Enabling Query Cache may cause BEs to crash if the SQL query contains a HAVING clause. This issue can be resolved by disabling Query Cache using set enable_query_cache=false. #43823
When Query Cache is enabled, some queries may return an error message All slotIds should be remapped. #42861


28 Apr 11:55

jaogoy

3.1.11

34f131b

Release notes 3.1.11

Release date: April 28, 2024

Behavior Changes

Users are not allowed to drop views in the system database information_schema using DROP TABLE. #43556
Users are not allowed to specify duplicate keys in the ORDER BY clause when creating a Primary Key table. #43374

Improvements

Queries on Parquet-formatted Iceberg v2 tables support equality deletes.

Bug Fixes

Fixed the following issues:

When a user queries data from an external table in an external catalog, access to this table is denied even when the user has the SELECT privilege on this table. SHOW GRANTS also shows that the user has this privilege. #44061
str_to_map may cause BEs to crash. #43930
When a Routine Load job is going on, running show proc '/routine_loads' is stuck due to deadlock. #44249
Persistent Index of Primary Key tables may cause BEs to crash due to issues in concurrency control. #43720
The pending_task_run_count displayed on the page of leaderFE_IP:8030 is incorrect. The displayed number is the sum of Pending and Running tasks, not Pending tasks. In addition, the information of the metric refresh_pending cannot be displayed using followerFE_IP:8030. #43052
Some SQL queries that contain CTEs may encounter the Invalid plan: PhysicalTopNOperator error. #44185

