Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change string sort order #6406

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,12 @@
### Fixed
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?)
* Align dictionaries to Lists and Sets when they get cleared. ([#6205](https://github.com/realm/realm-core/issues/6205), since v10.4.0)
* Fixed equality queries on a Mixed property with an index possibly returning the wrong result if values of different types happened to have the same StringIndex hash.

### Breaking changes
* Support for upgrading from Realm files produced by RealmCore v5.23.9 or earlier is no longer supported.
* Sorting order of strings has changed to use standard unicode codepoint order instead of grouping similar english letters together. A noticeable change will be from "aAbBzZ" to "ABZabz". ([2573](https://github.com/realm/realm-core/issues/2573))
* BinaryData and StringData are now strongly typed for comparisons and queries. This change is especially relevant when querying for a string constant on a Mixed property, as now only strings will be returned. If searching for BinaryData is desired, then that type must be specified by the constant. In RQL the new way to specify a binary constant is to use `mixed = bin('xyz')` or `mixed = binary('xyz')`. ([6118](https://github.com/realm/realm-core/issues/6118)).

### Compatibility
* Fileformat: Generates files with format v24. Reads and automatically upgrade from fileformat v10. If you want to upgrade from an earlier file format version you will have to use RealmCore v13.x.y or earlier.
Expand Down
2 changes: 1 addition & 1 deletion src/realm/group.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -777,7 +777,7 @@ class Group : public ArrayParent {
///
/// 23 Layout of Set and Dictionary changed.
///
/// 24 Variable sized arrays for Decimal128.
/// 24 Variable sized arrays for Decimal128. Sort order of Strings changed.
///
/// IMPORTANT: When introducing a new file format version, be sure to review
/// the file validity checks in Group::open() and DB::do_open, the file
Expand Down
12 changes: 5 additions & 7 deletions src/realm/index_string.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1061,7 +1061,7 @@ bool StringIndex::leaf_insert(ObjKey obj_key, key_type key, size_t offset, Strin
if (index_data == index_data_2 || suboffset > s_max_offset) {
// These strings have the same prefix up to this point but we
// don't want to recurse further, create a list in sorted order.
bool row_ndx_first = value.compare_signed(v2) < 0;
bool row_ndx_first = value < v2;
Array row_list(alloc);
row_list.create(Array::type_Normal); // Throws
row_list.add(row_ndx_first ? obj_key.value : obj_key2.value);
Expand Down Expand Up @@ -1553,11 +1553,9 @@ bool SortedListComparator::operator()(int64_t key_value, Mixed needle) // used i
if (a == needle)
return false;

// The Mixed::operator< uses a lexicograpical comparison, therefore we
// cannot use our utf8_compare here because we need to be consistent with
// using the same compare method as how these strings were they were put
// into this ordered column in the first place.
return a.compare_signed(needle) < 0;
// We need to be consistent with using the same compare method as how these strings
// were they were put into this ordered column in the first place.
return a < needle;
}


Expand Down Expand Up @@ -1621,7 +1619,7 @@ void StringIndex::verify() const
while (it != it_end) {
Mixed it_data = get(ObjKey(*it));
size_t it_row = to_size_t(*it);
REALM_ASSERT(previous.compare_signed(it_data) <= 0);
REALM_ASSERT(previous <= it_data);
if (it != sub.cbegin() && previous == it_data) {
REALM_ASSERT_EX(it_row > last_row, it_row, last_row);
}
Expand Down
1 change: 0 additions & 1 deletion src/realm/index_string.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -315,7 +315,6 @@ class StringIndex {
void do_delete(ObjKey key, StringData, size_t offset);

Mixed get(ObjKey key) const;

void node_add_key(ref_type ref);

#ifdef REALM_DEBUG
Expand Down
62 changes: 38 additions & 24 deletions src/realm/mixed.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -35,29 +35,30 @@ static const int sorting_rank[19] = {
0, // type_Bool = 1,
2, // type_String = 2,
-1,
2, // type_Binary = 4,
3, // type_Binary = 4,
-1, // type_OldTable = 5,
-1, // type_Mixed = 6,
-1, // type_OldDateTime = 7,
3, // type_Timestamp = 8,
4, // type_Timestamp = 8,
1, // type_Float = 9,
1, // type_Double = 10,
1, // type_Decimal = 11,
7, // type_Link = 12,
8, // type_Link = 12,
-1, // type_LinkList = 13,
-1,
4, // type_ObjectId = 15,
6, // type_TypedLink = 16
5, // type_UUID = 17
5, // type_ObjectId = 15,
7, // type_TypedLink = 16
6, // type_UUID = 17

// Observe! Changing these values breaks the file format for Set<Mixed>
};

int compare_string(StringData a, StringData b)
{
// Observe! Changing these values breaks the file format for Set<Mixed> and the StringIndex
if (a == b)
return 0;
return utf8_compare(a, b) ? -1 : 1;
return a < b ? -1 : 1;
}

int compare_binary(BinaryData a, BinaryData b)
Expand Down Expand Up @@ -181,9 +182,6 @@ bool Mixed::data_types_are_comparable(DataType l_type, DataType r_type)
if (is_numeric(l_type, r_type)) {
return true;
}
if ((l_type == type_String && r_type == type_Binary) || (r_type == type_String && l_type == type_Binary)) {
return true;
}
if (l_type == type_Mixed || r_type == type_Mixed) {
return true; // Mixed is comparable with any type
}
Expand Down Expand Up @@ -224,7 +222,7 @@ bool Mixed::accumulate_numeric_to(Decimal128& destination) const noexcept

int Mixed::compare(const Mixed& b) const noexcept
{
// Observe! Changing this function breaks the file format for Set<Mixed>
// Observe! Changing this function breaks the file format for Set<Mixed> and the StringIndex

if (is_null()) {
return b.is_null() ? 0 : -1;
Expand Down Expand Up @@ -258,9 +256,9 @@ int Mixed::compare(const Mixed& b) const noexcept
case type_String:
if (b.get_type() == type_String)
return compare_string(get<StringData>(), b.get<StringData>());
[[fallthrough]];
break;
case type_Binary:
if (b.get_type() == type_String || b.get_type() == type_Binary)
if (b.get_type() == type_Binary)
return compare_binary(get<BinaryData>(), b.get<BinaryData>());
break;
case type_Float:
Expand Down Expand Up @@ -343,17 +341,7 @@ int Mixed::compare(const Mixed& b) const noexcept
// Using rank table will ensure that all numeric values are kept together
return (sorting_rank[m_type] > sorting_rank[b.m_type]) ? 1 : -1;

// Observe! Changing this function breaks the file format for Set<Mixed>
}

int Mixed::compare_signed(const Mixed& b) const noexcept
{
if (is_type(type_String) && b.is_type(type_String)) {
auto a_val = get_string();
auto b_val = b.get_string();
return a_val == b_val ? 0 : a_val < b_val ? -1 : 1;
}
return compare(b);
// Observe! Changing this function breaks the file format for Set<Mixed> and the StringIndex
}

template <>
Expand Down Expand Up @@ -420,6 +408,32 @@ Decimal128 Mixed::export_to_type() const noexcept
return {};
}

template <>
StringData Mixed::export_to_type() const noexcept
{
REALM_ASSERT(m_type);
if (is_type(type_String)) {
return string_val;
}
else if (is_type(type_Binary)) {
return StringData(binary_val.data(), binary_val.size());
}
return {};
}

template <>
BinaryData Mixed::export_to_type() const noexcept
{
REALM_ASSERT(m_type);
if (is_type(type_String)) {
return BinaryData(string_val.data(), string_val.size());
}
else if (is_type(type_Binary)) {
return binary_val;
}
return {};
}

template <>
util::Optional<int64_t> Mixed::get<util::Optional<int64_t>>() const noexcept
{
Expand Down
11 changes: 3 additions & 8 deletions src/realm/mixed.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -211,10 +211,8 @@ class Mixed {
bool accumulate_numeric_to(Decimal128& destination) const noexcept;
bool is_unresolved_link() const noexcept;
bool is_same_type(const Mixed& b) const noexcept;
// Will use utf8_compare for strings
// Will use unsigned lexicographical comparison for strings
int compare(const Mixed& b) const noexcept;
// Will compare strings as arrays of signed chars
int compare_signed(const Mixed& b) const noexcept;
friend bool operator==(const Mixed& a, const Mixed& b) noexcept
{
return a.compare(b) == 0;
Expand Down Expand Up @@ -656,11 +654,8 @@ inline BinaryData Mixed::get<BinaryData>() const noexcept
{
if (is_null())
return BinaryData();
if (get_type() == type_Binary) {
return binary_val;
}
REALM_ASSERT(get_type() == type_String);
return BinaryData(string_val.data(), string_val.size());
REALM_ASSERT(get_type() == type_Binary);
return binary_val;
}

inline BinaryData Mixed::get_binary() const noexcept
Expand Down
2 changes: 1 addition & 1 deletion src/realm/object-store/sectioned_results.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -343,7 +343,7 @@ void create_buffered_key(Mixed& key, std::list<std::string>& buffer, StringType
key = StringType("", 0);
}
else {
key = buffer.emplace_back(value.data(), value.size());
key = StringType(buffer.emplace_back(value.data(), value.size()).data(), value.size());
}
}

Expand Down
47 changes: 28 additions & 19 deletions src/realm/parser/driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -975,6 +975,21 @@ std::unique_ptr<Subexpr> AggrNode::aggregate(Subexpr* subexpr)
return agg;
}

void ConstantNode::decode_b64(util::UniqueFunction<void(StringData)> callback)
{
const size_t encoded_size = text.size() - 5;
size_t buffer_size = util::base64_decoded_size(encoded_size);
std::string decode_buffer(buffer_size, char(0));
StringData window(text.c_str() + 4, encoded_size);
util::Optional<size_t> decoded_size = util::base64_decode(window, decode_buffer.data(), buffer_size);
if (!decoded_size) {
throw SyntaxError("Invalid base64 value");
}
REALM_ASSERT_DEBUG_EX(*decoded_size <= encoded_size, *decoded_size, encoded_size);
decode_buffer.resize(*decoded_size); // truncate
callback(StringData(decode_buffer.data(), decode_buffer.size()));
}

std::unique_ptr<Subexpr> ConstantNode::visit(ParserDriver* drv, DataType hint)
{
std::unique_ptr<Subexpr> ret;
Expand Down Expand Up @@ -1087,25 +1102,9 @@ std::unique_ptr<Subexpr> ConstantNode::visit(ParserDriver* drv, DataType hint)
break;
}
case Type::BASE64: {
const size_t encoded_size = text.size() - 5;
size_t buffer_size = util::base64_decoded_size(encoded_size);
std::string decode_buffer(buffer_size, char(0));
StringData window(text.c_str() + 4, encoded_size);
util::Optional<size_t> decoded_size = util::base64_decode(window, decode_buffer.data(), buffer_size);
if (!decoded_size) {
throw SyntaxError("Invalid base64 value");
}
REALM_ASSERT_DEBUG_EX(*decoded_size <= encoded_size, *decoded_size, encoded_size);
decode_buffer.resize(*decoded_size); // truncate
if (hint == type_String) {
ret = std::make_unique<ConstantStringValue>(StringData(decode_buffer.data(), decode_buffer.size()));
}
if (hint == type_Binary) {
ret = std::make_unique<ConstantBinaryValue>(BinaryData(decode_buffer.data(), decode_buffer.size()));
}
if (hint == type_Mixed) {
ret = std::make_unique<ConstantBinaryValue>(BinaryData(decode_buffer.data(), decode_buffer.size()));
}
decode_b64([&](StringData decoded) {
ret = std::make_unique<ConstantStringValue>(decoded);
});
break;
}
case Type::TIMESTAMP: {
Expand Down Expand Up @@ -1297,6 +1296,16 @@ std::unique_ptr<Subexpr> ConstantNode::visit(ParserDriver* drv, DataType hint)
}
break;
}
case BINARY_STR: {
std::string str = text.substr(1, text.size() - 2);
ret = std::make_unique<ConstantBinaryValue>(BinaryData(str.data(), str.size()));
break;
}
case BINARY_BASE64:
decode_b64([&](StringData decoded) {
ret = std::make_unique<ConstantBinaryValue>(BinaryData(decoded.data(), decoded.size()));
});
break;
}
if (!ret) {
throw InvalidQueryError(
Expand Down
5 changes: 5 additions & 0 deletions src/realm/parser/driver.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,8 @@ class ConstantNode : public ValueNode {
FLOAT,
STRING,
BASE64,
BINARY_STR,
BINARY_BASE64,
TIMESTAMP,
UUID_T,
OID,
Expand Down Expand Up @@ -194,6 +196,9 @@ class ConstantNode : public ValueNode {
std::unique_ptr<Subexpr> visit(ParserDriver*, DataType) override;
util::Optional<ExpressionComparisonType> m_comp_type;
std::string target_table;

private:
void decode_b64(util::UniqueFunction<void(StringData)>);
};

class ListNode : public ValueNode {
Expand Down
Loading