Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mixed comparison is not a valid total ordering #6118

Closed
RedBeard0531 opened this issue Dec 14, 2022 · 2 comments
Closed

Mixed comparison is not a valid total ordering #6118

RedBeard0531 opened this issue Dec 14, 2022 · 2 comments
Assignees
Labels

Comments

@RedBeard0531
Copy link
Contributor

RedBeard0531 commented Dec 14, 2022

It tries to compare binary and strings in the same group, however it uses a different ordering for string/string comparisons (our utf8_compare) than for bin/bin and bin/string comparisons (the latter two both use memcmp order). This breaks the normal mathematical laws of total orderings such as transitivity (a < b && b < c implies a < c), that a lot of code relies on for correctness. We solved this for Sets with a special case making all strings compare less than all binaries, but that only applied to the comparison internal to Sets. Among other issues, that means that the order of a Set<Mixed> now differs from the order of a sorted List<Mixed> which broke a test in realm-swift. We should solve this problem for all users of Mixed once and for all.

I think there are 3 reasonable options, but only the first avoids a file format bump, assuming we want Set to use the new order:

  1. Make Mixed comparison always consider strings less than binaries
  2. Use memcmp order for strings (which matches codepoint order because UTF8 is great) rather than our utf8_compare (which is broken)
  3. Do both (change the order for strings, but still keep them separate from binaries)

As proof that the existing order is broken, and not just odd, here is a test case showing that it violates transitivity:

TEST(Mixed_string_binary_compare) {
    const auto a = Mixed("a");
    const auto b = Mixed("B");
    const auto c = Mixed(BinaryData("C", 1));

    // For a total order at least one of these must be fail!
    CHECK_LESS(a.compare(b), 0);
    CHECK_LESS(b.compare(c), 0);
    CHECK_LESS(c.compare(a), 0);
}

And here is the result of passing all 6 permutations of those values to std::sort():

["a","B","C"] -> ["C","a","B"]
["a","C","B"] -> ["B","C","a"]
["B","a","C"] -> ["C","a","B"]
["B","C","a"] -> ["a","B","C"]
["C","a","B"] -> ["B","C","a"]
["C","B","a"] -> ["a","B","C"]

Because no values are equal, all 6 outputs should be the same. (Note that "C" is binary data even though we print it like a string)

@jedelbo
Copy link
Contributor

jedelbo commented Dec 14, 2022

Related to #2573. Option 3 will probably solve that as well.

@ironage ironage mentioned this issue Mar 22, 2023
3 tasks
This was referenced May 24, 2023
@ironage
Copy link
Contributor

ironage commented Oct 6, 2023

Fixed by #6670.
Will be released with next-major.

@ironage ironage closed this as completed Oct 6, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants