Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Preserve dotted-key ordering #808

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mxndtaylor
Copy link

@mxndtaylor mxndtaylor commented Nov 10, 2024

Fixes #163

Based on @epage's WIP work from #165.

Admittedly not the most efficient solution as it adds another Vec::sort_by_key to the Table::append_values and InlineTable::append_values functions, sorting the Vecs after the values have just been inserted into them.

The slightly more efficient Vec::sort_by_key is not necessarily possible since it might move non-dotted keys all together, thus again breaking the original document's order. I was incorrect about this, and have used Vec::sort_by_key instead

A better solution might be to insert the values into the values Vecs in the intended order, but this would be much more involved.

Note:
This change failed some tests from the edit functionality, specifically, Table::{sort_values, sort_values_by} and the InlineTable versions. So, I created a targeted fix for that, where it just sets the Key.position property to be the index in the iterator over its keys.

The implication here is that any future changes that involve key re-orderings on Tables will have to modify Key.position value, instead of just their order in the items: IndexMap.

In fact, because of this, we could remove the ordering of the IndexMap completely now, and instead rely on the position value, thereby reducing the chances of future confusion.

Without a more comprehensive change like that, I have a feeling there might be an untested use case here somewhere, if something else I'm not aware of is modifying the Key order in Table.items or InlineTable.items, then this will now fail.

crates/toml_edit/src/key.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/key.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/key.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/key.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
@coveralls
Copy link

coveralls commented Nov 10, 2024

Pull Request Test Coverage Report for Build 12060000582

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 70 of 102 (68.63%) changed or added relevant lines in 7 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.1%) to 68.211%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/toml_edit/tests/testsuite/edit.rs 2 3 66.67%
crates/toml_edit/src/inline_table.rs 1 3 33.33%
crates/toml_edit/src/key.rs 7 9 77.78%
crates/toml_edit/src/internal_table.rs 51 78 65.38%
Files with Coverage Reduction New Missed Lines %
crates/serde_spanned/src/spanned.rs 1 56.45%
crates/toml_datetime/src/datetime.rs 1 79.72%
Totals Coverage Status
Change from base Build 11667065238: 0.1%
Covered Lines: 3783
Relevant Lines: 5546

💛 - Coveralls

@mxndtaylor
Copy link
Author

Just to add a note about the additional changes, I've switched the Vec::sort_by to Vec::sort_by_key using usize::MAX for the key when position=None.

Originally, I used Vec::sort_by because I thought that any constant value for position=None would cause problems as it wouldn't consider the insertion order, and just move those Keys to the end of the output. And this is largely true, however:

  1. every item parsed by this library will have position=Some
  2. all items manually constructed with this library will have position=None by default
  3. returning Ordering::Equal from Vec::sort_by for comparing position=Some to position=None does not result in a preserving of the order (nor does returning any of the other Ordering values)

If the Key objects are either all from the parser or all manually constructed, then the original Vec::sort_by_key and the previous Vec::sort_by implementations result in identical output.

If the Key objects are mixed, where some are from the parser and some are manually constructed, we would get behavior like this:

let doc = r#"foo.bar = 1

foo.baz.qwer = "a"
goodbye = { forever="no" }

foo.pub = 2
foo.baz.asdf = "b"
"#.parse<MutDocument>().unwrap();

let root = doc.as_table_mut();
root["foo"]["baz"]["zxcv"] = "CCC"
assert_data_eq!(doc.to_string(), r#"foo.bar = 1

foo.baz.qwer = "a"
foo.baz.asdf = "b"
foo.baz.zxcv = "CCC"
goodbye = { forever="no" }

foo.pub = 2
"#);

where foo.baz.asdf and foo.baz.qwer were moved when we made no modification to them. The fix is to consider position=None as simply "greater" than position=Some which can more easily be implemented with Vec::sort_by_key.


The only remaining problem is a somewhat odd one, that would not normally come up, which is to merge tables by iterating through the second table's keys manually like so:

let mut origin_doc = r#"
foo = 1
bar = 2
"#.parse::<DocumentMut>().unwrap();
let mut additional_doc = r#"
baz = 3
qux = 4
"#.parse::<DocumentMut>().unwrap();

let origin_table = origin_doc.as_table_mut();

// if we use origin_table.extend(addition_doc.iter()) this does not occur
// it only occurs when we directly get the key from its parsed 
additional_doc.iter().for_each(|(k, v)| {
    // we get the actually `Key` here with its position
    let (key, _) = additional_doc.get_key_value(k).unwrap();
    origin_table.insert_formatted(key, v.clone());
});

// baz (with position=Some(0) from the second table) gets inserted before 
// bar (with position=Some(1) from the first table)
assert_data_eq!(origin_doc.to_string(), r#"
foo = 1
baz = 3
bar = 2
qux = 4
"#);

But as noted, the more canonical process of using Table.extend(Table.iter()) does not have this problem, so I think this is fine.

crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
crates/toml_edit/src/table.rs Fixed Show fixed Hide fixed
@mxndtaylor
Copy link
Author

not really sure if that comment should be a doc or not, maybe I should have just made it into a non-doc comment, but I just did what the linter said to instead, so hopefully that fixes it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, because of this, we could remove the ordering of the IndexMap completely now, and instead rely on the position value, thereby reducing the chances of future confusion.

Table and InlineTable have the semantics of an IndexMap. We need to preserve the order of content as it gets inserted. We could set a position on insertion.

Also, I'm unsure about allowing key positions to be moved between tables because the position is dependent on the other values within the same table and you can't easily translate that from one table to the next.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could set a position on insertion.

I think I missed this when I was initially reading your feedback, I did not investigate this as an option yet, I'll take a look at this in the next week or so when I can.

@epage
Copy link
Member

epage commented Nov 12, 2024

Could you clean up the commits for how you would want me to review and merge this?

My recommendation is to split up commits so that

  • The first commit is a test (that passes), showing the current behavior
  • Any code changes from the fix you can pull out into refactoring commits
  • The actual fix which would also include updating the test to now pass

@epage
Copy link
Member

epage commented Nov 12, 2024

If the Key objects are mixed, where some are from the parser and some are manually constructed, we would get behavior like this:

This is a very common case and is one the motivating reasons for this crate. Grouping dotted keys, rather than treating a dotted key as something to add at the end, is a more natural behavior. We currently get it for free. We don't always guarantee a more natural way of doing things though. However, if we were to change it, it would likely be a breaking behavior change.

Note:
- user-defined ordering of dotted keys for related options make comments about them useless after processing
- reformatting dotted keys in inline tables has side effects on their decor
- when grouping keys, they are not sorted, making the potentially helpful reformatting incomplete
- adding better failure reporting in testsuite table parsing
- centralize ordering logic into internal table to improve code re-use between Table and InlineTable
@mxndtaylor
Copy link
Author

mxndtaylor commented Nov 28, 2024

@epage I've reorganized the commits, and also added a fair bit of refactoring, to move the impacted duplicated code blocks between table.rs and inline_table.rs into internal_table.rs.

My intention with internal_table is that it is not part of the public api for the crate and that for any future changes that affect both the Table and InlineTable implementations of methods with the same name, that method's body could be moved to internal_table.rs and amended there.

Hopefully, that's matching with the appearances there.

Grouping dotted keys, rather than treating a dotted key as something to add at the end, is a more natural behavior.

I see your point here, and I tried to keep the behavior of inserted dotted keys being grouped while only parsed dotted keys were kept where they were, but I could not find a way that I was happy with:

Sorting Method position=None behavior Result
Vec::sort_by Ordering::Equal any key with position=None groups all (or most in some cases, which was even stranger) dotted keys in that table
Vec::sort_by Ordering::Greater or Ordering::Less all keys with position=None go to top or bottom of the document
Vec::sort_by_key constant usize::MAX all keys with position=None go to the bottom of the document

So, options 1 or 3 are probably the best in my eyes, so if 3 (what is currently implemented) is a breaking change I'll look into going with option 1 and see if I can narrow down some the weirdness, or find a novel approach.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this commit type would be test

Comment on lines 235 to 236
}
#[test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline

Comment on lines +4 to +7
use crate::table::KeyValuePairs;

/// GetTableValues provides the logic for displaying a table's items in their parsed order
pub(crate) trait GetTableValues {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is a roundabout way to share code for manipulating KeyValuePairs. Seems like the functionality should just be writtenm for KeyValuePairs and shared

imo if I were to do this, I would

  • Move KeyValuePairs into here
  • Either rename the file to be about KeyValuePairs or rename the type (though not a fan of InternalTable though InnerTable might work)
  • Either make get_values and sort_values either non-member functions that take KeyValuePairs or make KeyValuePairs a newtype (which might add its own complexity)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to split this refactor out into its own commit. It'll likely make it easier to review and get merged which can speed up reviewing both PRs.

Also, please consider if there are intermediate steps to the above that should be broken out into their own commit (e.g. adding a newtype of we go that route)

@epage
Copy link
Member

epage commented Dec 13, 2024

So, options 1 or 3 are probably the best in my eyes, so if 3 (what is currently implemented) is a breaking change I'll look into going with option 1 and see if I can narrow down some the weirdness, or find a novel approach.

In case you were waiting on an answer, yes it is a breaking change. I plan to make a breaking change "soon" but haven't had time to organize it and doubt you want to wait on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dotted key ordering isn't always preserved
3 participants