Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORM: Use skip_orm as the default implementation for SqlaGroup.add_nodes and SqlaGroup.remove_nodes #6720

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

rabbull
Copy link
Contributor

@rabbull rabbull commented Jan 21, 2025

This PR removes the skip_orm flag and adopts the optimized approach behind it as the default behavior for the add_nodes and remove_nodes methods.

As discussed in #5453, the ORM-based implementations of add_nodes and remove_nodes were found to be inefficient, and to address this, the skip_orm flag was introduced in b40b2ca and bced84e. This flag enabled a faster, core API-based approach for these operations.

Over the years, no users have reported bugs related to this method, and the existing tests are sufficiently comprehensive to ensure its safety. Consequently, it is a reasonable decision to make the skip_orm implementation the default for bulk insertion and removal of elements within a group.

Closes #5453.

@rabbull
Copy link
Contributor Author

rabbull commented Jan 21, 2025

The existing tests related are:

    def test_add_nodes(self):
        """Test different ways of adding nodes."""
        node_01 = orm.Data().store()
        node_02 = orm.Data().store()
        node_03 = orm.Data().store()
        nodes = [node_01, node_02, node_03]

        group = orm.Group(label='test_adding_nodes').store()
        # Single node
        group.add_nodes(node_01)
        # List of nodes
        group.add_nodes([node_02, node_03])

        # Check
        assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)

        # Try to add a node that is already present: there should be no problem
        group.add_nodes(node_01)
        assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)

    def test_remove_nodes(self):
        """Test node removal."""
        node_01 = orm.Data().store()
        node_02 = orm.Data().store()
        node_03 = orm.Data().store()
        node_04 = orm.Data().store()
        nodes = [node_01, node_02, node_03]
        group = orm.Group(label=uuid.uuid4().hex).store()

        # Add initial nodes
        group.add_nodes(nodes)
        assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)

        # Remove a node that is not in the group: nothing should happen
        group.remove_nodes(node_04)
        assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)

        # Remove one orm.Node
        nodes.remove(node_03)
        group.remove_nodes(node_03)
        assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)

        # Remove a list of Nodes and check
        nodes.remove(node_01)
        nodes.remove(node_02)
        group.remove_nodes([node_01, node_02])
        assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)

This looks sufficient to me.

Copy link

codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.07%. Comparing base (f43a510) to head (5516af8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6720      +/-   ##
==========================================
- Coverage   78.09%   78.07%   -0.01%     
==========================================
  Files         564      564              
  Lines       42544    42524      -20     
==========================================
- Hits        33219    33198      -21     
- Misses       9325     9326       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rabbull rabbull marked this pull request as ready for review January 21, 2025 23:28
@rabbull
Copy link
Contributor Author

rabbull commented Jan 21, 2025

A test case related to cmd tools found that skip_orm doesn't support adding nothing very well, so a few more test cases are added and the bug is fixed.

@rabbull rabbull requested a review from GeigerJ2 January 21, 2025 23:41
Copy link
Contributor

@GeigerJ2 GeigerJ2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll give it another review, and then I think we can merge it. Though, I would leave this in main for now, and not directly include it in the upcoming v2.7.0 release, and see if we encounter any issues. As the skip_orm flag was never really used anywhere or documented, we don't really have experience with the implementation over the years (although the implementation looks solid, indeed).

@@ -177,6 +181,10 @@ def test_remove_nodes(self):
group.remove_nodes([node_01, node_02])
assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering as, at this point, the nodes list is already an empty list and the group.nodes already an empty Group, if it's really necessary to compare the sets of pks, and if it might be worth initially adding another node to avoid comparing two empty collections here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got a point. I'd say that another case, where at least one node is remained in the group, should be added, and meanwhile this case is also worth being kept, because it covers the scenario of removing all elements of a group (boundary testing).

assert set(_.pk for _ in nodes) == set(_.pk for _ in group.nodes)


def test_add_nodes_skip_orm_batch():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we still should have this check? Did you just remove all _skip_orm_ tests here, and keep the default tests, assuming that if they pass, then the updated implementation should be fine, as well, or was there a deeper meaning to removing this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that I found the function of both test group are overlapping. I didn't find a scenario that is only covered the skip_orm tests but in default tests. However, we could add a benchmark for large batches insertion/removal, as the main reason we switch to this implementation is to increase the performance.

Copy link
Contributor Author

@rabbull rabbull Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I just notice that I ignored the batch_size arguments, as this argument seems to be passed through directly to the lower layers. I'll look into it tonight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's why I asked bc I didn't see the batch_size argument anywhere in the remaining tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use/remove SqlaGroup.add_nodes/remove_nodess skip_orm flag?
2 participants