-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DBE-3085 added Dockerfile and files, scripts for testing #4
Conversation
Will add some example of usage to help with the reviews |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes are good as POC to demonstrate the concept, however
- complete implication should involve modifying data-diff code , so it would be possible to use it as
data-diff DB1 TABLE1 DB2 TABLE2 [options]
or
# Optional: Set logging to display the progress of the diff
import logging
logging.basicConfig(level=logging.INFO)
from data_diff import connect_to_table, diff_tables
table1 = connect_to_table("postgresql:///", "table_name", "id")
table2 = connect_to_table("mysql:///", "table_name", "id")
for different_row in diff_tables(table1, table2):
plus_or_minus, columns = different_row
print(plus_or_minus, columns)
and the --stop-at-top-level
flag should be possible to add as one of the command line options or class parameters, or config file option.
Current implementation involves creating a separate utility data-check.py
that adds its own parameters parsing and uses subset of data-diff functionality.
In short, this should be implemented as a change to data-diff utility itself, not as an additional utility.
- Testing
The data-diff code is covered by tests, for example parsing config file options is covered in test_config.py and actual checking of the difference in tables is implemented in test_diff_tables.py . We testes should be extended to cover new functionality that is being added.
PS: caveat is that tests currently do not run on our repo, perhaps we need to work on a PR to add workflow that would enable tests
Modified the test file to use HashDiffer instead of the non-existent diff_json function. import pytest
from data_diff.hashdiff_tables import HashDiffer
def test_stop_at_top_level():
differ = HashDiffer(stop_at_top_level=True)
# Test data with nested structures
data1 = [
{"top1": {"nested": "value1"},
"top2": {"nested": "value2"}}
]
data2 = [
{"top1": {"nested": "changed1"},
"top2": {"nested": "value2"}}
]
# Test without stop_at_top_level
differ.stop_at_top_level = False
full_diff = differ.diff_tables(data1, data2)
assert any("nested" in str(d) for d in full_diff)
# Test with stop_at_top_level
differ.stop_at_top_level = True
top_level_diff = differ.diff_tables(data1, data2)
assert any("top1" in str(d) for d in top_level_diff)
assert not any("nested" in str(d) for d in top_level_diff)
def test_stop_at_top_level_equal_nested():
differ = HashDiffer(stop_at_top_level=True)
# Test case where nested values are equal
data1 = [
{"top": {"nested": "same"}}
]
data2 = [
{"top": {"nested": "same"}}
]
diff = differ.diff_tables(data1, data2)
assert len(diff) == 0 # Should show no differences Key changes made:
|
@ChaosHour , good news I was able to fix the tests and they actually work |
60d060f
to
0edc927
Compare
@@ -0,0 +1,38 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-import pytest
+import unittest
DBE-3085
Add_functionality_chunk_level