XML:Lang and attribute type as an unique key? #3

Open
pedro93 opened this issue Mar 15, 2016 · 3 comments

Comments

@pedro93

pedro93 commented Mar 15, 2016

Hello, I've been going over the code of this tool, which seems extremely useful, and I came across a part that does not seem to make much sense.

https://github.com/fdintino/xydiff/blob/master/src/Diff_NodesManager.cpp#L243

Why, when registering a subtree, is a node's type attribute treated as a unique key for that node? When using this tool to diff HTML files, for instance, multiple nodes can carry a type attribute with the same value, such as "text".

A simple html form would create this situation.

Shouldn't the if statement merely test whether an id attribute exists? That is the only attribute guaranteed to be unique in the whole document, assuming a valid, HTML-compliant file.
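
To make the point concrete, here is a minimal Python sketch (not xydiff code) that lists attribute values shared by more than one element. The form fragment is hypothetical: it is written as well-formed XML, and the id attributes were added only for this illustration. The helper name duplicated_values is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: well-formed XML, with id attributes added for the example.
FORM = """
<form>
  First name: <input type="text" name="first_name" id="first"/>
  <br/>
  Last name: <input type="text" name="last_name" id="last"/>
</form>
"""

def duplicated_values(xml_text, attr):
    """Return the values of `attr` that occur on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(attr) for el in root.iter() if el.get(attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

print(duplicated_values(FORM, "type"))  # ['text']
print(duplicated_values(FORM, "id"))    # []
```

The type value "text" appears on both inputs, so it cannot identify either one, while the id values do not collide.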

Kind regards :)

@fdintino
Owner

xydiff is written to work for any XML document whatsoever. Theoretically, it would be possible to pull an XML Schema location off of an xsi:schemaLocation attribute, parse the schema, figure out which attributes have type xsd:ID, and potentially take advantage of that knowledge to construct a more optimal delta.
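
As a rough illustration of that idea only (xydiff does not do this), the following Python sketch scans a schema for attributes declared with the ID type. The schema is hypothetical and written just for this example, the helper name id_attribute_names is an assumption, and fetching the schema via xsi:schemaLocation is left out. As a simplification, the prefix on the type name is not resolved to a namespace.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema, written only for this example.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="input">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID"/>
      <xs:attribute name="type" type="xs:string"/>
      <xs:attribute name="name" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def id_attribute_names(schema_text):
    """Collect names of attributes declared with the XML Schema ID type.

    Simplification: the prefix is not resolved to a namespace; any type
    whose local part is "ID" is accepted.
    """
    root = ET.fromstring(schema_text)
    names = set()
    for attr in root.iter(f"{{{XS}}}attribute"):
        type_name = attr.get("type", "")
        if type_name.split(":")[-1] == "ID" and attr.get("name"):
            names.add(attr.get("name"))
    return names

print(id_attribute_names(SCHEMA))  # {'id'}
```

Only attributes found this way would be safe to treat as unique keys; with this schema, type is not among them.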

But that isn't what's happening here. The commit that introduced this change was made six years ago, and I honestly can't recall my thinking. You are correct that it looks wrong. I'll see if I can't dig deeper and then possibly remove this code. Do you happen to have a test case for a bug that this causes?

@pedro93
Author

pedro93 commented Mar 16, 2016

I do not know if this constitutes a bug. I am using this tool to detect differences in HTML files scraped from the internet. Due to the nature of such files, an XML schema is rarely associated with them. I tried to create a test case for the bug by comparing the following two files:

<!DOCTYPE html>
<html>
<head>
    <title>Text Input Control</title>
</head>
<body>
    <form >
        First name:  <input type="text" name="first_name" />
        <br>
        Last name:  <input type="text" name="last_name" />
    </form>
</body>
</html>

against

<!DOCTYPE html>
<html>
<head>
    <title>Text Input Control</title>
</head>
<body>
    <form >
        Last name:  <input type="text" name="last_name" />
        <br>
        First name:  <input type="text" name="first_name" />
    </form>
</body>
</html>

The given output is:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<xy:unit_delta xmlns:xy="urn:schemas-xydiff:xydelta">

  <xy:t fromXidMap="(1-3;6-7;12-13;8-11|14)">
    <xy:d move="yes" par="8" pos="2" xm="(7)"/>
    <xy:d move="yes" par="8" pos="1" xm="(6)"/>
    <xy:d par="9" pos="2" xm="(5)">
      <input name="first_name" type="text"/>
    </xy:d>
    <xy:d par="9" pos="1" xm="(4)">
        First name:  </xy:d>
    <xy:i move="yes" par="9" pos="1" xm="(6)"/>

    <xy:i move="yes" par="9" pos="2" xm="(7)"/>
    <xy:i par="8" pos="1" xm="(12)">
        First name:  </xy:i>
    <xy:i par="8" pos="2" xm="(13)">
      <input name="first_name" type="text"/>
    </xy:i>
  </xy:t>

</xy:unit_delta>

Isn't this output of inserts and deletes a consequence of the algorithm trying to exactly match two different nodes because of the assumed uniqueness of the "type" attribute?
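
The suspected effect can be sketched in Python. This is not xydiff's matching algorithm, only a hypothetical key-based pairing (the function match_by_key is an assumption) applied to well-formed versions of the two forms above, with the br element self-closed so the fragments parse as XML.

```python
import xml.etree.ElementTree as ET

# Well-formed versions of the two test forms (br is self-closed here).
OLD = """<form>
  First name: <input type="text" name="first_name"/>
  <br/>
  Last name: <input type="text" name="last_name"/>
</form>"""

NEW = """<form>
  Last name: <input type="text" name="last_name"/>
  <br/>
  First name: <input type="text" name="first_name"/>
</form>"""

def match_by_key(old_text, new_text, key_attr):
    """Pair elements of both documents whose `key_attr` values are equal.

    Each key keeps only the first element carrying it, which is what
    treating the attribute as a unique key amounts to.
    """
    def index(text):
        table = {}
        for el in ET.fromstring(text).iter():
            value = el.get(key_attr)
            if value is not None:
                table.setdefault((el.tag, value), el)
        return table

    old_index, new_index = index(old_text), index(new_text)
    return [
        (old_index[k].get("name"), new_index[k].get("name"))
        for k in old_index
        if k in new_index
    ]

print(match_by_key(OLD, NEW, "type"))
# [('first_name', 'last_name')]
print(match_by_key(OLD, NEW, "name"))
# [('first_name', 'first_name'), ('last_name', 'last_name')]
```

Keyed on type, the first_name input is paired with the last_name input, which is the kind of forced mismatch that could lead to extra inserts and deletes. Keyed on an attribute that really is unique in this fragment, each input pairs with itself.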
