XML:Lang and attribute type as a unique key? #3

pedro93 · 2016-03-15T11:58:36Z

Hello, I've been going over the code of this tool, which seems extremely useful, and I came across a part of the code that does not seem to make much sense.

https://github.com/fdintino/xydiff/blob/master/src/Diff_NodesManager.cpp#L243

Why, when registering a subtree, is a node's type attribute, if present, classified as a unique key for that node? If this tool is used to diff HTML files, for instance, multiple nodes can have a type attribute with the same value, such as "text".

A simple HTML form creates exactly this situation.

Shouldn't the if statement merely test whether an id attribute exists? That is the only attribute guaranteed to be unique across the whole document, assuming a valid, HTML-compliant file.
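To make the difference concrete, here is a minimal C++ sketch. It is not xydiff's actual code: `Element`, `makeKey`, `keyAsQuestioned` and `keyIdOnly` are hypothetical names, and an element is reduced to a plain attribute map instead of a Xerces DOM node. `keyAsQuestioned` approximates the behavior being questioned (the first of `xml:lang` or `type` found becomes the key; that lookup order is an assumption), while `keyIdOnly` keys a node only by `id`.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for a DOM element: attribute name -> value.
struct Element {
    std::map<std::string, std::string> attributes;
};

// Builds the string a key-based matcher would look nodes up by.
std::string makeKey(const std::string& name, const std::string& value) {
    assert(!name.empty());
    return name + "=" + value;
}

// Approximation of the questioned behavior (assumed order): the first of
// "xml:lang" or "type" present on the element becomes its key.
std::optional<std::string> keyAsQuestioned(const Element& e) {
    for (const char* name : {"xml:lang", "type"}) {
        auto it = e.attributes.find(name);
        if (it != e.attributes.end()) return makeKey(name, it->second);
    }
    return std::nullopt;
}

// Proposed behavior: only an "id" attribute yields a key.
std::optional<std::string> keyIdOnly(const Element& e) {
    auto it = e.attributes.find("id");
    if (it == e.attributes.end()) return std::nullopt;
    return makeKey("id", it->second);
}
```

With the two form inputs from this thread (`type="text"` plus `name="first_name"` or `name="last_name"`), `keyAsQuestioned` returns the same key for both, whereas `keyIdOnly` returns no key for either, so neither would be registered as uniquely identified.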

Kind regards :)

fdintino · 2016-03-15T17:41:20Z

xydiff is written to work for any XML document whatsoever. Theoretically, it would be possible to pull the XML Schema referenced by an xsi:schemaLocation attribute, parse it, figure out which attributes have type xsd:ID, and take advantage of that knowledge to construct a more optimal delta.
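Short of parsing a schema, one schema-free proxy (a swapped-in technique, not something xydiff does) is to check whether an attribute's values are actually distinct in the document at hand. The hypothetical `isUniqueInDocument` below does that over a flat list of simplified elements. It is weaker than an xsd:ID declaration: it only shows that the values happen to be unique in one document, not that they are guaranteed to be.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a DOM element: attribute name -> value.
struct Element {
    std::map<std::string, std::string> attributes;
};

// Returns true if every element carrying `attr` has a distinct value for it.
// Elements without the attribute are ignored, so an attribute that never
// appears is trivially "unique".
bool isUniqueInDocument(const std::vector<Element>& elements, const std::string& attr) {
    assert(!attr.empty());
    std::set<std::string> seen;
    for (const Element& e : elements) {
        auto it = e.attributes.find(attr);
        if (it == e.attributes.end()) continue;
        // insert(...).second is false when the value was already present.
        if (!seen.insert(it->second).second) return false;
    }
    return true;
}
```

For the form discussed in this thread, `type` fails the check (both inputs have the value `text`), `name` passes, and `id` passes only vacuously because no element carries it.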

But that isn't what's happening here. The commit that introduced this change was made six years ago, and I honestly can't recall my thinking. You are correct that it looks wrong. I'll see if I can dig deeper and then possibly remove this code. Do you happen to have a test case for a bug that this causes?

pedro93 · 2016-03-16T09:48:35Z

I do not know if this constitutes a bug. I am using this tool to detect differences in HTML files scraped from the internet, and due to the nature of such files, an XML schema is rarely associated with them. I tried to create a test case for the bug by comparing the following two files:

```html
<!DOCTYPE html>
<html>
<head>
    <title>Text Input Control</title>
</head>
<body>
    <form >
        First name:  <input type="text" name="first_name" />
        <br>
        Last name:  <input type="text" name="last_name" />
    </form>
</body>
</html>
```

against

```html
<!DOCTYPE html>
<html>
<head>
    <title>Text Input Control</title>
</head>
<body>
    <form >
        Last name:  <input type="text" name="last_name" />
        <br>
        First name:  <input type="text" name="first_name" />
    </form>
</body>
</html>
```

The given output is:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<xy:unit_delta xmlns:xy="urn:schemas-xydiff:xydelta">

  <xy:t fromXidMap="(1-3;6-7;12-13;8-11|14)">
    <xy:d move="yes" par="8" pos="2" xm="(7)"/>
    <xy:d move="yes" par="8" pos="1" xm="(6)"/>
    <xy:d par="9" pos="2" xm="(5)">
      <input name="first_name" type="text"/>
    </xy:d>
    <xy:d par="9" pos="1" xm="(4)">
        First name:  </xy:d>
    <xy:i move="yes" par="9" pos="1" xm="(6)"/>

    <xy:i move="yes" par="9" pos="2" xm="(7)"/>
    <xy:i par="8" pos="1" xm="(12)">
        First name:  </xy:i>
    <xy:i par="8" pos="2" xm="(13)">
      <input name="first_name" type="text"/>
    </xy:i>
  </xy:t>

</xy:unit_delta>
```

Isn't this sequence of inserts and deletes a consequence of the algorithm trying to match two distinct nodes exactly, because it assumes the attribute "type" is unique?
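One plausible mechanism can be sketched under heavy assumptions: this is a generic key-index matcher, not xydiff's actual algorithm, and `Node`, `indexByKey` and `matchByKey` are hypothetical names. If nodes are indexed by a key that is not really unique, later nodes overwrite earlier ones in the index, so two new nodes can both be paired with the same old node while the other old node is left unmatched.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical node: an identifier plus the key it was registered under.
struct Node {
    int xid;
    std::string key;
};

// Naive key index: a later node with the same key silently overwrites an
// earlier one, which is exactly what goes wrong when the key is not unique.
std::map<std::string, int> indexByKey(const std::vector<Node>& nodes) {
    std::map<std::string, int> index;
    for (const Node& n : nodes) {
        assert(!n.key.empty());
        index[n.key] = n.xid;
    }
    return index;
}

// Pairs each new node with the old node registered under the same key, if any.
// Returns a map from new xid to old xid.
std::map<int, int> matchByKey(const std::vector<Node>& oldNodes,
                              const std::vector<Node>& newNodes) {
    const std::map<std::string, int> oldIndex = indexByKey(oldNodes);
    std::map<int, int> newToOld;
    for (const Node& n : newNodes) {
        auto it = oldIndex.find(n.key);
        if (it != oldIndex.end()) newToOld[n.xid] = it->second;
    }
    return newToOld;
}
```

With both inputs keyed as `type=text`, both new inputs map to the last old input and the first old input is left unmatched, which would surface as a delete plus an insert, similar in shape to the output above. Keyed by `name` instead, each input is paired with its real counterpart.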

pedro93 · 2016-03-31T14:54:52Z

This can be changed by modifying these lines:
https://github.com/fdintino/xydiff/blob/master/src/Diff_NodesManager.cpp#L233
https://github.com/fdintino/xydiff/blob/master/src/Diff_NodesManager.cpp#L235 (change the length of the buffer as well)
and
https://github.com/fdintino/xydiff/blob/master/src/Diff_NodesManager.cpp#L243
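The remark about the buffer length suggests the attribute name is presumably stored in a hand-sized 16-bit character buffer (Xerces-C's `XMLCh` is typically a 16-bit type). The sketch below uses `char16_t` to stay self-contained; `kOldKeyBuffer`, `kKeyAttribute` and `isKeyAttribute` are hypothetical names, not xydiff's. It shows why switching from "type" to "id" means editing the length, and how a C++17 `std::u16string_view` lets the compiler work the length out instead.

```cpp
#include <cassert>
#include <string_view>

// A hand-sized buffer must be edited whenever the name changes:
// "type" needs 5 code units including the terminator, "id" needs only 3.
constexpr char16_t kOldKeyBuffer[5] = u"type";
static_assert(sizeof(kOldKeyBuffer) / sizeof(kOldKeyBuffer[0]) == 5,
              "4 characters plus the NUL terminator");

// Letting the compiler derive the length removes that manual step.
constexpr std::u16string_view kKeyAttribute = u"id";

// Compares a NUL-terminated 16-bit attribute name, the form in which a
// parser such as Xerces-C typically hands names over, against the key name.
bool isKeyAttribute(const char16_t* name) {
    assert(name != nullptr);
    return std::u16string_view(name) == kKeyAttribute;
}
```

Xerces-C also ships its own string helpers for this kind of comparison; the plain standard-library version is used here only to keep the example independent of the library.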

You may close this issue.

pedro93 mentioned this issue on Mar 29, 2016 in NodesManager::FullBottomUp #4 (open).
