This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

DecisionTrees\Pruning\ErrorBasedPruning: The subset of observations corresponding to a decision node may contain duplicates (2) #237

Closed
YaronK opened this issue May 25, 2016 · 2 comments

Comments

YaronK commented May 25, 2016

There is possibly an issue with the ErrorBasedPruning class: the subset of observations corresponding to a decision node may contain duplicates. See a similar issue here.

This PDF referenced in the comments contains the algorithm.

In the implementation, three errors are computed when a specific node is considered: the baseline error, the error resulting from pruning the tree at the node, and the error resulting from replacing the node with its maxChild. If pruning the subtree, or replacing it with the subtree rooted at the max child, is advantageous, the subsets of all nodes descending from that node are cleared and the observations are re-tracked.
This is a (partial) snippet of the 'compute' method:

if (Math.Abs(prune - baseline) < limit ||
    Math.Abs(replace - baseline) < limit)
{
    if (replace < prune)
    {
        // We should replace the subtree with its maximum child
        /* ... replacing ... */
    }
    else
    {
        // We should prune the subtree
        /* ... pruning ... */
    }

    foreach (var child in node)
        subsets[child].Clear();

    for (int i = 0; i < inputs.Length; i++)
        trackDecisions(node, inputs[i], i);
}

I think the last part may be faulty. The subsets of all nodes descending from 'node', including 'node' itself, are cleared, and that part is fine. But instead of re-tracking only the decisions of observations that are routed through 'node', the decisions of all the observations are re-tracked, even those that never reach 'node' (or its descendants).
Observations that are incorrectly added to the subsets of 'node' and its descendants affect the computations that follow (they affect getMaxChild, for example).
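
To illustrate the re-tracking described above, here is a minimal, self-contained sketch of one possible approach. It is not the Accord.NET code or the fix from the pull request. The `Node` class, the `Track`, `Subtree` and `RetrackSubtree` helpers, and the dictionary-based `subsets` are all hypothetical stand-ins. The idea is to copy the indices stored in the subset of 'node' before clearing, and then re-track only those indices:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a decision node; NOT the Accord.NET type.
class Node
{
    public int Feature;          // index of the input feature tested at this node
    public double Threshold;     // inputs below the threshold go left, others go right
    public Node Left, Right;     // both null for a leaf
    public bool IsLeaf => Left == null && Right == null;
}

static class Retrack
{
    // Records observation index i in the subset of every node it visits,
    // starting at 'node' and following the decisions down to a leaf.
    public static void Track(Node node, double[] input, int i,
                             Dictionary<Node, List<int>> subsets)
    {
        for (var current = node; current != null; )
        {
            subsets[current].Add(i);
            if (current.IsLeaf) break;
            current = input[current.Feature] < current.Threshold
                ? current.Left : current.Right;
        }
    }

    // Yields 'node' and all of its descendants.
    public static IEnumerable<Node> Subtree(Node node)
    {
        if (node == null) yield break;
        yield return node;
        foreach (var n in Subtree(node.Left)) yield return n;
        foreach (var n in Subtree(node.Right)) yield return n;
    }

    // Clears the subsets of 'node' and its descendants, then re-tracks
    // only the observations that were routed through 'node'.
    public static void RetrackSubtree(Node node, double[][] inputs,
                                      Dictionary<Node, List<int>> subsets)
    {
        int[] reached = subsets[node].ToArray();   // copy before clearing
        foreach (var n in Subtree(node))
            subsets[n].Clear();
        foreach (int i in reached)
            Track(node, inputs[i], i, subsets);
    }
}
```

The sketch assumes that `subsets` already has an entry for every node in the subtree after the prune or replace step. Looping over all of `inputs` in `RetrackSubtree` instead of over `reached` would reproduce the behaviour reported in this issue.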

@cesarsouza
Member

Hi Yaronk,

Many thanks for reporting those issues and most importantly, for providing a pull request with the fixes! It was immensely appreciated. I will work to integrate them as soon as possible!

Regards,
Cesar

cesarsouza added a commit that referenced this issue Jun 1, 2016
Merging pull request for ErrorBasedPruning issues: 

Fixes GH-234, GH-235, GH-236, GH-237
@cesarsouza
Member

Fixed in release 3.2.
