Issue in LogisticRegression.Transform() returns true for all inputs #282

snives · 2016-08-21T21:10:32Z

In release 3.2.0.5706, while testing the LogisticRegression class with the trivial GeneralizedLinearRegression code sample, it appears that the Transform() function assigns every input to the same classification label.

To recreate the issue:

```csharp
using System;
using Accord.Statistics.Links;
using Accord.Statistics.Models.Regression;
using Accord.Statistics.Models.Regression.Fitting;

// Suppose we have the following data about some patients.
// The first variable is continuous and represents patient
// age. The second variable is dichotomous and indicates whether
// they smoke or not (this is completely fictional data).
double[][] input =
{
    new double[] { 55, 0 }, // 0 - no cancer
    new double[] { 28, 0 }, // 0
    new double[] { 65, 1 }, // 0
    new double[] { 46, 0 }, // 1 - has cancer
    new double[] { 86, 1 }, // 1
    new double[] { 56, 1 }, // 1
    new double[] { 85, 0 }, // 0
    new double[] { 33, 0 }, // 0
    new double[] { 21, 1 }, // 0
    new double[] { 42, 1 }, // 1
};

// We also know whether they have had lung cancer or not, and
// we would like to know whether smoking has any connection
// with lung cancer (this is completely fictional data).
double[] output =
{
    0, 0, 0, 1, 1, 1, 0, 0, 0, 1
};

// To verify this hypothesis, we are going to create a GLM
// regression model for those two inputs (age and smoking).
var regression = new GeneralizedLinearRegression(new ProbitLinkFunction(), inputs: 2);

// Next, we are going to estimate this model. For this, we
// will use the Iteratively Reweighted Least Squares method.
var teacher = new IterativeReweightedLeastSquares(regression);

// Now, we will iteratively estimate our model. The Run method returns
// the maximum relative change in the model parameters, and we will use
// it as the convergence criterion.
double delta = 0;
do
{
    // Perform one iteration
    delta = teacher.Run(input, output);

} while (delta > 0.001);

// Test cases: (age, smoker) pairs
double[][] samples =
{
    new double[] { 1, 0 },
    new double[] { 1, 1 },
    new double[] { 20, 0 },
    new double[] { 20, 1 },
    new double[] { 40, 0 },
    new double[] { 40, 1 },
    new double[] { 60, 0 },
    new double[] { 60, 1 },
    new double[] { 80, 0 },
    new double[] { 80, 1 },
};

for (int i = 0; i < samples.Length; i++)
{
    Console.WriteLine("Input({0},{1}) => Predicted label {2}",
        samples[i][0], samples[i][1], regression.Transform(samples[i]));
}
```

Output:

```
Input(1,0) => Predicted label True
Input(1,1) => Predicted label True
Input(20,0) => Predicted label True
Input(20,1) => Predicted label True
Input(40,0) => Predicted label True
Input(40,1) => Predicted label True
Input(60,0) => Predicted label True
Input(60,1) => Predicted label True
Input(80,0) => Predicted label True
Input(80,1) => Predicted label True
```

I've traced it back to the Decide() method of the GeneralizedLinearRegression class, which adds 0.5 to the output of the link function's inverse and passes the result to the Classes.Decide() method. Because the inverse link returns a value in [0, 1], this shifts every distance into the range [0.5, 1.5], which is always coerced to (bool) true. The labels seem to come out correctly if we instead subtract 0.5 from the inverse link's output. Alternatively, Classes.Decide() could compare the distance to 1.0 instead of 0.0, which would achieve the same result. However, I don't know what other implications this would have for the more general abstraction of this class; otherwise I would commit the fix.
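To make the arithmetic concrete, here is a small, dependency-free C# sketch of the three decision rules being compared. It does not call Accord.NET: `DecisionSketch`, `DecideBySign`, `BuggyDecide`, `FixedDecide` and `AlternativeDecide` are hypothetical names, the parameter `p` stands for the inverse link's output in [0, 1], and treating a score of exactly 0 as false is an assumption about how ties are handled.

```csharp
using System;

// Hypothetical, dependency-free sketch of the decision rules discussed above.
// It does not call Accord.NET; "p" stands for the inverse link's output,
// which lies in [0, 1] for both the logit and the probit link.
public static class DecisionSketch
{
    // Hypothetical stand-in for a sign-based rule: positive => true.
    // Treating exactly 0 as false is an assumption about tie handling.
    public static bool DecideBySign(double score) => score > 0;

    // Behavior described in the report: shift by +0.5, then test the sign.
    // Since p + 0.5 lies in [0.5, 1.5], this always returns true.
    public static bool BuggyDecide(double p) => DecideBySign(p + 0.5);

    // Suggested fix: shift by -0.5 so the sign flips at p = 0.5.
    public static bool FixedDecide(double p) => DecideBySign(p - 0.5);

    // Alternative from the report: keep +0.5 but compare against 1.0,
    // which is mathematically the same threshold (p > 0.5).
    public static bool AlternativeDecide(double p) => p + 0.5 > 1.0;
}
```

For example, `BuggyDecide(0.1)` returns true, while `FixedDecide(0.1)` and `AlternativeDecide(0.1)` both return false, so a low predicted probability maps to the negative label only under the corrected rules.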

If anyone is intimately familiar with this part of code, please comment with your recommendation.

cesarsouza · 2016-08-21T22:21:38Z

Oops, thanks for catching that; it is very likely that the addition of 0.5 is indeed incorrect. Normally, the Decide function should make a decision based on the sign of its input (negatives are false, positives are true). I will take a look at it tomorrow and continue fixing a few other bugs so that updated NuGet packages can be released by next weekend.

Additionally, the documentation needs to be updated with the new preferred interface for creating regressions and classifiers. Thanks for pointing that out as well!

Regards,
Cesar

cesarsouza · 2016-09-16T18:51:42Z

Fixed on release 3.3.0.

cesarsouza added a commit that referenced this issue Aug 22, 2016

GH-282: Issue in LogisticRegression.Transform() returns true for all inputs

eee2a28

cesarsouza added the pending-release label Aug 25, 2016

cesarsouza closed this as completed Sep 16, 2016

cesarsouza removed the pending-release label Jul 9, 2017
