Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaxBrain produces incorrect results because it allows invalid reforms #630

Closed
martinholmer opened this issue Aug 20, 2017 · 8 comments
Closed
Labels

Comments

@martinholmer
Copy link
Contributor

martinholmer commented Aug 20, 2017

Consider the following reform:

// bad-reform.json contains a logically incorrect attempt to completely
// eliminate the income tax (leaving refundable credits unchanged)
{"policy": {
    "_II_rt1": {"2020": [0.0]},
    "_II_brk1": {"2020": [[9e99, 9e99, 9e99, 9e99, 9e99]]},
    "_CG_rt1": {"2020": [0.0]},
    "_CG_rt2": {"2020": [0.0]},
    "_CG_rt3": {"2020": [0.0]},
    "_CG_rt4": {"2020": [0.0]},
    "_AMT_rt1": {"2020": [0.0]},
    "_AMT_rt2": {"2020": [0.0]},
    "_AMT_CG_rt1": {"2020": [0.0]},
    "_AMT_CG_rt2": {"2020": [0.0]},
    "_AMT_CG_rt2": {"2020": [0.0]},
    "_AMT_CG_rt3": {"2020": [0.0]},
    "_AMT_CG_rt4": {"2020": [0.0]}
}}

When using Tax-Calculator (version at HEAD of master branch), the validation checking detects the logical inconsistency in the reform.

$ tc puf.csv 2020 --reform bad-reform.json
Traceback (most recent call last):
  File "/Users/mrh/anaconda/bin/tc", line 11, in <module>
    sys.exit(cli_tc_main())
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/taxcalc/cli/tc.py", line 126, in cli_tc_main
    exact_calculations=args.exact)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/taxcalc/taxcalcio.py", line 200, in init
    pol.implement_reform(param_dict['policy'])
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/taxcalc/policy.py", line 192, in implement_reform
    self._validate_parameter_values()
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/taxcalc/policy.py", line 367, in _validate_parameter_values
    pvalue[idx], vvalue[idx]))
ValueError: 2020 _II_brk1 value 9e+99 > max value 40605.4
$

The error message ValueError: 2020 _II_brk1 value 9e+99 > max value 40605.4 gives a strong hint about what the problem is.

But if you hand enter this reform into the TaxBrain GUI input page, you would be given a vague warning about some unspecified parameters having questionable values and given an opportunity to click on the "Show Me the Results!" button a second time to simulate the reform. And here would be the totally incorrect results you would get:

screen shot 2017-08-20 at 3 28 02 pm

Notice that the top expanded income decile is simulated to pay over $1.3 trillion in income taxes, even though the user thought s/he was eliminating the income tax by setting the first bracket's rate to zero and making the top of that first bracket be (essentially) infinity.

The reason why we put these tax bracket validations in the current_law_policy.json file becomes apparent when looking at the code that computes tax liability:

    return (rate1 * min(income, brk1) +
            rate2 * min(brk2 - brk1, max(0., income - brk1)) +
            rate3 * min(brk3 - brk2, max(0., income - brk2)) +
            rate4 * min(brk4 - brk3, max(0., income - brk3)) +
            rate5 * min(brk5 - brk4, max(0., income - brk4)) +
            rate6 * min(brk6 - brk5, max(0., income - brk5)) +
            rate7 * min(brk7 - brk6, max(0., income - brk6)) +
            rate8 * max(0., income - brk7))

So, by letting users make this kind of mistake, TaxBrain is producing absurdly incorrect results.
At least this is what the user would think (because TaxBrain users are not being requested to understand the Python source code before using TaxBrain). The user was clear: the first bracket is infinitely big and has a zero rate. So, the user is going to think the TaxBrain results are absurd.

It seems like we need to make a fundamental philosophical decision about Tax-Calculator and TaxBrain. Do we want to remove all validation testing or not? Essentially, the current version of TaxBrain removes all validation testing (because the warning message is so vague, most users are not going to realize their mistake and go ahead and click the execute button a second time). This philosophy of minimal or no validation checking (which has a long tradition in software development) is often described in computer programming discourse as the "give them enough rope to hang themselves" approach.

@MattHJensen @feenberg @Amy-Xu @andersonfrailey @hdoupe @talumbau @brittainhard

@MattHJensen
Copy link
Contributor

I advocate for providing a good warning and no error in both tax-calculator and TaxBrain.

I left a few related comments on the merged PR: PSLmodels/Tax-Calculator#1502.

First I said:

I think it is better to show warnings rather than errors.

A common scenario where someone would want to bypass the validations is when they reduce the standard deduction while eliminating itemized deductions. The error from not having imputed itemized deductions to non-itemizers would be small in those situations, and I think the user should be able to decide whether to accept that error.

I then left a longer explanation:

I think the ideal approach would be for the user to see a warning and explanation before being allowed to continue with the reform if they choose. This is in line with our long standing goal of allowing our users, wherever possible, to make key modeling decisions for themselves.

I have come across several situations myself where I have chosen to bypass the warning, but I also think it is our responsibility to communicate the limitation to our users.

As @martinholmer points out, we don't currently attain the ideal because we don't offer an adequate explanation. I've received this feedback before from Kevin Hassett, but I failed to do anything about it at the time.

One option would be to stick with the warning system (albeit now in Tax-Calculator rather than TaxBrain) and simply make the explanation more informative.

Another option would be to move the warning to the documentation and code comments. My fear is that those would be very easy to overlook.

I think it is too limiting to strictly enforce the rules.

@martinholmer, is there something that I am missing that means we must have no validation or strict validation? What about improving the quality of the warning?

--

Relatedly, the user interface may be nicer if the warnings are displayed at the top of the output page rather than on the input page with a required resubmission.

@martinholmer
Copy link
Contributor Author

martinholmer commented Aug 20, 2017

@MattHJensen said in TaxBrain issue #620:

I think it is too limiting to strictly enforce the rules.

So, you think the situation described in #620 is OK? What would the "warning" at the top of the results page say? Something like "The results from this TaxBrain run are completely non-sensical, but we're showing them to you anyway"? My point in #620 is that there is a fundamental difference between the current validations: some are logical requirements (like the one in #620) and others (related to to deductions) are data weaknesses that we haven't fixed since you asked for that data fix a year ago.

Is there something that I am missing that means we must have no validation or strict validation? What about improving the quality of the warning?

In my view, yes we should have "strict validation" (resulting in a fatal error and no simulation results) when the validation involves a logical requirement (like in #620). If you want the deduction related data weaknesses to not be validated, I have no problem with that as I already mentioned in Tax-Calculator 1502. But why not solve these data weakness issues by adding itemized-deduction expenses for non-itemizers? This is the most direct way to deal with this second class of validations.

You seem to favor "warnings", but I don't see how that works. Can you explain exactly how Tax-Calculator would communicate a "warning" to TaxBrain?

@MattHJensen
Copy link
Contributor

In my view, yes we should have "strict validation" (resulting in a fatal error and no simulation results) when the validation involves a logical requirement (like in #620).

That makes sense to me, too, as does bifurcating our approach for handling logical requirements and data weaknesses.

You seem to favor "warnings" [for data weaknesses], but I don't see how that works. Can you explain exactly how Tax-Calculator would communicate a "warning" to TaxBrain?

The Python logging module was on my mind. I imagine we'd add a special log level for this purpose. There was a brief discussion of the benefits of the logging module in PSLmodels/Tax-Calculator#160 (comment).

That said, I see your point about how solving the data weaknesses themselves would eliminate our (current) need for warnings. I asked @GoFroggyRun to focus on the webapp earlier this summer, which has been very productive, but it disrupted his progress on the imputation of deductions for non itemizers (PSLmodels/taxdata#32):

@GoFroggyRun, could you leave an update over at PSLmodels/taxdata#32 on the state of the project and what you think the next steps should be when you pick it up again?

--

Given that a) we haven't yet built the capacity to provide informative warnings, and b) our current need for those warnings might go away, one approach might look something like this:

  1. Remove validations for "data weaknesses," and move their discussion to documentation that could be linked to from the top of ospc.org/taxbrain. (This would solve the initial problem I pointed out, of strict validations for data weaknesses being too limiting.)
  2. Implement strict validation for logical errors on TaxBrain in a manner that relies on the new Tax-Calculator checking in Add logic to enforce policy parameter 'validations' PSLmodels/Tax-Calculator#1502. (probably the initial purpose for this issue TaxBrain produces incorrect results because it allows invalid reforms #630.)
  3. Revisit the decision about providing warnings via the logging module or some other means after we know more about Impute itemized deduction amounts to non-itemizers PSLmodels/taxdata#32 or when there is some new need for it in the future.

@martinholmer
Copy link
Contributor Author

@MattHJensen said about warnings and errors in #630:

Given that a) we haven't yet built the capacity to provide informative warnings, and b) our current need for those warnings might go away, one approach might look something like this:

  1. Remove validations for "data weaknesses," and move their discussion to documentation that could be linked to from the top of ospc.org/taxbrain. (This would solve the initial problem I pointed out, of strict validations for data weaknesses being too limiting.)

  2. Implement strict validation for logical errors on TaxBrain in a manner that relies on the new Tax-Calculator checking in Add logic to enforce policy parameter 'validations' PSLmodels/Tax-Calculator#1502. (probably the initial purpose for this issue TaxBrain produces incorrect results because it allows invalid reforms #630.)

  3. Revisit the decision about providing warnings via the logging module or some other means after we know more about Impute itemized deduction amounts to non-itemizers PSLmodels/taxdata#32 or when there is some new need for it in the future.

I am in general agreement with this kind of approach to improving our handling of warnings and errors. But I'd like to suggest a slight variation on this basic approach.

Assuming we agree to maintain a distinction between warnings (that do not stop Tax-Calculator execution) and errors (that do stop Tax-Calculator execution), then I think we should add warnings for parameter values being out of range for all policy parameters. Among the benefits of this approach is that we can continue to include the "data weakness" warnings which are very useful to point out to users.

I can begin preparing a pull request as soon as we are in agreement on this basic approach:
valid range warnings will be collected for all parameters in a reform and Tax-Calculator execution will continue, while valid range errors will be collected for all parameters in a reform and Tax-Caculator execution will stop before trying to undertake tax calculations.

Does this (slightly modified) approach make sense?

@MattHJensen
Copy link
Contributor

That sounds great. Thanks @martinholmer.

@hdoupe
Copy link
Collaborator

hdoupe commented Aug 21, 2017

@martinholmer said

I can begin preparing a pull request as soon as we are in agreement on this basic approach:
valid range warnings will be collected for all parameters in a reform and Tax-Calculator execution will continue, while valid range errors will be collected for all parameters in a reform and Tax-Caculator execution will stop before trying to undertake tax calculations.

@MattHJensen said

Revisit the decision about providing warnings via the logging module or some other means after we know more about PSLmodels/taxdata#32 or when there is some new need for it in the future.

What if we had some function in policy.py that did something along the lines of the pseudocode below?

def validation_msg(reform):
    msg = {'error': {}, 'warning': {}}
    for param in reform:
         if param raises error:
             msg['error'][param] = some error message
        if param raises warning:
             msg['warning'][param] = some warning message
    return msg

This function could be called in TaxBrain and an action could be taken depending on the contents of msg.

@martinholmer
Copy link
Contributor Author

@hdoupe said in #630:

What if we had some function in policy.py that did something along the lines of the pseudocode below?

def validation_msg(reform):
    msg = {'error': {}, 'warning': {}}
    for param in reform:
         if param raises error:
             msg['error'][param] = some error message
        if param raises warning:
             msg['warning'][param] = some warning message
    return msg

This function could be called in TaxBrain and an action could be taken depending on the contents of msg.

Thanks for the creative thinking. I'll think about this kind of approach.

@martinholmer
Copy link
Contributor Author

Issue #630 has been resolved in TaxBrain 1.0.3 with the implementation of the GUI input processing strategy outlined in issue #619.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants