Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Vectorize all tax calculation functions #2585

Closed
wants to merge 26 commits into from

Conversation

chusloj
Copy link
Contributor

@chusloj chusloj commented Apr 6, 2021

This PR does the following:

  1. Stores all tax variables as columns of a pandas dataframe attached to the underlying Records object
  2. Decouples all functions in calcfunctions.py from the iterate_jit decorator and implements either (a) a function that takes advantage of numpy/pandas broadcasting or (b) a function that numpy will manually vectorize through the @np.vectorize decorator.

The tax logic within some functions may be less clear due to the introduction of np.where and np.select.

Thanks @hdoupe for updating the Records and Data classes from being numpy- to pandas- dependent


TODO:

  • Clean up calcfunctions by creating a function that collects each tax function's relevant arguments, parameters and outputs. (causes loss of speed)
  • Make the test suite run.
  • Clean up unnecessary comments.

@chusloj chusloj marked this pull request as draft April 6, 2021 20:19
@chusloj
Copy link
Contributor Author

chusloj commented Apr 27, 2021

I may not get the chance to finish this PR, so here's the current status of the test suite (last thing to take care of before merging). The following tests are still failing:

test_calcfunctions.py::test_calc_and_used_vars
test_calcfunctions.py::test_function_args_usage
test_cpscsv.py::test_agg
test_utils.py::test_create_tables
test_utils.py::test_table_columns_labels

The most common problems have been the following:

  1. Missing or duplicate MARS indexing. There might be missing terms of [MARS - 1] in calcfunctions.py or there might be an unnecessary addition of the same term in the function calls within calculator.py.
  2. Misspecification of np.where statements. For example, in a previous commit the following was added:

eitc = np.where(np.logical_or(earnings > phaseout_start, agi > phaseout_start),
np.minimum(eitc, eitcx),
eitc)

In line 2188, eitc is specified as the return value in the 3rd argument whereas before it was mistakenly misspecified as 0.

  1. Issues with conditions in np.select. For example, a previous commit made the following change (previous code is commented out):

# calculate deduction for dependents
# if DSI == 1:
# c15100 = max(350. + earned, STD_Dep)
# basic_stded = min(STD[MARS - 1], c15100)
# else:
# c15100 = 0.
# if MIDR == 1:
# basic_stded = 0.
# else:
# basic_stded = STD[MARS - 1]
c15100 = np.where(DSI == 1, np.maximum(350. + earned, STD_Dep), 0.)
condlist = [DSI == 1, np.logical_and(DSI != 1, MIDR == 1), np.logical_and(DSI != 1, MIDR != 1)]
choicelist = [np.minimum(STD[MARS - 1], c15100), 0., STD[MARS - 1]]
basic_stded = np.select(condlist, choicelist)

Line 846 previously did not specify the nested if statement using a DSI != 1 like so:

np.logical_and(DSI != 1, MIDR == 1)

What was missing was the fact that DSI != 1 in this nested if statement.

@chusloj
Copy link
Contributor Author

chusloj commented Apr 27, 2021

@MattHJensen @rickecon @jdebacker I think that test_calc_and_used_vars and test_function_args_usage can be removed from the test suite. I believe these just check that all arguments, parameters, and returned variables are accounted for in functions used by the @iterate_jit decorator. With the decorator's removal, Python can know normally scan for misspecified function calls.

Additionally, some additional arguments have been added to certain functions (they're in the function body but weren't previously specified in function arguments) and all output variables have been removed from function arguments.

@MattHJensen MattHJensen changed the title Vectorize all tax calculation functions [WIP] Vectorize all tax calculation functions May 4, 2021
@jdebacker
Copy link
Member

Closing for lack of interest in moving the code in this direction.

@jdebacker jdebacker closed this Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants