Document rationale for numba #2427
@MaxGhenis It seems like the main advantages of using numba are performance and code readability. @MattHJensen has mentioned in the past that the code was difficult to work with and reason about before (ref: #88). I've done a couple of experiments that re-work how the JIT-ing is applied. Neither of them is complete, but I made it far enough along that both seemed likely to work.
Thanks for the background @hdoupe. I'm not sure about the readability argument (that screenshot could be implemented as a vectorized function calculating tax liability from an MTR structure), but it does seem like JIT-ing improves performance. These are nice experiments; I'll DM you.
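The "vectorized function calculating tax liability from an MTR structure" idea can be sketched roughly as follows. This is a hypothetical illustration (function and parameter names are mine, not Tax-Calculator's): given the lower bound and marginal rate of each bracket, liability for a whole array of incomes falls out of one broadcasted expression.

```python
import numpy as np

def liability_from_mtr(income, brackets, rates):
    """Tax liability for an array of incomes, given a marginal-rate schedule.

    brackets: lower bound of each bracket (first entry must be 0).
    rates: marginal rate applying above the corresponding bracket bound.
    """
    income = np.asarray(income, dtype=float)
    brackets = np.asarray(brackets, dtype=float)
    rates = np.asarray(rates, dtype=float)
    # Upper bound of each bracket; the top bracket is unbounded.
    upper = np.append(brackets[1:], np.inf)
    # Amount of each income that falls inside each bracket:
    # clip income into [lower, upper], then subtract the lower bound.
    in_bracket = np.clip(income[:, None], brackets, upper) - brackets
    # Apply each bracket's marginal rate and sum across brackets.
    return (in_bracket * rates).sum(axis=1)
```

For example, with brackets (0, 10000, 40000) at rates (10%, 20%, 30%), an income of 50000 owes 1000 + 6000 + 3000 = 10000. The whole calculation stays inspectable and documentable as ordinary NumPy, which is the contrast being drawn with the scalar-plus-JIT style.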
This issue highlights another benefit of the work in PR #2585.
@jdebacker said before closing this issue:
I don't understand this comment given that @MaxGhenis reports in this issue that:
Given Max's test results, why would you expect the work in PR #2585 to do anything other than slow down Tax-Calculator?
@martinholmer you know that one has to balance (1) run time and (2) programming time.
Max provides some evidence re (1) (although it may be biased towards Numba given the small example dataset and by not using Dask DataFrames). At the same time, Max references (2), noting that the API documentation is limited in that it can't see into the jitted functions. This affects programming time because it makes it harder for contributors/users to understand how the model works. Other discussion suggests that, when looking at the source code, the logic may be easier to understand for jitted than for vectorized functions, but there's still a question of how much clearer the code is. In any case, I'd welcome you opening a PR to make the documentation more clear (per Max's original comment).
@jdebacker said in closed issue #2427:
Not sure what he means by "can't see into the jitted functions"; the pre-jitted functions are right there in the source for anyone to read.
@jdebacker continues:
I simply don't understand what you're saying. "How the model works" to compute taxes is clearly laid out in the pre-jitted calcfunctions.py code. If you want to see something much less easy to understand, take a look at parts of the vectorized code in PR #2585.
And @jdebacker concludes by saying:
I don't think it is my role to document something that I did not create. The jitting decorators used in the calcfunctions.py module and specified in the decorators.py module are the creation of @talumbau who, like me, is no longer part of the Tax-Calculator team.
@martinholmer I don't know what you mean by "team" or "role". This is a public repository for an open source project. The project would welcome contributions from you (or anyone else) that help with documentation (or other aspects of the project).
The original issue from @MaxGhenis:
With documentation moving to jupyter-book (#2420), we'll have a pretty home for documenting the taxcalc API (#2421, #2422). Much of this will be documenting the user-facing parts of the API, e.g. calc_all(), but calcfunctions.py has many other functions that could be useful. However (IME), the design of using numba with functions that take scalars limits the ability to reproduce underlying logic, compared to vectorized functions that take a DataFrame with the right columns and return a Series (e.g. for calculating AGI or some particular deduction). The chosen design is also more verbose in many cases. I'd suggest documenting the rationale for this choice in case users are curious about it, and in case there's a way to run those functions (I didn't fully follow #292).
In this test, I found that numba reduced the runtime by 75%, once it compiled, compared to a vectorized function taking a DataFrame. This is a basic reproduction of the ALD_InvInc_ec_base function only, without taxcalc's jitting, celery (as mentioned here) to avoid the need to warm up, etc., so I don't know if it's representative of overall time savings.
This could relate to the general taxcalc speed discussion in #2376.
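For intuition only, here is a minimal sketch of the two styles this timing test compares: a per-record scalar function (the style numba compiles) versus a vectorized equivalent operating on whole columns. The function names and the formula are hypothetical stand-ins, not Tax-Calculator's actual ALD_InvInc_ec_base logic, and taxcalc's own jitting machinery in decorators.py is omitted here.

```python
import numpy as np

def invinc_scalar(e00300, e00600, p23250):
    """Per-record style: scalars in, scalar out.

    This is the shape of function that numba's @njit would compile;
    the formula is illustrative only.
    """
    return e00300 + e00600 + max(0.0, p23250)

def invinc_vectorized(e00300, e00600, p23250):
    """Vectorized style: whole NumPy/DataFrame columns in, a column out."""
    return e00300 + e00600 + np.maximum(0.0, p23250)
```

If numba is installed, the scalar version would be wrapped with something like `numba.njit(invinc_scalar)` and applied record by record; the first call pays a compilation cost (the "warm up" mentioned above), after which the compiled loop can beat the vectorized version by avoiding intermediate arrays. Both styles produce identical results on the same inputs.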