Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should the Tax-Calculator be 'runnable' without numba? #292

Closed
talumbau opened this issue Jun 23, 2015 · 18 comments
Closed

Should the Tax-Calculator be 'runnable' without numba? #292

talumbau opened this issue Jun 23, 2015 · 18 comments

Comments

@talumbau
Copy link
Member

The benefits might be:

  • easier debugging
  • able to run in environments that have pandas and numpy, but not numba

After a minute or two of searching, it seems like we could support testing this case using the env feature of Travis CI. We would basically just have an extra build environment without numba, so that our test matrix would be: [python 2.7, python 3.4] X [with numba, without numba]

http://docs.travis-ci.com/user/customizing-the-build/#Build-Matrix

@talumbau
Copy link
Member Author

It turns out that this may be pretty simple. Looking at this section of code:

https://github.com/OpenSourcePolicyCenter/Tax-Calculator/blob/master/taxcalc/decorators.py#L223

There is a boolean flag do_jit in one of the helper functions for iterate_jit. It is always True. Presumably I did this to debug issues in the development of iterate_jit. It seems likely the only thing one would have to do to enable this behavior is:

  • come up with a reasonable way on how to turn this off/on
  • modify the TravisCI testing matrix to test w/o jitting.

@MattHJensen
Copy link
Contributor

@martinholmer, would the enhancement outlined in this issue have made things easier in #467?

@martinholmer
Copy link
Collaborator

@MattHJensen said:

would the enhancement outlined in this issue have made things easier in #467?

Honestly, I don't know. My mistake was to assign a value to records.FLPDYR instead of incrementing the array like records.FLPDYR += 4 in order to get from 2009 to 2013 without extrapolations, blowups, or reweighting. My error was caused, I guess, by my lack of experience with numpy arrays. But the disappointing thing about my experience is that the Python error traceback gave me no clue about what I was doing wrong. The traceback info did lead me to the educated guess that some variable in the AMTI function was miss-specified. But that wasn't much help because there look to be as many as sixty arguments to the AMTI function. Everything is fine now; but it was a frustrating experience.

@MattHJensen
Copy link
Contributor

It looked to me like there might have been more useful python traceback info if numba hadn't kicked in. If you were willing to replicate the problem, you could try running things with do_jit = False and compare the tracebacks. Posting them here would provide good insight about how much we should prioritize this enhancement.

@feenberg
Copy link
Contributor

On Wed, 18 Nov 2015, Martin Holmer wrote:

@MattHJensen said:

  would the enhancement outlined in this issue have made things easier in
  #467?

Honestly, I don't know. My mistake was to assign a value to records.FLPDYR instead of
incrementing the array like records.FLPDYR += 4 in order to get from 2009 to 2013
without extrapolations, blowups, or reweighting. My error was caused, I guess, by my
lack of experience with numpy arrays. But the disappointing thing about my experience

I hope that users of TB won't need to understand numpy to write
extensions.

is that the Python error traceback gave me no clue about what I was doing wrong. The
traceback info did lead me to the educated guess that some variable in the AMTI
function was miss-specified. But that wasn't much help because there look to be as
many as sixty arguments to the AMTI function. Everything is fine now; but it was a

The large number of required arguments to functions was the reason I made
little use of them in SAS.

dan

frustrating experience.


Reply to this email directly or view it on
GitHub.[AHvQVUnO-2GoKBKXyqYWUbDnOlk8hUl3ks5pHJ7WgaJpZM4FKFFi.gif]

@martinholmer
Copy link
Collaborator

@MattHJensen said:

It looked to me like there might have been more useful python traceback info if numba hadn't kicked in. If you were willing to replicate the problem, you could try running things with do_jit = False and compare the tracebacks. Posting them here would provide good insight about how much we should prioritize this enhancement.

As I said above, I have fixed my problem (which was caused by my poor understanding of numpy array syntax) and I have moved on. I went back and put in a statement that generates the same Python error traceback as I was getting originally. With do_jit = True, the error traceback was this:

iMac2:tax-calculator mrh$ python inctax.py puf.csv 2013
Traceback (most recent call last):
  File "inctax.py", line 77, in <module>
    sys.exit(main())
  File "inctax.py", line 70, in main
    inctax.calculate()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/incometaxio.py", line 119, in calculate
    self._calc.calc_all()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/calculate.py", line 156, in calc_all
    self.calc_one_year()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/calculate.py", line 111, in calc_one_year
    self.TaxInc_to_AMTI()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/calculate.py", line 91, in TaxInc_to_AMTI
    AMTI(self.policy, self.records)
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/decorators.py", line 355, in wrapper
    ans = high_level_fn(*args, **kwargs)
  File "<string>", line 6, in hl_func
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/dispatcher.py", line 171, in _compile_for_args
    return self.compile(sig)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/dispatcher.py", line 348, in compile
    flags=flags, locals=self.locals)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 637, in compile_extra
    return pipeline.compile_extra(func)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 358, in compile_extra
    return self.compile_bytecode(bc, func_attr=self.func_attr)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 367, in compile_bytecode
    return self._compile_bytecode()
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 624, in _compile_bytecode
    return pm.run(self.status)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 250, in run
    raise patched_exception
numba.errors.TypingError: Caused By:
Traceback (most recent call last):
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 242, in run
    res = stage()
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 455, in stage_nopython_frontend
    self.locals)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/compiler.py", line 752, in type_inference_stage
    infer.propagate()
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/numba/typeinfer.py", line 510, in propagate
    raise errors[0]
TypingError: Invalid usage of getitem with parameters (int64, int64)
 * parameterized
File "<string>", line 3

Failed at nopython (nopython frontend)
Invalid usage of getitem with parameters (int64, int64)
 * parameterized
File "<string>", line 3
iMac2:tax-calculator mrh$ 

With do_jit = False in the decorator.py file, the error traceback is noticeably shorter and has one line at the end that is a somewhat more helpful. However, the fact that one of the arguments of the AMTI function is an int, and therefore, does not have a get_item attribute, is not very helpful when there are dozens and dozens of AMTI function arguments. It is true that the above longer traceback does have similar information: Invalid usage of getitem with parameters (int64, int64). But I found that message confusing because I didn't understand the double int64 business.

Here is the error traceback with do_jit = False:

iMac2:tax-calculator mrh$ python inctax.py puf.csv 2013
Traceback (most recent call last):
  File "inctax.py", line 77, in <module>
    sys.exit(main())
  File "inctax.py", line 70, in main
    inctax.calculate()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/incometaxio.py", line 119, in calculate
    self._calc.calc_all()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/calculate.py", line 156, in calc_all
    self.calc_one_year()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/calculate.py", line 111, in calc_one_year
    self.TaxInc_to_AMTI()
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/calculate.py", line 91, in TaxInc_to_AMTI
    AMTI(self.policy, self.records)
  File "/Users/mrh/work/OSPC/tax-calculator/taxcalc/decorators.py", line 356, in wrapper
    ans = high_level_fn(*args, **kwargs)
  File "<string>", line 6, in hl_func
  File "<string>", line 3, in ap_func
TypeError: 'int' object has no attribute '__getitem__'
iMac2:tax-calculator mrh$ 

This experience left me asking the following questions: (1) Why can't the Python interpreter tell me which of the arguments doesn't have a get_item attribute? It certainly knows that information, otherwise it couldn't raise an error; and (2), Why wasn't the code in decorators.py written with the Python try syntax, which would allow writing of an informative error message?

The stupid statement that generates both these error tracebacks is as follows:

recs.FLPDYR = policy.current_year

which seemed reasonable to a numpy novice who sees statements like this in the code:

recs.FLPDYR += 1

I now understand that the correct approach is to write something like this:

recs.FLPDYR.fill(policy.current_year)

You can decide for yourself, but to me these results do not imply that Issue #292 should get a very high priority on scarce resources (at least not now).

@talumbau
Copy link
Member Author

just a couple of quick comments:

  • Regarding code like this: recs.FLPDYR.fill(policy.current_year), you could also do

recs.FLPDYR[:] = policy.current_year,

which works as follows:

>>> import numpy as np
>>> x = np.array([3,4,5])
>>> x[:] = 1
>>> x
array([1, 1, 1])

so it is essentially a broadcast operation.

  • the trouble with generating good error messages when inside decorators.py is that what is happening is that we are generating special strings that are then compiled into Python functions, and then those are compiled via the numba jit function. It turns out that if anything in those strings has a problem, the traceback feature is not so good in Python. If Python has a problem with a real file, it can point you at a line. If Python has a problem compiling/interpreting a string, it has difficulty pointing you to the line in the string where the problem occurs. It is true we could try to catch errors and attempt to inspect a problem when it occurs, but I don't see a general way to catch errors and report properly here.

@martinholmer
Copy link
Collaborator

@talumbau, Your work in pull request #619 resolved this issue #292, right?
If so, then should this issues be closed?

@MattHJensen

@talumbau
Copy link
Member Author

@martinholmer thanks for revisiting this issue. It turns out that this discussion is no longer relevant, because I just discovered that in records.py we now have this function:

https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/records.py#L609

def imputed_cmbtp_itemizer(e17500, e00100, e18400,
                           e62100, e00700,
                           p04470, e21040,
                           e18500, e20800):

which is decorated with the @vectorize decorator in numba, so now numba is a hard dependency for the Tax-Calculator. Prior to this change, it was always possible to run the Tax-Calculator without numba installed, but this is no longer possible. The only thing one can do is turn off the jitting behavior, but numba must still be installed. I was actually working towards making it possible to import taxcalc without Numba installed, but this no longer seems like a good idea due to this hard dependency. If there's no more discussion here in the next few days I can close this issue.

cc @MattHJensen

@martinholmer
Copy link
Collaborator

T.J. said:

Thanks for revisiting this issue #292. It turns out that this discussion is no longer relevant, because I just discovered that in records.py we now have this function:

def imputed_cmbtp_itemizer(e17500, e00100, e18400,
                           e62100, e00700,
                           p04470, e21040,
                           e18500, e20800):

which is decorated with the @vectorize decorator in numba, so now numba is a hard dependency for the Tax-Calculator. Prior to this change, it was always possible to run the Tax-Calculator without numba installed, but this is no longer possible.

The @vectorize decorator was added on June 1, 2015, as part of pull request #279 by mmessick in consultation with you and others on the team. Tax-Calculator was presumably executing just fine before the addition of the @vectorize decorator in the records.py file, so it would seem that we could achieve your goal by dropping it. Is there a noticeable performance hit from removing it?

@talumbau @MattHJensen @Amy-Xu

@feenberg
Copy link
Contributor

On Tue, 12 Apr 2016, Martin Holmer wrote:

T.J. said:

  Thanks for revisiting this issue #292. It turns out that this discussion
  is no longer relevant, because I just discovered that in records.py we now
  have this function:

def imputed_cmbtp_itemizer(e17500, e00100, e18400,
e62100, e00700,
p04470, e21040,
e18500, e20800):

  which is decorated with the @vectorize decorator in numba, so now numba is
  a hard dependency for the Tax-Calculator. Prior to this change, it was
  always possible to run the Tax-Calculator without numba installed, but
  this is no longer possible.

More dependencies is not a desirable change. Can't we make numba optional?

dan

The @vectorize decorator was added on June 1, 2015, as part of pull request #279 by
mmessick in consultation with you and others on the team. Tax-Calculator was
presumably executing just fine before the addition of the @vectorize decorator in the
records.py file, so it would seem that we could achieve your goal by dropping it. Is
there a noticeable performance hit from removing it?

@talumbau @MattHJensen @Amy-Xu


You are receiving this because you commented.
Reply to this email directly or view it on
GitHub[AHvQVfu2AAjjAfKiYM4HBjyUQKzcsUHgks5p2_30gaJpZM4FKFFi.gif]

@talumbau
Copy link
Member Author

@martinholmer said:

Tax-Calculator was presumably executing just fine before the addition of the @vectorize decorator in the records.py file, so it would seem that we could achieve your goal by dropping it. Is there a noticeable performance hit from removing it?

Unfortunatetly, it's not that simple. The vectorize decorator allows one to build a "ufunc" that takes as its arguments numpy arrays, but operates on each element individually in the context of the function. So, one passes a vector quantity, but one treats the incoming arguments as scalar within the function. So if you wrote:

@vectorize
def f(a, b, c):
    c = a + b
    return c

It is essentially the same as:

def f(a, b, c):
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c

Except for numba handles type checking, bounds checking, etc., and the iteration is at the C level, not the Python level. So, we'd have to refactor that function to not use numba.

With our custom decorator iterate_jit, we avoid this issue and the functions work whether or not they are "numba"-ized. With the "out of the box" decorator vectorize, you have to write the function assuming a certain context.

We could re-write the function and use something like @jit which does not suffer from this issue. With @jit, you get no change in semantics. In other words, you would write the "for" loop version as above, but you could force it to compile it all the way down to machine code. So you could write the function once, and you would either get Python level execution if numba is not available or machine-level speed if Numba is available. That's probably a good way to go. How does that sound?

@martinholmer
Copy link
Collaborator

T.J., I'm not sure I followed every detail of what you just said in issue #292, but I'm in agreement with Dan in that we should strive for fewer hard dependencies and see if we can make the @vectorize or @jit decorator on the function in the Records class optional. Optional like you did for the @iterate_jit decorator in the functions.py file. There is an advantage to be able to debug the slower code. And, as you found in functions.py, turning off the @iterate_jit decorator exposed some things that needed to be fixed.

@talumbau @feenberg @MattHJensen @Amy-Xu

@martinholmer
Copy link
Collaborator

@talumbau, has your issue #292 been resolved by the merger of pull request #754? If not, can you explain what aspects of this issue remain unresolved? If #292 has been resolved, perhaps you can close it.

@talumbau
Copy link
Member Author

This is a good development. With the merge of #754, I can now conclude my work and make taxcalc completely runnable without numba installed. I consider this a good development. I think you should get a message at log level "warning" if you are using taxcalc without numba installed, but it should run just fine. So, I'd like to keep this issue open until I can post my PR that implements this functionality.

@martinholmer
Copy link
Collaborator

T.J. (@talumbau) said:

This is a good development. With the merge of #754, I can now conclude my work and make taxcalc completely runnable without numba installed. I consider this a good development. I think you should get a message at log level "warning" if you are using taxcalc without numba installed, but it should run just fine. So, I'd like to keep this issue open until I can post my PR that implements this functionality.

Great! We look forward to a pull request that makes this possible.

@martinholmer
Copy link
Collaborator

Given the recent enhancements made by @talumbau to make Tax-Calculator 'runnable' without the numba package (merged pull request #799), it would seem as if issue #292 has been resolved. If there are no remaining unresolved issues, #292 should be closed with a big thank-you to T.J.

@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @zrisher

@martinholmer
Copy link
Collaborator

Issue #292 has been resolved by merged pull request #799, which was developed by @talumbau. Thank you @talumbau for adding this flexibility to Tax-Calculator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants