Namespaces: Infrastructure #780

Open
wants to merge 225 commits into
base: collect-components-of-namespaces

Conversation

lars-reimann
Collaborator

@lars-reimann lars-reimann commented Jul 17, 2024

What problem do you want to solve?

This PR introduces namespaces to GETTSIM's infrastructure.

  • Write policy_function decorator (rename policy_info and change behavior so that a PolicyFunction instance is returned). Apply to all TT functions. (that should be part of renamings)
  • Check that functions in module with same simple_name have the correct start_date, end_date specs (this was removed from the policy_info decorator).
  • Remove doubled levels in the functions tree automatically (to avoid writing functions in __init__.py).
  • Go over type hints for aggregation functions.
  • Refactor interface module.
  • Implement some safety checks
    • No function should have the same name as a module in the same directory
    • No trailing underscores in module names (for DAGS PR)
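
The first two items above can be sketched as follows. This is only an illustration under stated assumptions, not GETTSIM's implementation: the `PolicyFunction` fields, the decorator's parameter names and default dates, and the helper `check_no_overlapping_dates` are all hypothetical, and "correct start_date, end_date specs" is interpreted here as "validity periods of functions sharing a simple name must not overlap".

```python
import datetime
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PolicyFunction:
    """Hypothetical stand-in for GETTSIM's PolicyFunction."""

    function: Callable
    simple_name: str
    start_date: datetime.date
    end_date: datetime.date

    def __call__(self, *args, **kwargs):
        return self.function(*args, **kwargs)


def policy_function(*, start_date="1900-01-01", end_date="2100-12-31", simple_name=None):
    """Return a decorator that wraps a function in a PolicyFunction instance."""

    def decorator(func):
        return PolicyFunction(
            function=func,
            simple_name=simple_name or func.__name__,
            start_date=datetime.date.fromisoformat(start_date),
            end_date=datetime.date.fromisoformat(end_date),
        )

    return decorator


def check_no_overlapping_dates(functions):
    """Raise if two functions with the same simple_name are active on the same day."""
    by_name = {}
    for f in functions:
        by_name.setdefault(f.simple_name, []).append(f)
    for name, group in by_name.items():
        # After sorting by start date, any overlap shows up between neighbours.
        group = sorted(group, key=lambda f: f.start_date)
        for earlier, later in zip(group, group[1:]):
            if later.start_date <= earlier.end_date:
                raise ValueError(f"Overlapping validity periods for {name!r}")


# Usage: two versions of the same node, valid in disjoint periods.
@policy_function(end_date="2019-12-31", simple_name="betrag")
def betrag_until_2019(x):
    return x


@policy_function(start_date="2020-01-01", simple_name="betrag")
def betrag_from_2020(x):
    return 2 * x


check_no_overlapping_dates([betrag_until_2019, betrag_from_2020])
```

Because the decorator no longer performs the date check itself, a separate pass over all functions of a module (as in `check_no_overlapping_dates`) is one place where it could live.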

@lars-reimann
Collaborator Author

lars-reimann commented Jul 17, 2024

@MImmesberger The nested function dictionary is currently structured like this:

[image: screenshot of the nested function dictionary]

Which levels of nesting should be removed? Based on our previous discussion, it's probably the first two, i.e. _gettsim and social_insurance_contributions/transfers/taxes/demographic_vars, right?

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 78.15483% with 206 lines in your changes missing coverage. Please review.

Please upload report for BASE (collect-components-of-namespaces@31bf89f). Learn more about missing BASE report.
Report is 1 commits behind head on collect-components-of-namespaces.

Files with missing lines                     Patch %   Lines
src/_gettsim/combine_functions_in_tree.py    62.56%    67 Missing ⚠️
src/_gettsim/interface.py                    81.76%    31 Missing ⚠️
src/_gettsim/functions/loader.py             72.05%    19 Missing ⚠️
src/_gettsim/shared.py                       78.57%    15 Missing ⚠️
src/_gettsim/policy_environment.py           70.21%    14 Missing ⚠️
src/_gettsim/visualization.py                30.00%    14 Missing ⚠️
src/_gettsim/gettsim_typing.py                0.00%    11 Missing ⚠️
src/_gettsim/aggregation.py                   0.00%    10 Missing ⚠️
src/_gettsim/functions/policy_function.py    72.72%     9 Missing ⚠️
src/_gettsim/groupings.py                     0.00%     7 Missing ⚠️
... and 3 more
Additional details and impacted files
@@                         Coverage Diff                         @@
##             collect-components-of-namespaces     #780   +/-   ##
===================================================================
  Coverage                                    ?   48.77%           
===================================================================
  Files                                       ?       55           
  Lines                                       ?     4012           
  Branches                                    ?        0           
===================================================================
  Hits                                        ?     1957           
  Misses                                      ?     2055           
  Partials                                    ?        0           


@MImmesberger
Collaborator

I think that, for example, transfers.arbeitsl_geld.betrag is perfect, so I would vote for keeping the social_insurance_contributions/transfers/taxes/demographic_vars level. The _gettsim should not be part of the target call.

@hmgaudecker
Collaborator

hmgaudecker commented Jul 17, 2024

Looking at that, I'd almost think that transfers and taxes are superfluous, but not demographics (I'd suggest renaming this) and social_insurance_contributions, although we might be able to come up with a shorter name.

What do you think, @MImmesberger ?

In any case, this is not super-important for the moment. The good thing is that it will be fairly easy to do bulk renamings/removals (granted, it seems more difficult to insert a level back than to remove it, so I'd be fine with applying a bit more caution at the moment).

@MImmesberger
Collaborator

I agree, there should be no overlap between the elements one level below the tax/transfer level, so there is no need to distinguish them explicitly. In that case, however, we should make sure that the module docstring states what the module is about (e.g. it's not obvious whether the "Kinderbonus" is a tax deduction or a transfer).

Regarding the naming of social_insurance_contributions: In the Google Doc, I called this "sozialversicherungsbeitraege". But I just realized that I put this in the taxes namespace, which is wrong. It should be its own namespace (as is currently the case in Lars' proposal).

@hmgaudecker
Collaborator

Agreed, let's use sozialversicherungsbeitraege. At some point we might rename demographics yet again, but that really should be a simple Search & Replace.

I'm all for good docstrings, but any tax deduction should go into the tax component it is deducted from, right?

@MImmesberger
Collaborator

I'm all for good docstrings, but any tax deduction should go into the tax component it is deducted from, right?

Yes definitely. Was just thinking about making it as obvious as possible. In our case "Kinderbonus" is a transfer.

@ChristianZimpelmann
Member

sozialversicherungsbeitraege

More in line with the naming of modules might be sozialv_beitraege

@hmgaudecker
Collaborator

Good catch! Those are scheduled to be changed to

  • arbeitslosenversicherung
  • einkommensgrenzen
  • krankenversicherung
  • pflegeversicherung
  • rentenversicherung

But that was impossible to know, of course!

@lars-reimann
Collaborator Author

lars-reimann commented Jul 17, 2024

Should the renamings/removals be done programmatically in the new function or by changing the directory structure under _gettsim?

Edit: For now, remove transfers/taxes in the new function. Later, change the directory structure, so it matches the dictionary.
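
A minimal sketch of what removing levels programmatically could look like. The helper `remove_levels` and the example tree are hypothetical (leaves are placeholder integers, and `eink_st` and `alter` are made-up names); the function splices out every level with a listed name, at any depth, and merges its children into the parent.

```python
def remove_levels(tree, levels_to_remove):
    """Return a copy of a nested dict with the named levels spliced out.

    The children of a removed level are merged into its parent. A name clash
    among the merged children raises an error instead of silently overwriting.
    """
    result = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            value = remove_levels(value, levels_to_remove)
            if key in levels_to_remove:
                overlap = result.keys() & value.keys()
                if overlap:
                    raise ValueError(
                        f"Name clash after removing {key!r}: {sorted(overlap)}"
                    )
                result.update(value)
                continue
        if key in result:
            raise ValueError(f"Name clash for {key!r}")
        result[key] = value
    return result


# Hypothetical tree mirroring the directory structure under _gettsim.
tree = {
    "_gettsim": {
        "transfers": {"arbeitsl_geld": {"betrag": 1}},
        "taxes": {"eink_st": {"betrag": 2}},
        "demographic_vars": {"alter": 3},
    }
}
flattened_levels = remove_levels(tree, {"_gettsim", "transfers", "taxes"})
```

The clash check reflects the point made earlier in the thread: dropping transfers/taxes is only safe if the elements one level below do not overlap.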

@hmgaudecker
Collaborator

Just came across #533 -- might that be fixed in passing here?

lars-reimann and others added 16 commits August 9, 2024 09:08
- Add testing dependencies to default environment
- Make sure the correct kaleido dependency is installed on Windows/Unix
- Add task `tests`, so that `pixi run tests` gives one the option to run
the tests on Python 3.11 and 3.12
- Set 3.12 as the upper bound for the default environment Python version
(as long as we don't test 3.13, we should probably not use it in the
development environment?).
- Use pdbp in pytest
- Remove artifacts from previous packaging workflow

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Collaborator

@hmgaudecker hmgaudecker left a comment

Excellent! Some very small comments left, no show-stoppers though.

src/_gettsim/combine_functions_in_tree.py (outdated)
pyproject.toml (outdated)
pyproject.toml (outdated)
docs/rtd_environment.yml (outdated)
src/_gettsim/combine_functions_in_tree.py
src/_gettsim/interface.py (outdated)
src/_gettsim/interface.py (outdated)
src/_gettsim/shared.py (outdated)
src/_gettsim/shared.py (outdated)
src/_gettsim/shared.py (outdated)
@MImmesberger
Collaborator

In particular, optree does not support unflattening without a treespec, whereas this is easily done using flatten_dict.

I don't have a good intuition about the extent to which we should rely on flatten_dict. In principle, the new functions in shared.py emulate functionality of flatten_dict.

i) One obvious candidate for the unflattening functionality of flatten_dict is tree_structure_from_paths because we only ever need it to create a dict from qualified names of DAG nodes.

ii) In the other cases, we never really unflatten in the optree version, but create a new tree from an empty dict using the tree upsert functions in shared.py. This wouldn't be necessary with flatten_dict. (Take for example partition_tree_by_reference_tree)

However, if we don't want to touch (ii), I would vote against using flatten_dict for (i) because of the complexity it entails to "learn" a new library. Just not worth it for the few lines it would save us.
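
To make point (i) concrete, here is a rough pure-Python sketch of building a tree from qualified names without any library. The function name, the `None` placeholder leaves, and the `.` separator (taken from the `transfers.arbeitsl_geld.betrag` example earlier in the thread) are assumptions for illustration, not the actual `tree_structure_from_paths` in shared.py.

```python
def tree_structure_from_qualified_names(qualified_names, separator="."):
    """Build a nested dict whose leaves are None from qualified names."""
    tree = {}
    for name in qualified_names:
        *parents, leaf = name.split(separator)
        node = tree
        for part in parents:
            child = node.setdefault(part, {})
            if child is None:
                # An earlier name already used this path as a leaf.
                raise ValueError(f"{name!r} passes through an existing leaf")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{name!r} is already used as an inner node")
        node[leaf] = None
    return tree


# Hypothetical DAG node names.
structure = tree_structure_from_qualified_names(
    [
        "transfers.arbeitsl_geld.betrag",
        "transfers.kindergeld.betrag",
        "demographic_vars.alter",
    ]
)
```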

@hmgaudecker
Collaborator

hmgaudecker commented Feb 14, 2025

i) One obvious candidate for the unflattening functionality of flatten_dict is tree_structure_from_paths because we only ever need it to create a dict from qualified names of DAG nodes.

Yes, let's use it then. It is a dependency, anyhow, via dags. As discussed there -- easy enough to emulate should need arise.

ii) In the other cases, we never really unflatten in the optree version, but create a new tree from an empty dict using the tree upsert functions in shared.py. This wouldn't be necessary with flatten_dict. (Take for example partition_tree_by_reference_tree)

I think it would be much clearer there if we did something like this untested snippet:

    import flatten_dict

    # Assumes the second argument of partition_tree_by_reference_tree is called reference_tree.
    ref_paths = set(flatten_dict.flatten(reference_tree).keys())
    flat = flatten_dict.flatten(tree_to_partition)
    intersection = flatten_dict.unflatten({p: leaf for p, leaf in flat.items() if p in ref_paths})
    difference = flatten_dict.unflatten({p: leaf for p, leaf in flat.items() if p not in ref_paths})

However, if we don't want to touch (ii), I would vote against using flatten_dict for (i) because of the complexity it entails to "learn" a new library. Just not worth it for the few lines it would save us.

If this was done forever at this point, I would agree. However, it is more about the precedent. Let's not rewrite functions again.
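
For reference, a self-contained version of the partition idea, with flatten_dict replaced by small pure-Python stand-ins (in the spirit of "easy enough to emulate should need arise"). The helper names, the signature of `partition_by_reference_tree`, and the parameter name `reference_tree` are assumptions; the stand-ins only handle plain nested dicts, and empty dicts are dropped when flattening.

```python
def flatten(tree, _prefix=()):
    """Map each leaf of a nested dict to its tuple path."""
    flat = {}
    for key, value in tree.items():
        path = (*_prefix, key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild a nested dict from a mapping of tuple paths to leaves."""
    tree = {}
    for path, leaf in flat.items():
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = leaf
    return tree


def partition_by_reference_tree(tree_to_partition, reference_tree):
    """Split a tree into the part whose paths occur in the reference tree and the rest."""
    ref_paths = set(flatten(reference_tree))
    flat = flatten(tree_to_partition)
    intersection = unflatten({p: leaf for p, leaf in flat.items() if p in ref_paths})
    difference = unflatten({p: leaf for p, leaf in flat.items() if p not in ref_paths})
    return intersection, difference


# Usage with placeholder leaves.
tree = {"a": {"b": 1, "c": 2}, "d": 3}
reference = {"a": {"b": None}, "d": None}
intersection, difference = partition_by_reference_tree(tree, reference)
```

Note that the reference paths come from the reference tree rather than from the tree being partitioned; otherwise the difference would always be empty.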

@MImmesberger
Collaborator

We use flatten_dict now in some places where I thought the code gets simpler by doing that. LMK if you disagree for some function.

@hmgaudecker
Collaborator

Excellent, thanks! Just had a brief look on the phone — why do you go back to qualified names? flatten would just return the tree path we need, no?

@MImmesberger
Collaborator

MImmesberger commented Feb 14, 2025

Yes, in some places we need the qualified name, in other places we could use the qualified name or the path (just doesn't matter there). I thought that consistency would be nice, i.e. knowing that an object flattened by flatten_dict will always have qualified names as keys. That way one doesn't have to look that up.

But if you disagree, I can easily change that.
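
The two key conventions under discussion can be converted into each other in a line or two, as this small sketch shows. The helpers and the `.` separator are illustrative assumptions (pure Python rather than flatten_dict); the round trip only works if no key contains the separator itself.

```python
def flatten_to_paths(tree, _prefix=()):
    """Flatten a nested dict to a mapping of tuple paths to leaves."""
    flat = {}
    for key, value in tree.items():
        path = (*_prefix, key)
        if isinstance(value, dict):
            flat.update(flatten_to_paths(value, path))
        else:
            flat[path] = value
    return flat


def to_qualified_names(flat_by_path, separator="."):
    """Turn tuple-path keys into qualified-name strings."""
    return {separator.join(path): leaf for path, leaf in flat_by_path.items()}


def to_paths(flat_by_name, separator="."):
    """Turn qualified-name keys back into tuple paths."""
    return {tuple(name.split(separator)): leaf for name, leaf in flat_by_name.items()}


tree = {"transfers": {"arbeitsl_geld": {"betrag": 1}}}
by_path = flatten_to_paths(tree)
by_name = to_qualified_names(by_path)
```

Tuple paths avoid any dependence on a separator, while qualified names are plain strings that can serve directly as DAG node names; settling on one of them per code base is the consistency argument made above.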

@MImmesberger
Collaborator

MImmesberger commented Feb 14, 2025

As far as I can tell, the only unresolved issue is the function attributes of PolicyFunction/DerivedFunction and the partialled functions. I didn't manage to do this today, but as we put the rounding step outside of the critical function, the priority is somewhat lower, I'd say.

I will create an issue for that so that we can tackle it later.

Edit: Also, GitHub Actions fails and I can't make sense of it. Can you have a look?
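
The comment does not spell out what goes wrong with the function attributes. One general Python fact that may be relevant is that `functools.partial` objects do not carry the wrapped function's `__name__` or docstring. The sketch below shows one way to copy such attributes explicitly; the helper `partial_with_attributes` and the example function are hypothetical and not part of GETTSIM.

```python
import functools


def betrag(einkommen, params):
    """Compute a placeholder amount."""
    return einkommen * params["rate"]


def partial_with_attributes(func, **kwargs):
    """Partial a function and copy selected attributes onto the result."""
    partialled = functools.partial(func, **kwargs)
    # partial objects accept arbitrary attributes, so these assignments work.
    for attr in ("__name__", "__qualname__", "__doc__", "__module__"):
        setattr(partialled, attr, getattr(func, attr))
    return partialled


plain = functools.partial(betrag, params={"rate": 0.5})
named = partial_with_attributes(betrag, params={"rate": 0.5})
```

Here `plain` has no `__name__` attribute, whereas `named.__name__` is `"betrag"`; both compute the same result when called.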


5 participants