Namespaces: Infrastructure #780

lars-reimann · 2024-07-17T09:16:23Z

What problem do you want to solve?

This PR introduces namespaces to GETTSIM's infrastructure.

Write policy_function decorator (rename policy_info and change behavior so that a PolicyFunction instance is returned). ~~Apply to all TT functions.~~ (that should be part of renamings)
Check that functions in module with same simple_name have the correct start_date, end_date specs (this was removed from the policy_info decorator).
Remove doubled levels in the functions tree automatically (to avoid writing functions in __init__.py).
Go over type hints for aggregation functions.
Refactor interface module.
Implement some safety checks
- No function should have the same name as a module in the same directory
- No trailing underscores in module names (for DAGS PR)
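
The first two items could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names on `PolicyFunction`, the default dates, and the helper `check_no_overlapping_periods` are hypothetical and not taken from the GETTSIM code base.

```python
from __future__ import annotations

import datetime
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PolicyFunction:
    """Hypothetical wrapper: a function plus the period in which it is active."""

    function: Callable
    simple_name: str
    start_date: datetime.date
    end_date: datetime.date

    def __call__(self, *args, **kwargs):
        return self.function(*args, **kwargs)


def policy_function(
    *,
    start_date: str = "1900-01-01",  # assumed default, not from the PR
    end_date: str = "2100-12-31",  # assumed default, not from the PR
    simple_name: str | None = None,
) -> Callable[[Callable], PolicyFunction]:
    """Decorator that returns a PolicyFunction instance instead of the function."""
    start = datetime.date.fromisoformat(start_date)
    end = datetime.date.fromisoformat(end_date)
    if start > end:
        raise ValueError(f"start_date {start} is after end_date {end}.")

    def decorator(func: Callable) -> PolicyFunction:
        return PolicyFunction(
            function=func,
            simple_name=simple_name or func.__name__,
            start_date=start,
            end_date=end,
        )

    return decorator


def check_no_overlapping_periods(functions: list[PolicyFunction]) -> None:
    """Raise if two functions with the same simple_name are active on the same day."""
    by_name: dict[str, list[PolicyFunction]] = {}
    for func in functions:
        by_name.setdefault(func.simple_name, []).append(func)
    for name, group in by_name.items():
        ordered = sorted(group, key=lambda f: f.start_date)
        for earlier, later in zip(ordered, ordered[1:]):
            if later.start_date <= earlier.end_date:
                raise ValueError(f"Overlapping periods for {name!r}.")


@policy_function(end_date="2010-12-31", simple_name="betrag")
def betrag_until_2010(x):
    return x


@policy_function(start_date="2011-01-01", simple_name="betrag")
def betrag_from_2011(x):
    return 2 * x


# Passes silently because the two periods do not overlap.
check_no_overlapping_periods([betrag_until_2010, betrag_from_2011])
```

Running the check per module, rather than inside the decorator, matches the note above that the check was removed from the decorator itself.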

lars-reimann · 2024-07-17T09:26:35Z

@MImmesberger The nested function dictionary is currently structured like this:

Which levels of nesting should be removed? Based on our previous discussion, it's probably the first two, i.e. _gettsim and social_insurance_contributions/transfers/taxes/demographic_vars, right?

codecov · 2024-07-17T09:37:21Z

Codecov Report

Attention: Patch coverage is 78.15483% with 206 lines in your changes missing coverage. Please review.

Please upload report for BASE (collect-components-of-namespaces@31bf89f). Learn more about missing BASE report.
Report is 1 commits behind head on collect-components-of-namespaces.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/_gettsim/combine_functions_in_tree.py | 62.56% | 67 Missing ⚠️ |
| src/_gettsim/interface.py | 81.76% | 31 Missing ⚠️ |
| src/_gettsim/functions/loader.py | 72.05% | 19 Missing ⚠️ |
| src/_gettsim/shared.py | 78.57% | 15 Missing ⚠️ |
| src/_gettsim/policy_environment.py | 70.21% | 14 Missing ⚠️ |
| src/_gettsim/visualization.py | 30.00% | 14 Missing ⚠️ |
| src/_gettsim/gettsim_typing.py | 0.00% | 11 Missing ⚠️ |
| src/_gettsim/aggregation.py | 0.00% | 10 Missing ⚠️ |
| src/_gettsim/functions/policy_function.py | 72.72% | 9 Missing ⚠️ |
| src/_gettsim/groupings.py | 0.00% | 7 Missing ⚠️ |

... and 3 more

Additional details and impacted files

@@                         Coverage Diff                         @@
##             collect-components-of-namespaces     #780   +/-   ##
===================================================================
  Coverage                                    ?   48.77%           
===================================================================
  Files                                       ?       55           
  Lines                                       ?     4012           
  Branches                                    ?        0           
===================================================================
  Hits                                        ?     1957           
  Misses                                      ?     2055           
  Partials                                    ?        0


MImmesberger · 2024-07-17T11:16:32Z

I think that, for example, transfers.arbeitsl_geld.betrag is perfect, so I would vote for keeping the social_insurance_contributions/transfers/taxes/demographic_vars level. The _gettsim should not be part of the target call.

hmgaudecker · 2024-07-17T11:55:33Z

Looking at that, I'd almost think that transfers and taxes are superfluous, but not demographics (I'd suggest renaming this) and social_insurance_contributions, although we might be able to come up with a shorter name.

What do you think, @MImmesberger ?

In any case, not super-important for the moment, the good thing is that it will be fairly easy to do bulk-renamings / removals (granted it seems more difficult to insert a level back than to remove it, so I'd be fine with an approach applying a bit more caution at the moment).

MImmesberger · 2024-07-17T12:21:57Z

I agree, there should be no overlap between the elements one level below the tax/transfer level, so there is no need to distinguish them explicitly. In that case, however, we should make sure that the module docstring states what the module is about (e.g. it's not obvious whether the "Kinderbonus" is a tax deduction or a transfer).

Regarding the naming of social_insurance_contributions: In the google doc, I called this "sozialversicherungsbeitraege". But I just realized that I put this in the taxes namespace which is wrong. Should be its own (as it is the case currently in Lars' proposal).

hmgaudecker · 2024-07-17T12:50:05Z

Agreed, let's use sozialversicherungsbeitraege. At some point we might rename demographics yet again, but that really should be a simple Search & Replace.

I'm all for good docstrings, but any tax deduction should go into the tax component it is deducted from, right?

MImmesberger · 2024-07-17T12:54:07Z

> I'm all for good docstrings, but any tax deduction should go into the tax component it is deducted from, right?

Yes definitely. Was just thinking about making it as obvious as possible. In our case "Kinderbonus" is a transfer.

ChristianZimpelmann · 2024-07-17T13:00:19Z

> sozialversicherungsbeitraege

More in line with the naming of modules might be sozialv_beitraege

hmgaudecker · 2024-07-17T14:20:47Z

Good catch! Those are scheduled to be changed to

arbeitslosenversicherung
einkommensgrenzen
krankenversicherung
pflegeversicherung
rentenversicherung

But that was impossible to know, ofc!

lars-reimann · 2024-07-17T14:47:31Z

Should the renamings/removals be done programmatically in the new function or by changing the directory structure under _gettsim?

Edit: For now, remove transfers/taxes in the new function. Later, change the directory structure, so it matches the dictionary.
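
A minimal sketch of what removing levels programmatically could look like. The function name `drop_levels`, the example tree, and the module names other than `arbeitsl_geld` are hypothetical; the leaves are placeholder strings standing in for functions.

```python
from __future__ import annotations


def _merge_into(target: dict, source: dict) -> None:
    """Merge source into target in place; raise if two entries collide."""
    for key, value in source.items():
        if key not in target:
            target[key] = value
        elif isinstance(target[key], dict) and isinstance(value, dict):
            _merge_into(target[key], value)
        else:
            raise ValueError(f"Name clash at {key!r} after removing levels.")


def drop_levels(tree: dict, names_to_drop: set[str]) -> dict:
    """Return a new tree in which branches named in names_to_drop are removed.

    The children of a removed branch are hoisted into its parent.
    """
    result: dict = {}
    for key, value in tree.items():
        if isinstance(value, dict):
            subtree = drop_levels(value, names_to_drop)
            if key in names_to_drop:
                _merge_into(result, subtree)
            else:
                _merge_into(result, {key: subtree})
        else:
            _merge_into(result, {key: value})
    return result


# Hypothetical tree as it might come out of the loader.
functions_tree = {
    "_gettsim": {
        "transfers": {"arbeitsl_geld": {"betrag": "f1"}},
        "taxes": {"einkommensteuer": {"betrag": "f2"}},
        "social_insurance_contributions": {"beitrag": "f3"},
    }
}

result_tree = drop_levels(functions_tree, {"_gettsim", "transfers", "taxes"})
```

Raising on a name clash is a deliberate choice in this sketch: hoisting children is only safe if there is no overlap between the elements one level below the removed levels.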

hmgaudecker · 2024-07-19T18:22:44Z

Just came across #533 -- might that be fixed in passing here?


- Add testing dependencies to default environment
- Make sure the correct kaleido dependency is installed on Windows/Unix
- Add task `tests`, so that `pixi run tests` gives one the option to run the tests on Python 3.11 and 3.12
- Set 3.12 as the upper bound for the default environment Python version (as long as we don't test 3.13, we should probably not use it in the development environment?).
- Use pdbp in pytest
- Remove artifacts from previous packaging workflow

Co-authored-by: Hans-Martin von Gaudecker


hmgaudecker

Excellent! Some very small comments left, no show-stoppers though.

MImmesberger · 2025-02-14T10:43:22Z

In particular, optree does not support unflattening without a treespec, whereas flatten_dict makes this easy.

I don't have a good intuition about the extent to which we should rely on flatten_dict. In principle, the new functions in shared.py emulate functionality of flatten_dict.

i) One obvious candidate for the unflattening functionality of flatten_dict is tree_structure_from_paths because we only ever need it to create a dict from qualified names of DAG nodes.

ii) In the other cases, we never really unflatten in the optree version, but create a new tree from an empty dict using the tree upsert functions in shared.py. This wouldn't be necessary with flatten_dict. (Take for example partition_tree_by_reference_tree)

However, if we don't want to touch (ii), I would vote against using flatten_dict for (i) because of the complexity it entails to "learn" a new library. Just not worth it for the few lines it would save us.
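
For case (i), building a tree from qualified names takes only a few lines in plain Python. The sketch below is an assumption-laden illustration: the function name, the `"__"` separator, and the choice of `None` as leaf value are hypothetical, and the actual signature of tree_structure_from_paths may differ.

```python
from __future__ import annotations


def tree_structure_from_qualified_names(
    qualified_names: list[str],
    separator: str = "__",  # assumed separator for qualified names
) -> dict:
    """Build a nested dict whose leaves are None from qualified names."""
    tree: dict = {}
    for name in qualified_names:
        *branches, leaf = name.split(separator)
        node = tree
        for branch in branches:
            node = node.setdefault(branch, {})
            if not isinstance(node, dict):
                raise ValueError(f"{branch!r} is both a leaf and a branch.")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{leaf!r} is both a leaf and a branch.")
        node[leaf] = None
    return tree


# "kindergeld" and "alter" are made-up names for illustration.
structure = tree_structure_from_qualified_names(
    ["arbeitsl_geld__betrag", "kindergeld__betrag", "alter"]
)
```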


hmgaudecker · 2025-02-14T11:04:04Z

> i) One obvious candidate for the unflattening functionality of flatten_dict is tree_structure_from_paths because we only ever need it to create a dict from qualified names of DAG nodes.

Yes, let's use it then. It is a dependency, anyhow, via dags. As discussed there -- easy enough to emulate should need arise.

> ii) In the other cases, we never really unflatten in the optree version, but create a new tree from an empty dict using the tree upsert functions in shared.py. This wouldn't be necessary with flatten_dict. (Take for example partition_tree_by_reference_tree)

I think it would be much clearer there if we did something like this untested snippet:

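    import flatten_dict

    # The reference paths must come from the reference tree (called
    # `reference_tree` here as an assumed name), not from the tree being partitioned.
    ref_paths = set(flatten_dict.flatten(reference_tree).keys())
    flat = flatten_dict.flatten(tree_to_partition)
    intersection = flatten_dict.unflatten({p: leaf for p, leaf in flat.items() if p in ref_paths})
    difference = flatten_dict.unflatten({p: leaf for p, leaf in flat.items() if p not in ref_paths})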

> However, if we don't want to touch (ii), I would vote against using flatten_dict for (i) because of the complexity it entails to "learn" a new library. Just not worth it for the few lines it would save us.

If this was done forever at this point, I would agree. However, it is more about the precedent. Let's not rewrite functions again.
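
To make the flatten, filter, unflatten idea from the snippet concrete without depending on any library, here is a standard-library sketch. The helpers `flatten` and `unflatten` only emulate tuple-keyed flattening and are not the flatten_dict API; the function name `partition_by_reference` and its signature are assumptions.

```python
from __future__ import annotations

from typing import Any


def flatten(tree: dict, prefix: tuple = ()) -> dict[tuple, Any]:
    """Map each leaf to the tuple of keys leading to it."""
    flat: dict[tuple, Any] = {}
    for key, value in tree.items():
        path = (*prefix, key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict[tuple, Any]) -> dict:
    """Inverse of flatten for tuple-keyed dictionaries."""
    tree: dict = {}
    for path, leaf in flat.items():
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = leaf
    return tree


def partition_by_reference(tree_to_partition: dict, reference_tree: dict):
    """Split a tree into the part also present in reference_tree and the rest."""
    ref_paths = set(flatten(reference_tree))
    flat = flatten(tree_to_partition)
    intersection = unflatten({p: v for p, v in flat.items() if p in ref_paths})
    difference = unflatten({p: v for p, v in flat.items() if p not in ref_paths})
    return intersection, difference


in_reference, not_in_reference = partition_by_reference(
    {"a": {"b": 1, "c": 2}, "d": 3},
    {"a": {"b": None}, "d": None},
)
```

Note that empty sub-dictionaries disappear when flattening in this sketch, since only leaves produce paths.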


MImmesberger · 2025-02-14T18:10:58Z

We use flatten_dict now in some places where I thought the code gets simpler by doing that. LMK if you disagree for some function.

hmgaudecker · 2025-02-14T18:26:10Z

Excellent, thanks! Just had a brief look on the phone — why do you go back to qualified names? flatten would just return the tree path we need, no?

MImmesberger · 2025-02-14T18:37:08Z

Yes, in some places we need the qualified name, in other places we could use the qualified name or the path (just doesn't matter there). I thought that consistency would be nice, i.e. knowing that an object flattened by flatten_dict will always have qualified names as keys. That way one doesn't have to look that up.

But if you disagree, I can easily change that.
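
The two key styles carry the same information, which a short sketch can illustrate. The `"__"` separator and all function names below are assumptions for illustration, not the helpers used in the PR.

```python
from __future__ import annotations

from typing import Any

SEPARATOR = "__"  # assumed separator for qualified names


def flatten_to_paths(tree: dict, prefix: tuple = ()) -> dict[tuple, Any]:
    """Flatten a nested dict so that keys are tree paths (tuples of strings)."""
    flat: dict[tuple, Any] = {}
    for key, value in tree.items():
        path = (*prefix, key)
        if isinstance(value, dict):
            flat.update(flatten_to_paths(value, path))
        else:
            flat[path] = value
    return flat


def flatten_to_qualified_names(tree: dict) -> dict[str, Any]:
    """Flatten a nested dict so that keys are qualified names."""
    return {SEPARATOR.join(p): v for p, v in flatten_to_paths(tree).items()}


def qualified_name_to_path(name: str) -> tuple:
    """Recover the tree path from a qualified name."""
    return tuple(name.split(SEPARATOR))


example_tree = {"arbeitsl_geld": {"betrag": 1.0}, "alter": 30}
by_path = flatten_to_paths(example_tree)
by_name = flatten_to_qualified_names(example_tree)
```

Because the conversion is a one-liner in either direction, the choice mostly comes down to which form call sites need more often. The round trip only works if no individual key contains the separator itself.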

MImmesberger · 2025-02-14T18:41:51Z

As far as I can tell, the only unresolved issue is the function attributes of PolicyFunction/DerivedFunction and the partialled functions. I didn't manage to do this today, but as we put the rounding step outside of the critical function, the priority is somewhat lower I'd say.

I will create an issue for that such that we can tackle it later.

Edit: Also, GitHub Actions fails and I can't make sense of it. Can you have a look?
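
The attribute issue with partialled functions mentioned above can be reproduced with the standard library alone. In this sketch the attribute name `rounding_base` and the helper `partial_keeping_attributes` are hypothetical; only `functools.partial` is real API.

```python
import functools


def betrag(einkommen, satz):
    return einkommen * satz


# Hypothetical custom attribute attached to the function object.
betrag.rounding_base = 0.01

# A plain partial object does not expose attributes set on the wrapped function.
plain = functools.partial(betrag, satz=0.5)
attribute_lost = not hasattr(plain, "rounding_base")


def partial_keeping_attributes(func, **kwargs):
    """Partial application that copies custom attributes onto the partial object."""
    partialled = functools.partial(func, **kwargs)
    for name, value in vars(func).items():
        setattr(partialled, name, value)
    return partialled


kept = partial_keeping_attributes(betrag, satz=0.5)
```

The original function stays reachable as `plain.func`, so an alternative to copying would be to read the attribute from there.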

feat: load functions into a nested dictionary

1b72ec6

lars-reimann force-pushed the namespaces branch from e204b3e to 1b72ec6 on July 17, 2024 09:24

feat: drop _gettsim level

c550526

lars-reimann and others added 16 commits August 9, 2024 09:08

feat: omit "taxes"/"transfers" levels

98d9fb5

build: development version of dags

93d34f0

refactor: create a flat dict first

76bc36e

[pre-commit.ci] auto fixes from pre-commit.com hooks

6192955

for more information, see https://pre-commit.ci

Merge branch 'main' into namespaces

07b0bf6

Add dags branch to pixi dependencies.

f3a0bfa

Fix pixi environment.

75d1bed

Specifiy commit for dags package.

07a3f59

Convert target input to tree dict.

d857c8c

Convert data to nested dict.

3946b42

Return functions tree when loading functions. Still work in progress.

45ffe7f

Merge branch 'main' into namespaces

fe24651

Revert 45ffe7f.

9822a51

Add tree to policy environment. Start to adjust methods.

df92d60

Fix get_function_by_name.

d741e97

MImmesberger added 7 commits February 13, 2025 18:35

Further review progress.

7aaa7b1

Switch from policy_functions_tree to functions_tree.

196d180

Update and harmonize docstrings.

46efe3c

Use fixture in test and test for constant groups in data only for input data of concatenated function.

6e883d7

Change order of operations in interface.

ae8078e

Test only required input data.

e88e228

Use same defaults for start and end dates for class and decorator.

dd56484

hmgaudecker reviewed Feb 13, 2025


hmgaudecker and others added 2 commits February 14, 2025 10:14

Small updates to loader and interface.

710874f

Use defaults in decorator only.

02f7385

hmgaudecker approved these changes Feb 14, 2025


Merge branch 'namespaces' of https://github.com/iza-institute-of-labor-economics/gettsim into namespaces

8e08469

MImmesberger added 2 commits February 14, 2025 14:43

Don't create qualified name in combine_functions module. Update annotations tests.

b2c7ee0

Remove unnecessary comment.

c93eb97

MImmesberger mentioned this pull request Feb 14, 2025

ENH: Use dags features instead of relying on own implementation #816

Open

MImmesberger added 7 commits February 14, 2025 15:02

Update pixi and conda env.

f8ac3b2

Use correct root path in _convert_path_to_importable_module_name and update docstring.

5ffa262

Renamings of functions args.

7fd23c8

Rename partial function and put rounding in function call.

2b56265

Update docstrings.

39cb371

Some review comments I missed previously.

1227738

Move to flatten_dict in some places.

d5ce43f

MImmesberger mentioned this pull request Feb 14, 2025

ENH: Rounding cannot be applied to partial functions #817

Open

Update pixi env: flatten_dict only works with pip/pypi.

b67bbac
