Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support keyword-only arguments #116

Merged
merged 22 commits into from
Feb 7, 2024
Merged

Support keyword-only arguments #116

merged 22 commits into from
Feb 7, 2024

Conversation

jl-wynen
Copy link
Member

@jl-wynen jl-wynen commented Feb 2, 2024

Fixes #76

@@ -422,12 +406,12 @@ def set_param_table(self, params: ParamTable) -> None:
for index, label in zip(params.index, values):
self._set_provider(
Item((Label(tp=params.row_dim, index=index),), param_name),
lambda label=label: label,
_provider_for_constant(label),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the logic in ArgSpec.from_provider, implements a hard requirement for not allowing arguments without type hints as per #115. Thus, the old lambda does not work any more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to expand the use of these fixtures or control the default scheduler globally. As it stands, tests either use dask or the naive scheduler depending on whether dask is installed. This makes it hard to understand how the tests are executed.

@SimonHeybrock
Copy link
Member

There are a bunch of places in visualize and iirc also in the html formatting that check for the internal lambda to detect certain things. That needs updating.

Copy link
Member

@SimonHeybrock SimonHeybrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Minor questions (aside from the required fixes for visualize, etc.).

apply,
provider,
list(args.args),
(dict, [[key, val] for key, val in args.kwargs]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment would be useful, it is not obvious for a reader what is going on here.

@@ -1208,3 +1208,43 @@ def process(
def test_html_repr() -> None:
pipeline = sl.Pipeline([make_int], params={float: 5.0})
assert isinstance(pipeline._repr_html_(), str)


def test_pipeline_keyword_argument_and_param(scheduler: sl.scheduler.Scheduler) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you name this (and below) with something like keyword_only in the test name? I think keyword args have always been supported?

Comment on lines 80 to 85
def call(self, provider: Provider, values: dict[Key, Any]) -> Any:
"""Call a compatible provider with arguments extracted from ``values``."""
return provider(
*(values[arg] for arg in self._args.values()),
**{key: values[arg] for key, arg in self._kwargs.items()},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this belong here? It seems it is used only by NaiveScheduler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I don't like this one either

@SimonHeybrock SimonHeybrock changed the title Support keyword arguments Support keyword-only arguments Feb 5, 2024
@jl-wynen
Copy link
Member Author

jl-wynen commented Feb 5, 2024

There are a bunch of places in visualize and iirc also in the html formatting that check for the internal lambda to detect certain things. That needs updating.

I considered also introducing a dedicated Provider class. This would encapsulate these checks. E.g., when the class is instantiated, we could specify whether it is a regular provider, parameter, or series provider. Then we don't have to mess with detecting lambdas. Would this be a suitable change here?

@SimonHeybrock
Copy link
Member

There are a bunch of places in visualize and iirc also in the html formatting that check for the internal lambda to detect certain things. That needs updating.

I considered also introducing a dedicated Provider class. This would encapsulate these checks. E.g., when the class is instantiated, we could specify whether it is a regular provider, parameter, or series provider. Then we don't have to mess with detecting lambdas. Would this be a suitable change here?

Sounds reasonable!

@@ -7,7 +7,8 @@

from .scheduler import DaskScheduler, NaiveScheduler, Scheduler
from .typing import Graph, Item
from .utils import keyname, kind_of_provider

# from .utils import keyname, kind_of_provider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To remove?

if qualname(p) == f'{qualname(Pipeline.set_param_table)}.<locals>.<lambda>':
return 'table'
return 'function'
# def kind_of_provider(p: Callable[..., Any]) -> ProviderKind:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

f'{_qualname(Pipeline.set_param_table)}.<locals>.<lambda>',
unsatisfied,
):
if formatted_p.kind in ('parameter', 'table', 'sentinel'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is sentinel here? And why are we still using unsatisfied above but not here?

@@ -350,6 +350,7 @@ def test_can_groupby_by_requesting_series_of_series() -> None:
Param1,
{1: sl.Series(Row, {0: 4, 1: 5}), 3: sl.Series(Row, {2: 6})},
)
print(list(pl._providers.values()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?



ToProvider = Callable[..., Any]
ProviderKind = Literal['function', 'parameter', 'series', 'table', 'sentinel']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we use literals instead of, e.g., an enum?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was like this and I didn't want to change too much. I don't mind either way

)

@classmethod
def table(cls, param: Any) -> Provider:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is more like a table-cell than a table?

arg_spec=ArgSpec.null(),
kind='table',
location=ProviderLocation(
name=f'table({type(param).__name__})', module='sciline'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about the module as above.


@property
def qualname(self) -> str:
# TODO merge with location
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo

Comment on lines +342 to +343
provider = Provider.from_function(provider)
self._set_provider(provider.deduce_key(), provider)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allow creation of the provider if the key cannot be deduced? Should the check be moved into from_function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_function can, and probably should do this check. But we need to check again in deduce_key because we cannot deduce non-function provider keys.

@@ -406,12 +418,12 @@ def set_param_table(self, params: ParamTable) -> None:
for index, label in zip(params.index, values):
self._set_provider(
Item((Label(tp=params.row_dim, index=index),), param_name),
_provider_for_constant(label),
Provider.table(label),
)
for index, label in zip(params.index, params.index):
self._set_provider(
Item((Label(tp=params.row_dim, index=index),), params.row_dim),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, but could this clumsy key setup be absorbed into the Provider class, similar to the function case? Provider.table would obviously need more arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you store the key in the provider? This would duplicate the key as we still need it in the graph / pipeline.

@jl-wynen
Copy link
Member Author

jl-wynen commented Feb 6, 2024

I think I addressed all you comments.

Comment on lines 83 to 84
def table_row(cls, param: Any) -> Provider:
"""Construct a provider that returns the label for a table row."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not a row that is being returned, just a single cell (i.e., specific row and column).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a single value. E.g. in https://scipp.github.io/sciline/user-guide/parameter-tables.html#Computing-results-for-series-of-parameters it is a single int, i.e., the RunID, but not the Filename.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i.e., rename?

"""Callable that can be converted to a provider."""

ProviderKind = Literal[
'function', 'parameter', 'series', 'table', 'sentinel', 'unsatisfied'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 'sentinel' still in use?

Copy link
Member Author

@jl-wynen jl-wynen Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in

graph[tp] = _ParamSentinel(self._param_name_to_table_key[tp])
and
graph[tp] = _ParamSentinel(tp)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but it will never make it to, e.g., visualize, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't.

@@ -27,7 +27,7 @@
"""Callable that can be converted to a provider."""

ProviderKind = Literal[
'function', 'parameter', 'series', 'table', 'sentinel', 'unsatisfied'
'function', 'parameter', 'series', 'table_label', 'sentinel', 'unsatisfied'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why label? I thought that would refer to columns, or it at least feels unclear?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is what it's called in the code:

Provider.table_label(label),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but shouldn't that be renamed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know. Pipeline seems to use 'label' and 'index' interchangeably for tables. So we could rename it to index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is two different things. If you rename it to index it would mean an entry of the index column:

  • A param table has an "index" column, and one or more data columns. Columns are labeled using a type.
  • The index is a bit like an index in a pandas.DataFrame (and it also has a column label, the type of the index).
  • A cell in the table is identified using an index value, i.e., row index (an index, in the alternate meaning of the word index, i.e., an entry of the index column) and a column label.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You renamed the method to table_cell, but here it is still table_label.

@jl-wynen jl-wynen merged commit c725afd into main Feb 7, 2024
5 checks passed
@jl-wynen jl-wynen deleted the support-kwargs branch February 7, 2024 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support keyword arguments in providers
2 participants