Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parse_expr and use in process_dimension_pragmas #292

Merged
merged 11 commits into from
Apr 25, 2024

Conversation

MichaelSt98
Copy link
Collaborator

Improve parse_expr and use in process_dimension_pragmas.

  • add correct parsing of : to RangeIndex((None, None))
  • support DerivedType s

Copy link

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/292/index.html

Copy link

codecov bot commented Apr 18, 2024

Codecov Report

Attention: Patch coverage is 98.21429% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 94.96%. Comparing base (44bccfd) to head (2b8c5cb).
Report is 33 commits behind head on main.

Files Patch % Lines
loki/expression/parser.py 94.59% 4 Missing ⚠️
loki/ir/pragma_utils.py 98.11% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #292      +/-   ##
==========================================
+ Coverage   94.94%   94.96%   +0.01%     
==========================================
  Files         153      153              
  Lines       32008    32427     +419     
==========================================
+ Hits        30391    30795     +404     
- Misses       1617     1632      +15     
Flag Coverage Δ
lint_rules 96.39% <ø> (ø)
loki 95.13% <98.21%> (+0.01%) ⬆️
transformations 92.17% <ø> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MichaelSt98
Copy link
Collaborator Author

It's probably possible to combine the logic of

_get_pragma_parameters_re = re.compile(r'(?P<command>[\w-]+)\s*(?:\((?P<arg>.+?)\))?')

and

_get_pragma_dim_parameter_re = re.compile(r'(?P<command>[\w-]+)\s*(?:\((?P<arg>.*)\))?')

However, I think it makes more sense to introduce the pattern argument in get_pragma_parameters() ...

@MichaelSt98 MichaelSt98 requested a review from reuterbal April 18, 2024 12:21
@MichaelSt98
Copy link
Collaborator Author

Some improvements to the expression parser included in this PR:

Executing:

from loki import parse_expr

class Foo:
    val3 = 1
    arr = [[1, 2], [3, 4]]
    def __init__(self, _val1, _val2):
        self.val1 = _val1
        self.val2 = _val2
    def some_func(self, a, b):
        return a + b
    @staticmethod
    def static_func(a):
        return 2*a

context = {'foo': Foo(2, 3)}
test_str = 'foo%val1 + foo%val2 + foo%val3'
parsed = parse_expr(f'{test_str}')
print(f"parsing: {test_str} with context: '{context}'\n  results in '{parsed}'")
parsed = parse_expr(f'{test_str}', evaluate=True, context=context)
print(f"parsing and evaluating: {test_str} with context: '{context}'\n  results in '{parsed}'")
test_str = 'foo%val1 + foo%some_func(1, 2) + foo%static_func_2(3)'
parsed = parse_expr(f'{test_str}', evaluate=True, context=context)
print(f"parsing and evaluating: {test_str} with context: '{context}'\n  results in '{parsed}'")
test_str = 'foo%val1 + foo%some_func(1, 2) + foo%static_func(3) + foo%arr(1, 2)'
parsed = parse_expr(f'{test_str}', evaluate=True, context=context, strict=True)
print(f"parsing and evaluating: {test_str} with context: '{context}'\n  results in '{parsed}'")
test_str = 'foo%val1 + foo%some_func(1, b=2) + foo%static_func(a=3) + foo%arr(1, 2)'
parsed = parse_expr(f'{test_str}', evaluate=True, context=context, strict=True)
print(f"parsing and evaluating: {test_str} with context: '{context}'\n  results in '{parsed}'")

gives

parsing: foo%val1 + foo%val2 + foo%val3 with context: '{'foo': <__main__.Foo object at 0x1494ff19c3d0>}'
  results in 'foo%val1 + foo%val2 + foo%val3'
parsing and evaluating: foo%val1 + foo%val2 + foo%val3 with context: '{'foo': <__main__.Foo object at 0x1494ff19c3d0>}'
  results in '6'
parsing and evaluating: foo%val1 + foo%some_func(1, 2) + foo%static_func_2(3) with context: '{'foo': <__main__.Foo object at 0x1494ff19c3d0>}'
  results in '5 + foo%static_func_2(3)'
parsing and evaluating: foo%val1 + foo%some_func(1, 2) + foo%static_func(3) + foo%arr(1, 2) with context: '{'foo': <__main__.Foo object at 0x1494ff19c3d0>}'
  results in '13'
parsing and evaluating: foo%val1 + foo%some_func(1, b=2) + foo%static_func(a=3) + foo%arr(1, 2) with context: '{'foo': <__main__.Foo object at 0x1494ff19c3d0>}'
  results in '13'

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks, the improvements to parse_expr and how it can deal with derived types is absolutely fantastic! I'm really impressed by the degree of functionality you have created in a short amount of time.

Just out of curiosity but not necessarily something we need: would the evaluation also work for nested derived types? Say something like

class Bar:
    baz = 4

class Foo:
    bar = Bar()

parse_expr('1 + foo%bar%baz', context={'foo': Foo()}, evaluate=True)

I am, however, not convinced by the integration in process_dimension_pragmas. The workaround with the expanded regex pattern breaks down as soon as there's more than one set of nested parentheses, and it's fundamentally a problem that cannot be solved by regex matching.

Therefore, I would propose to re-implement the get_pragma_parameters utility to provide this functionality. But we can also do this as a separate PR and leave out the support for parentheses in this PR.


map_range = map_slice
map_range_index = map_slice
map_loop_range = map_slice

def map_variable(self, expr, *args, **kwargs):
return sym.Variable(name=expr.name)
parent = kwargs.pop('parent', None)
return sym.Variable(name=expr.name, parent=parent) # , **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return sym.Variable(name=expr.name, parent=parent) # , **kwargs)
return sym.Variable(name=expr.name, parent=parent)

Comment on lines 272 to 275
kwargs = {
k: self.rec(v)
for k, v in expr.name.kw_parameters.items()}
kwargs = CaseInsensitiveDict(kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could build the CaseInsensitiveDict directly like

Suggested change
kwargs = {
k: self.rec(v)
for k, v in expr.name.kw_parameters.items()}
kwargs = CaseInsensitiveDict(kwargs)
kwargs = CaseInsensitiveDict(
(k, self.rec(v))
for k, v in expr.name.kw_parameters.items()
)

loki/expression/parser.py Show resolved Hide resolved
Comment on lines +2104 to +2108
context = {'arr': [[1, 2], [3, 4]]}
test_str = '1 + arr(1, 2)'
parsed = parse_expr(convert_to_case(f'{test_str}', mode=case), evaluate=True, context=context)
assert parsed == 3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure my brain is fully awake, but do we not have a row-major vs. column-major conflict here? From the definition, arr should look like this to me:

1  2
3  4

And I would expect arr(1, 2) to be interpreted as column-major, which should yield 3.
So, the evaluated result should be 4, unless I'm misunderstanding/missing something of course.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline ...

test_str = 'foo%val1 + foo%val2 + foo%val3'
parsed = parse_expr(convert_to_case(f'{test_str}', mode=case))
assert str(parsed).lower().replace(' ', '') == 'foo%val1+foo%val2+foo%val3'
with pytest.raises(Exception):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we narrow down the expected exception type here?

test_str = 'foo%val1 + foo%some_func(1, 2) + foo%static_func_2(3)'
parsed = parse_expr(convert_to_case(f'{test_str}', mode=case), evaluate=True, context=context)
assert str(parsed).lower().replace(' ', '') == '5+foo%static_func_2(3)'
with pytest.raises(Exception):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about exception type

parameters[match.group('command')].append(match.group('arg'))
parameters = {k: v if len(v) > 1 else v[0] for k, v in parameters.items()}
return parameters


def process_dimension_pragmas(ir):
_get_pragma_dim_parameter_re = re.compile(r'(?P<command>[\w-]+)\s*(?:\((?P<arg>.*)\))?')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work reliably, unfortunately. Counterexample:

!$loki something key1((val1 + 1)/2) key2((var1+1/(var2-5))/(2-3))

Generally, regex break down on recursive tasks like matching parentheses.

Using a REGEX for this utility was a lazy way of solving the problem at the time, but it's likely no longer the correct way of dealing with this. Instead, we should scan the string, break the chunks down but match parentheses in the process.

A similar utility exists in the REGEX frontend to match parentheses in strings:
https://github.com/ecmwf-ifs/loki/blob/main/loki/frontend/regex.py#L280-L344

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I did what you suggested based on the link you provided.

@@ -14,9 +14,8 @@
from loki.ir.find import FindNodes
from loki.ir.transformer import Transformer
from loki.ir.visitor import Visitor

# from loki.expression.parser import parse_expr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# from loki.expression.parser import parse_expr

@MichaelSt98
Copy link
Collaborator Author

Regarding your nested derived types question:
It was not possible, but I've added a commit to allow nested derived types.

With this, executing:

from loki import parse_expr

class Bar:
    baz = 4

class Foo:
    bar = Bar()

ir = parse_expr('1 + foo%bar%baz', context={'foo': Foo()}, evaluate=True)
print(ir)

yields 5.

There are also more sophisticated tests for nested derived types in expression/tests/test_expression.py

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks!
The much improved get_pragma_parameters is excellent! And the support for nested derived types was purely curiousity, the fact that this works now with a small set of changes really emphasizes how incredibly great this is!

Just a final request for a small docstring, and I'm a little puzzled by the NotImplementedError in the evaluator.

loki/expression/parser.py Show resolved Hide resolved
pragma = as_tuple(pragma)
parameters = defaultdict(list)
for p in pragma:
parameter = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant, no?


def get_pragma_parameters(pragma, starts_with=None, only_loki_pragmas=True,
pattern=_get_pragma_parameters_re):
class PragmaParameters:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, many thanks!
Would you mind adding just a two-line docstring explaining what it's for to both the class and the find method?

Comment on lines +289 to +290
if self.strict:
raise NotImplementedError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the control-flow here. What's not implemented when we run through cleanly without an exception?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss about that. However, I thought if strict = True one wants to have an evaluated expression which in this case is not true as it would return the unchanged expression itself?! What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you're right of course. I missed the fact that we won't reach this point if it's one of the known "lookup" nodes. Thanks for clarifying!

Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, many thanks!

@reuterbal reuterbal added the ready to merge This PR has been approved and is ready to be merged label Apr 25, 2024
@reuterbal reuterbal merged commit 5673795 into main Apr 25, 2024
12 checks passed
@reuterbal reuterbal deleted the nams_process_dimension_pragmas_parse_expr branch April 25, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge This PR has been approved and is ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants