Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transformations: "Parallel" sub-module with driver-level parallelilsation utilities #415

Merged
merged 18 commits into from
Nov 15, 2024

Conversation

mlange05
Copy link
Collaborator

Note: This sits on top of #414 , as it requires the new Dimension overloading mechanism for block-loop dimensions.

This change brings a set of utilities that form the basis of "driver"-level control flow manipulation for IFS-style block loops, OpenMP parallel regions and FIELD API boilerplate code. They are loose packaged in a new sub-module loki.transformations.parallel and are intended to be extended at a later stage. The come in remove / add pairs, and currently only pertain to host-side driver loops with FIELD-API view updates.

I've also piggy-backed two small IR-level fixes (logging string and a weird corner case for pragma region detection), the latter of which was needed for some of the testing of the new functionalities.

In a little more detail, the new sub-package provides:

  • Utilities to remove !$omp parallel sections and optionally inject !$loki parallel regions in their place.
  • A utility to re-inject !$omp parallel sections for !$loki parallel markers, where "private" and "firstprivate" clauses are derived using dataflow analysis
  • Utilities to remove and create explicit copies for derived types that would trigger a known "firstprivate" bug. This common workarounds is inserted for explicitly named variables and is thus user-driven.
  • Utilities to remove and create the common FIELD API object%update_view(ibl) for derived-type objects that have specifically named types. This can also insert the slightly deviating dimension%update(ibl, icst, icend) updates.

The transformations that require detection of IFS-style "driver" block loops depend on a specific overloading of the primary/secondary lower/upper/step properties of the re-written Dimension object - a very specific convention that I'm open to discussing further (still seems hacky!). 😞

@mlange05 mlange05 requested review from reuterbal and awnawab October 25, 2024 08:04
Copy link

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/415/index.html

@mlange05 mlange05 force-pushed the naml-transformations-parallel branch from 8d95562 to dd545a0 Compare November 7, 2024 19:40
]


def remove_field_api_view_updates(routine, field_group_types, dim_object=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is case-sensitive, but it may be good to make it case-insensitive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! The transformation is actual case-insensitive, but the warning checks are not. I've added come camel-case testing and checking of the generated warnings now.

Copy link

codecov bot commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 99.06542% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.23%. Comparing base (358fb60) to head (2d361bc).
Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
loki/transformations/parallel/openmp_region.py 97.89% 2 Missing ⚠️
loki/ir/pragma_utils.py 0.00% 1 Missing ⚠️
loki/transformations/parallel/field_api.py 98.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #415      +/-   ##
==========================================
+ Coverage   93.16%   93.23%   +0.07%     
==========================================
  Files         205      212       +7     
  Lines       39832    40259     +427     
==========================================
+ Hits        37109    37535     +426     
- Misses       2723     2724       +1     
Flag Coverage Δ
lint_rules 96.39% <ø> (ø)
loki 93.18% <99.06%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mlange05 mlange05 marked this pull request as ready for review November 11, 2024 14:54
Copy link
Contributor

@wertysas wertysas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, mostly on typos in docstrings. It was useful for me to see some good examples of the visitor pattern in action in Loki! The caplog functionality of pytests seems very useful, I didn't know that it was this easy to check the warning messages.

loki/transformations/parallel/block_loop.py Outdated Show resolved Hide resolved
loki/transformations/parallel/block_loop.py Outdated Show resolved Hide resolved
loki/transformations/parallel/block_loop.py Outdated Show resolved Hide resolved
loki/transformations/parallel/block_loop.py Outdated Show resolved Hide resolved
loki/transformations/parallel/field_api.py Outdated Show resolved Hide resolved
loki/transformations/parallel/field_api.py Show resolved Hide resolved
loki/transformations/parallel/block_loop.py Outdated Show resolved Hide resolved
loki/transformations/parallel/openmp_region.py Outdated Show resolved Hide resolved
loki/transformations/parallel/openmp_region.py Outdated Show resolved Hide resolved
loki/transformations/parallel/tests/test_field_api.py Outdated Show resolved Hide resolved
Copy link
Contributor

@awnawab awnawab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many many thanks for these great utilities 🙏 The lack of driver layer transformations has led to a not insignificant amount of code duplication in certain places, which of course causes conflicts when merging science changes. So these will really help cleanup loki integration with the IFS 🥳

I've left a bunch of comments that I think should be addressed, mostly about extending test coverage and typos. There are also some more nitpicky ones which I've marked as optional.

Finally, these utilities in their current guise won't quite work out-of-the-box on ecWAM or the IFS dy-core, but I will address that in followup PRs.

"""
Remove any outer block :any:`Loop` from a given :any:`Subroutine.

The loops are identified accoerding to a given :any:`Dimension`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: according


def visit_PragmaRegion(self, region, **kwargs):
"""
Perform the fileterin and removeal of OpenMP pragma regions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Filtering and removal

return region

if kwargs['active'] and region.pragma.keyword.lower() == 'omp':
# Remove other OpenMP pragam regions when in active mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: OpenMP pragma regions

Add the OpenMP directives for a parallel driver region with an
outer block loop.
"""
block_dim_size = 'YDGEOMETRY%YRDIM%NGPBLKS'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be configurable, so perhaps a block_dim_size argument with a default value of YDGEOMETRY%YRDIM%NGPBLKS.


# Accumulate the set of locally used symbols and chase parents
symbols = tuple(region.uses_symbols | region.defines_symbols)
symbols = tuple(dict.fromkeys(flatten(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: this could also be symbols = tuple(set(routine.variable_map[s.name_parts[0]] for s in symbols)), which is perhaps a little more intuitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, my suggestion doesn't make sense, I had originally misunderstood what this line does. Sorry for the noise 😅

with pragma_regions_attached(routine):
with dataflow_analysis_attached(routine):
for region in FindNodes(ir.PragmaRegion).visit(routine.body):
if not is_loki_pragma(region.pragma, starts_with='parallel'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: this bailout is untested.

def visit_PragmaRegion(self, region, **kwargs): # pylint: disable=unused-argument
# Apply to pragma-marked "parallel" regions only
if not is_loki_pragma(region.pragma, starts_with='parallel'):
return region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: this bailout is untested.

lhs = assign.lhs.name
if lhs in fprivate_map and assign.rhs == fprivate_map[lhs]:
return None
return assign
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please test that the assignments that aren't in the privatisation map remain unaffected?

"""
block_dim_size = 'YDGEOMETRY%YRDIM%NGPBLKS'

global_variables = global_variables or {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, these are the variables that compilers struggle to firstprivatise, so perhaps it should be renamed to skip_firstprivatise_vars or something similar. global_variables should be reserved for module imports. And could the tests also be extended to cover this functionality?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that "global" is ambiguous in this context, but skip_firstprivate_vars is also to long and unclear (they can also not be "private"). How about shared_variables to stick with OpenMP semantics?

local_vars = tuple(v for v in local_vars if v.name not in global_variables)

# Make field group types firstprivate
firstprivates = tuple(dict.fromkeys(
Copy link
Contributor

@awnawab awnawab Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[noaction] In principle it would be great to rely primarily on the dataflow to determine what should be privatised and firstprivatised with a few known overrides. Unfortunately this quickly becomes impractical due to the missing intents on the SELF dummy argument in the type-bound procedures, not to mention the need to first parse all of the corresponding type-definitions. So for now we can leave this as is, but it would be nice to generalise this in a future PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but as I have a bunch of other modifications for this already lined up, I'd like to push this out to future PRs for now (when I have better real-life coverage for this).

@mlange05
Copy link
Collaborator Author

Many thanks for the thorough review, @wertysas and @awnawab . I've pushed updates to the tests, comments and docstrings with small modifications to the methods themselves to address your comments. I've also renamed some of the "firstprivate-copies" routines to bring them in line with the naming convention for the other utilities. I've kept everything clean on top of the current branch to give you a clean diff - please let me know when if you're happy, so I can do another quick rebase over main.

@@ -53,7 +52,7 @@ class RemoveOpenMPRegionTransformer(Transformer):

def visit_PragmaRegion(self, region, **kwargs):
"""
Perform the fileterin and removeal of OpenMP pragma regions.
Perform the filetering and removal of OpenMP pragma regions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Perform the filetering and removal of OpenMP pragma regions.
Perform the filtering and removal of OpenMP pragma regions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks, will mop this up in the rebase.

Copy link
Contributor

@awnawab awnawab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for addressing all the comments, this now looks good to go 👌

@@ -96,14 +95,26 @@ def visit_Pragma(self, pragma, **kwargs):
routine.body = RemoveOpenMPRegionTransformer().visit(routine.body, active=False)


def add_openmp_regions(routine, global_variables=None, field_group_types=None):
def add_openmp_regions(
routine, dimension=None, shared_variables=None, field_group_types=None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
routine, dimension=None, shared_variables=None, field_group_types=None
routine, dimension, shared_variables=None, field_group_types=None

I didn't realise this until the dimension was added to filter loops by loop variable, but attributes of the dimension parameter are accessed in this method, hence it shouldn't be set to None.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, very good catch! Again, I'll mop this up in the rebase.

@mlange05 mlange05 force-pushed the naml-transformations-parallel branch from c6c7b50 to 2d361bc Compare November 14, 2024 16:38
@mlange05 mlange05 added the ready for merge This PR has been approved and is ready to be merged label Nov 14, 2024
Copy link
Collaborator

@reuterbal reuterbal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks great!

Many thanks also to @awnawab and @wertysas for your diligent review and strict enforcement of testing - much appreciated!

@mlange05 mlange05 merged commit 00d4290 into main Nov 15, 2024
13 checks passed
@mlange05 mlange05 deleted the naml-transformations-parallel branch November 15, 2024 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for merge This PR has been approved and is ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants