-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transformations: "Parallel" sub-module with driver-level parallelilsation utilities #415
Conversation
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/415/index.html |
8d95562
to
dd545a0
Compare
] | ||
|
||
|
||
def remove_field_api_view_updates(routine, field_group_types, dim_object=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is case-sensitive, but it may be good to make it case-insensitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! The transformation is actual case-insensitive, but the warning checks are not. I've added come camel-case testing and checking of the generated warnings now.
57eab72
to
c42095b
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #415 +/- ##
==========================================
+ Coverage 93.16% 93.23% +0.07%
==========================================
Files 205 212 +7
Lines 39832 40259 +427
==========================================
+ Hits 37109 37535 +426
- Misses 2723 2724 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, mostly on typos in docstrings. It was useful for me to see some good examples of the visitor pattern in action in Loki! The caplog functionality of pytests seems very useful, I didn't know that it was this easy to check the warning messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many many thanks for these great utilities 🙏 The lack of driver layer transformations has led to a not insignificant amount of code duplication in certain places, which of course causes conflicts when merging science changes. So these will really help cleanup loki integration with the IFS 🥳
I've left a bunch of comments that I think should be addressed, mostly about extending test coverage and typos. There are also some more nitpicky ones which I've marked as optional.
Finally, these utilities in their current guise won't quite work out-of-the-box on ecWAM or the IFS dy-core, but I will address that in followup PRs.
""" | ||
Remove any outer block :any:`Loop` from a given :any:`Subroutine. | ||
|
||
The loops are identified accoerding to a given :any:`Dimension` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: according
|
||
def visit_PragmaRegion(self, region, **kwargs): | ||
""" | ||
Perform the fileterin and removeal of OpenMP pragma regions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Filtering and removal
return region | ||
|
||
if kwargs['active'] and region.pragma.keyword.lower() == 'omp': | ||
# Remove other OpenMP pragam regions when in active mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: OpenMP pragma regions
Add the OpenMP directives for a parallel driver region with an | ||
outer block loop. | ||
""" | ||
block_dim_size = 'YDGEOMETRY%YRDIM%NGPBLKS' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be configurable, so perhaps a block_dim_size
argument with a default value of YDGEOMETRY%YRDIM%NGPBLKS
.
|
||
# Accumulate the set of locally used symbols and chase parents | ||
symbols = tuple(region.uses_symbols | region.defines_symbols) | ||
symbols = tuple(dict.fromkeys(flatten( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: this could also be symbols = tuple(set(routine.variable_map[s.name_parts[0]] for s in symbols))
, which is perhaps a little more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, my suggestion doesn't make sense, I had originally misunderstood what this line does. Sorry for the noise 😅
with pragma_regions_attached(routine): | ||
with dataflow_analysis_attached(routine): | ||
for region in FindNodes(ir.PragmaRegion).visit(routine.body): | ||
if not is_loki_pragma(region.pragma, starts_with='parallel'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: this bailout is untested.
def visit_PragmaRegion(self, region, **kwargs): # pylint: disable=unused-argument | ||
# Apply to pragma-marked "parallel" regions only | ||
if not is_loki_pragma(region.pragma, starts_with='parallel'): | ||
return region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: this bailout is untested.
lhs = assign.lhs.name | ||
if lhs in fprivate_map and assign.rhs == fprivate_map[lhs]: | ||
return None | ||
return assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please test that the assignments that aren't in the privatisation map remain unaffected?
""" | ||
block_dim_size = 'YDGEOMETRY%YRDIM%NGPBLKS' | ||
|
||
global_variables = global_variables or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, these are the variables that compilers struggle to firstprivatise, so perhaps it should be renamed to skip_firstprivatise_vars
or something similar. global_variables
should be reserved for module imports. And could the tests also be extended to cover this functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that "global" is ambiguous in this context, but skip_firstprivate_vars
is also to long and unclear (they can also not be "private"). How about shared_variables
to stick with OpenMP semantics?
local_vars = tuple(v for v in local_vars if v.name not in global_variables) | ||
|
||
# Make field group types firstprivate | ||
firstprivates = tuple(dict.fromkeys( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[noaction] In principle it would be great to rely primarily on the dataflow to determine what should be privatised and firstprivatised with a few known overrides. Unfortunately this quickly becomes impractical due to the missing intents on the SELF
dummy argument in the type-bound procedures, not to mention the need to first parse all of the corresponding type-definitions. So for now we can leave this as is, but it would be nice to generalise this in a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but as I have a bunch of other modifications for this already lined up, I'd like to push this out to future PRs for now (when I have better real-life coverage for this).
Many thanks for the thorough review, @wertysas and @awnawab . I've pushed updates to the tests, comments and docstrings with small modifications to the methods themselves to address your comments. I've also renamed some of the "firstprivate-copies" routines to bring them in line with the naming convention for the other utilities. I've kept everything clean on top of the current branch to give you a clean diff - please let me know when if you're happy, so I can do another quick rebase over main. |
@@ -53,7 +52,7 @@ class RemoveOpenMPRegionTransformer(Transformer): | |||
|
|||
def visit_PragmaRegion(self, region, **kwargs): | |||
""" | |||
Perform the fileterin and removeal of OpenMP pragma regions. | |||
Perform the filetering and removal of OpenMP pragma regions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perform the filetering and removal of OpenMP pragma regions. | |
Perform the filtering and removal of OpenMP pragma regions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks, will mop this up in the rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for addressing all the comments, this now looks good to go 👌
@@ -96,14 +95,26 @@ def visit_Pragma(self, pragma, **kwargs): | |||
routine.body = RemoveOpenMPRegionTransformer().visit(routine.body, active=False) | |||
|
|||
|
|||
def add_openmp_regions(routine, global_variables=None, field_group_types=None): | |||
def add_openmp_regions( | |||
routine, dimension=None, shared_variables=None, field_group_types=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
routine, dimension=None, shared_variables=None, field_group_types=None | |
routine, dimension, shared_variables=None, field_group_types=None |
I didn't realise this until the dimension was added to filter loops by loop variable, but attributes of the dimension parameter are accessed in this method, hence it shouldn't be set to None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, very good catch! Again, I'll mop this up in the rebase.
Optionally, this utility can also replace these outer pragmas with `!$loki parallal` pragma markers.
OMNI has trouble when combining Loop and region pragmas, since it swallows the empy lines we use to differentiate them.
This is to shorten the method names and bring them in line with the naming convention of the sub-package.
c6c7b50
to
2d361bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: This sits on top of #414 , as it requires the new
Dimension
overloading mechanism for block-loop dimensions.This change brings a set of utilities that form the basis of "driver"-level control flow manipulation for IFS-style block loops, OpenMP parallel regions and FIELD API boilerplate code. They are loose packaged in a new sub-module
loki.transformations.parallel
and are intended to be extended at a later stage. The come in remove / add pairs, and currently only pertain to host-side driver loops with FIELD-API view updates.I've also piggy-backed two small IR-level fixes (logging string and a weird corner case for pragma region detection), the latter of which was needed for some of the testing of the new functionalities.
In a little more detail, the new sub-package provides:
!$omp parallel
sections and optionally inject!$loki parallel
regions in their place.!$omp parallel
sections for!$loki parallel
markers, where "private" and "firstprivate" clauses are derived using dataflow analysisobject%update_view(ibl)
for derived-type objects that have specifically named types. This can also insert the slightly deviatingdimension%update(ibl, icst, icend)
updates.The transformations that require detection of IFS-style "driver" block loops depend on a specific overloading of the primary/secondary lower/upper/step properties of the re-written
Dimension
object - a very specific convention that I'm open to discussing further (still seems hacky!). 😞