-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataOffload: Add new FieldOffloadTransformation for FIELD API boilerplate injection #437
Conversation
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/437/index.html |
…allStatement pre_init method
4acd385
to
ff5ddee
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #437 +/- ##
==========================================
+ Coverage 93.23% 93.31% +0.08%
==========================================
Files 212 212
Lines 40259 40728 +469
==========================================
+ Hits 37535 38007 +472
+ Misses 2724 2721 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
6b9b2e4
to
7e53319
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent contribution and very nicely structured and test; many thanks! This is a great start towards a very comprehensive set of tools for dealing with FIELD API boilerplate.
I've left a bunch of cosmetic comments (mostly docstrings, and minor style things), and a few notes to myself. The reason for the latter is that this one does not yet cover more general cases with compute code in the driver loop, and it will also likely require some restructuring for the utilities to be used outside the DataOffloadTransformation
- but all of these will happen in a follow-on consolidation effort and should not stop this.
return Module.from_source(fcode, frontend=frontend, xmods=[tmp_path]) | ||
|
||
@pytest.fixture(name="field_module") | ||
def fixture_field_module(tmp_path, frontend): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! 😏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! It was a good suggestion to use a fixture for this to reduce boilerplate in the tests.
|
||
def __init__(self, devptr_prefix=None, field_group_types=None, offload_index=None): | ||
self.deviceptr_prefix = 'loki_devptr_' if devptr_prefix is None else devptr_prefix | ||
field_group_types = [''] if field_group_types is None else field_group_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[no action] For future use, as_tuple
is an internal utility that we use for this pattern everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! And many thanks for addressing this so promptly. GTG from me.
…kwargs to named kwargs in FieldOffload init
e4010ba
to
5ee7c8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for this superb piece of work 👌 The implementation is really neat and well tested 🙏 I especially love your dedication to type hints (even though I am too lazy to adhere to it myself 😇 ).
I've left a few comments, mostly reminders for us/myself for the future but there are a couple which I would like you to please address.
loki/transformations/data_offload.py
Outdated
@@ -49,6 +52,12 @@ def __init__(self, **kwargs): | |||
self.has_data_regions = False | |||
self.remove_openmp = kwargs.get('remove_openmp', False) | |||
self.assume_deviceptr = kwargs.get('assume_deviceptr', False) | |||
self.assume_acc_mapped = kwargs.get('assume_acc_mapped', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name here (and corresponding control flow) is a little confusing. If this option is not enabled, and the offload instructions are instrumented via DataOffloadTransformation
as per usual, they would still be acc mapped. I suggest we use two options, present_on_device
and assume_deviceptr
. The control flow would look like this:
if self.present_on_device (or self.assume_deviceptr):
# the "or" here is only needed if this suggested change messily breaks backwards compatibility,
# resolving which shouldn't hold back this PR
if self.assume_deviceptr:
# add deviceptr clause
else:
# add present clause
else:
# add copy/copyin/copyout clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option should also be documented, it's missing from the "parameters".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, updated it according to your suggestion now.
loki/transformations/data_offload.py
Outdated
if self.assume_deviceptr: | ||
offload_args = inargs + outargs + inoutargs | ||
if offload_args: | ||
deviceptr = f' deviceptr({", ".join(offload_args)})' | ||
else: | ||
deviceptr = '' | ||
pragma = Pragma(keyword='acc', content=f'data{deviceptr}') | ||
elif self.assume_acc_mapped: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the short term this should be refactored using the suggestion above. In the long term, I think the arrays that are already present on device via a previous transformation should be in a trafo_data entry, and DataOffloadTransformation
would then just instrument the offload for the remaining arrays. This would keep the current transformation more general.
|
||
|
||
|
||
class FieldAPITransferType(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice 👌
if not isinstance(param, Array): | ||
continue | ||
try: | ||
parent = arg.parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we do this with a parent = arg.getattr('parent', None)
and print the warning if not parent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I left it as is now, since getattr
does something similar under the hood.
driver.variables += device_ptrs | ||
return device_ptrs | ||
|
||
def _devptr_from_array(self, driver, a: sym.Array): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really love the type hints ❤️
|
||
def _get_field_ptr_from_view(self, field_view): | ||
type_chain = field_view.name.split('%') | ||
field_type_name = 'F_' + type_chain[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[noaction] this is a sensible default, which we might want to make overridable per derived-type.
loki/transformations/data_offload.py
Outdated
change_map = {} | ||
offload_idx_expr = driver.variable_map[self.offload_index] | ||
for arg, devptr in chain(offload_map.in_pairs, offload_map.inout_pairs, offload_map.out_pairs): | ||
dims = (sym.RangeIndex((None, None)),) * (len(devptr.shape)-1) + (offload_idx_expr,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not clone arg.dims
here and add the offload_idx_expr
to it? We don't always want to pass the full array, we might for example pass FIELD_PTR(:,1,IBL)
if the field is 3D but the dummy argument is 2D. These are probably mistakes that we should fix in source, but the transformation should nevertheless support this edge-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I had missed that. Fixed it now.
Scope of the created :any:`CallStatement` | ||
""" | ||
|
||
procedure_name = 'SYNC_HOST_RDWR' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[noaction] we might want to mirror the suffix logic of field_get_device_data
here.
…d requirement of present_on_device to assume_deviceptr
45c9e20
to
ac0aa75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for addressing the changes so promptly 🙏 looks great 👌
This adds a new transformation,
FieldOffloadTransformation
, for offloading F-API fields to GPU. Specifically this transformation is meant to operate on CPU driver code that passes view pointers to the kernels. The transformationThe corresponding
dwarf-p-cloudsc
implementation is on the branch dwarf-p-cloudsc/je-field-api-offload-v2. This adds a new transformation pipeline for converting view based CPU driver code to GPU offloaded F-API code,[pipelines.scc-field]
, and a new Loki targetdwarf-cloudsc-loki-scc-field
.