DataOffload: Add new FieldOffloadTransformation for FIELD API boilerplate injection #437

Merged
12 commits merged into main from je-field-api-offload-index-from-config
Nov 20, 2024

Contributor

@wertysas wertysas commented Nov 11, 2024

This adds a new transformation, FieldOffloadTransformation, for offloading FIELD API (F-API) fields to GPU. Specifically, this transformation is meant to operate on CPU driver code that passes view pointers to the kernels. The transformation:

  1. Removes the view update calls
  2. Adds new device pointers for the arrays to offload
  3. Adds F-API data offload and sync calls
  4. Replaces the old view-based arguments in kernel calls with device pointer slices

The corresponding dwarf-p-cloudsc implementation is on the branch dwarf-p-cloudsc/je-field-api-offload-v2. This adds a new transformation pipeline for converting view-based CPU driver code to GPU-offloaded F-API code, [pipelines.scc-field], and a new Loki target dwarf-cloudsc-loki-scc-field.

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/437/index.html

@wertysas wertysas requested review from mlange05 and awnawab November 11, 2024 13:26
@wertysas wertysas force-pushed the je-field-api-offload-index-from-config branch from 4acd385 to ff5ddee Compare November 15, 2024 12:39
codecov bot commented Nov 15, 2024

Codecov Report

Attention: Patch coverage is 98.82353% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.31%. Comparing base (00d4290) to head (ac0aa75).
Report is 25 commits behind head on main.

Files with missing lines                           Patch %   Lines
loki/transformations/data_offload.py               98.49%    2 Missing ⚠️
loki/transformations/parallel/field_api.py         96.42%    1 Missing ⚠️
loki/transformations/tests/test_data_offload.py    99.32%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
+ Coverage   93.23%   93.31%   +0.08%     
==========================================
  Files         212      212              
  Lines       40259    40728     +469     
==========================================
+ Hits        37535    38007     +472     
+ Misses       2724     2721       -3     
Flag         Coverage Δ
lint_rules   96.39% <ø> (ø)
loki         93.27% <98.82%> (+0.08%) ⬆️

@wertysas wertysas force-pushed the je-field-api-offload-index-from-config branch from 6b9b2e4 to 7e53319 Compare November 18, 2024 16:31
@wertysas wertysas marked this pull request as ready for review November 18, 2024 19:14
Collaborator

@mlange05 mlange05 left a comment

Excellent contribution, very nicely structured and tested; many thanks! This is a great start towards a very comprehensive set of tools for dealing with FIELD API boilerplate.

I've left a bunch of cosmetic comments (mostly docstrings, and minor style things), and a few notes to myself. The reason for the latter is that this one does not yet cover more general cases with compute code in the driver loop, and it will also likely require some restructuring for the utilities to be used outside the DataOffloadTransformation - but all of these will happen in a follow-on consolidation effort and should not stop this.

    return Module.from_source(fcode, frontend=frontend, xmods=[tmp_path])

@pytest.fixture(name="field_module")
def fixture_field_module(tmp_path, frontend):
Collaborator

Neat! 😏

Contributor Author

Yeah! It was a good suggestion to use a fixture for this to reduce boilerplate in the tests.

@mlange05 mlange05 changed the title Je field api offload index from config DataOffload: Add new FieldOffloadTransformation for FIELD API boilerplate injection Nov 19, 2024

def __init__(self, devptr_prefix=None, field_group_types=None, offload_index=None):
    self.deviceptr_prefix = 'loki_devptr_' if devptr_prefix is None else devptr_prefix
    field_group_types = [''] if field_group_types is None else field_group_types
Collaborator

[no action] For future use, as_tuple is an internal utility that we use for this pattern everywhere.
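
To illustrate the pattern the reviewer refers to, here is a minimal sketch of a tuple-normalising helper. The name to_tuple and its behaviour are assumptions made for illustration only; Loki's actual as_tuple utility is not shown in this conversation and may differ in detail.

```python
def to_tuple(item=None):
    """Hypothetical, simplified stand-in for a helper like ``as_tuple``.

    Normalises "nothing", a single value, or any iterable into a tuple,
    so callers do not need ``x if x is not None else [...]`` boilerplate.
    """
    if item is None:
        return ()
    if isinstance(item, str):
        # A string is iterable, but should count as one item
        return (item,)
    try:
        return tuple(item)
    except TypeError:
        # Not iterable: wrap the single value
        return (item,)


# Usage, mirroring the constructor argument above (default value assumed)
field_group_types = to_tuple(None) or ('',)
```

The final line applies the default when the argument is None or empty, which differs slightly from the `is None` check in the diff: an explicitly passed empty list would also fall back to the default.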

Collaborator

@mlange05 mlange05 left a comment

Very nice! And many thanks for addressing this so promptly. GTG from me. :shipit:

@wertysas wertysas force-pushed the je-field-api-offload-index-from-config branch from e4010ba to 5ee7c8c Compare November 19, 2024 16:13
Contributor

@awnawab awnawab left a comment

Many thanks for this superb piece of work 👌 The implementation is really neat and well tested 🙏 I especially love your dedication to type hints (even though I am too lazy to adhere to it myself 😇 ).

I've left a few comments, mostly reminders for us/myself for the future but there are a couple which I would like you to please address.

@@ -49,6 +52,12 @@ def __init__(self, **kwargs):
    self.has_data_regions = False
    self.remove_openmp = kwargs.get('remove_openmp', False)
    self.assume_deviceptr = kwargs.get('assume_deviceptr', False)
    self.assume_acc_mapped = kwargs.get('assume_acc_mapped', False)
Contributor

The name here (and corresponding control flow) is a little confusing. If this option is not enabled, and the offload instructions are instrumented via DataOffloadTransformation as per usual, they would still be acc mapped. I suggest we use two options, present_on_device and assume_deviceptr. The control flow would look like this:

if self.present_on_device (or self.assume_deviceptr):
    # the "or" here is only needed if this suggested change messily breaks backwards compatibility,
    # resolving which shouldn't hold back this PR
    if self.assume_deviceptr:
        # add deviceptr clause
    else:
        # add present clause
else:
    # add copy/copyin/copyout clause
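
The control flow suggested above can be sketched as a standalone function that builds the content of an OpenACC data pragma. This is only an illustration under stated assumptions: the function name acc_data_content, its signature, and the ordering of the copy clauses are hypothetical and are not taken from DataOffloadTransformation.

```python
def acc_data_content(inargs, outargs, inoutargs,
                     present_on_device=False, assume_deviceptr=False):
    """Build the content string of an ``!$acc data`` pragma (hypothetical helper)."""
    all_args = list(inargs) + list(outargs) + list(inoutargs)

    if present_on_device or assume_deviceptr:
        # Data already lives on the device: no transfers are generated
        if not all_args:
            return 'data'
        clause = 'deviceptr' if assume_deviceptr else 'present'
        return f'data {clause}({", ".join(all_args)})'

    # Default path: instrument the host/device transfers explicitly
    clauses = []
    if inargs:
        clauses.append(f'copyin({", ".join(inargs)})')
    if inoutargs:
        clauses.append(f'copy({", ".join(inoutargs)})')
    if outargs:
        clauses.append(f'copyout({", ".join(outargs)})')
    return ' '.join(['data'] + clauses)


print(acc_data_content(['a'], ['b'], ['c']))
print(acc_data_content(['a'], ['b'], [], present_on_device=True))
print(acc_data_content(['a'], [], ['c'], assume_deviceptr=True))
```

With two independent flags, the "already on device" decision is separated from the choice between present and deviceptr, which is the naming confusion the review comment points out.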

Contributor

The option should also be documented; it's missing from the "parameters".

Contributor Author

I agree, updated it according to your suggestion now.

if self.assume_deviceptr:
    offload_args = inargs + outargs + inoutargs
    if offload_args:
        deviceptr = f' deviceptr({", ".join(offload_args)})'
    else:
        deviceptr = ''
    pragma = Pragma(keyword='acc', content=f'data{deviceptr}')
elif self.assume_acc_mapped:
Contributor

In the short term this should be refactored using the suggestion above. In the long term, I think the arrays that are already present on device via a previous transformation should be in a trafo_data entry, and DataOffloadTransformation would then just instrument the offload for the remaining arrays. This would keep the current transformation more general.




class FieldAPITransferType(Enum):
Contributor

Very nice 👌

if not isinstance(param, Array):
    continue
try:
    parent = arg.parent
Contributor

Couldn't we do this with parent = getattr(arg, 'parent', None) and print the warning if not parent?

Contributor Author

Yes, but I left it as is now, since getattr does something similar under the hood.
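
For reference, the two spellings discussed here behave the same for a plain attribute lookup. The sketch below uses two hypothetical stand-in classes, not Loki's expression types, and only shows the Python-level equivalence.

```python
class PlainArg:
    """Hypothetical argument object without a ``parent`` attribute."""


class MemberArg:
    """Hypothetical derived-type member that knows its parent."""

    def __init__(self, parent):
        self.parent = parent


def parent_via_try(arg):
    # Explicit form, as kept in the PR
    try:
        return arg.parent
    except AttributeError:
        return None


def parent_via_getattr(arg):
    # Compact form suggested in review; the default is returned
    # when the attribute lookup raises AttributeError
    return getattr(arg, 'parent', None)


for arg in (PlainArg(), MemberArg('state')):
    assert parent_via_try(arg) == parent_via_getattr(arg)
```

The try/except form leaves room for a warning inside the except branch, while the getattr form needs a separate `if not parent` check afterwards.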

    driver.variables += device_ptrs
    return device_ptrs

def _devptr_from_array(self, driver, a: sym.Array):
Contributor

Really love the type hints ❤️


def _get_field_ptr_from_view(self, field_view):
    type_chain = field_view.name.split('%')
    field_type_name = 'F_' + type_chain[-1]
Contributor

[noaction] this is a sensible default, which we might want to make overridable per derived-type.
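
One way the per-derived-type override mentioned here could look is sketched below. Only the 'F_' default and the split on '%' come from the diff; the function name field_member_name, the derived_type argument and the prefix_overrides mapping are hypothetical.

```python
def field_member_name(view_name, derived_type=None,
                      prefix_overrides=None, default_prefix='F_'):
    """Derive the field member name from a view name such as ``state%a``.

    ``prefix_overrides`` is a hypothetical mapping from derived-type name
    to the prefix used for its field members.
    """
    # The last component of the '%' chain is the view member itself
    member = view_name.split('%')[-1]
    prefix = (prefix_overrides or {}).get(derived_type, default_prefix)
    return prefix + member


print(field_member_name('state%a'))
print(field_member_name('vars%geom%t', 'geom_type', {'geom_type': 'FLD_'}))
```

Keeping 'F_' as the fallback preserves the behaviour shown in the diff for any derived type without an explicit entry.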

change_map = {}
offload_idx_expr = driver.variable_map[self.offload_index]
for arg, devptr in chain(offload_map.in_pairs, offload_map.inout_pairs, offload_map.out_pairs):
    dims = (sym.RangeIndex((None, None)),) * (len(devptr.shape)-1) + (offload_idx_expr,)
Contributor

Should we not clone arg.dims here and add the offload_idx_expr to it? We don't always want to pass the full array, we might for example pass FIELD_PTR(:,1,IBL) if the field is 3D but the dummy argument is 2D. These are probably mistakes that we should fix in source, but the transformation should nevertheless support this edge-case.

Contributor Author

Nice catch! I had missed that. Fixed it now.
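
The edge case raised in this thread can be sketched with plain tuples of strings standing in for Loki dimension expressions. The fix that actually landed in the PR is not shown in this conversation, so the function below (devptr_dims, a hypothetical name) is only one plausible reading of the suggestion: reuse the dimensions already present on the argument and append the offload index, falling back to full-range slices when the argument carries none.

```python
def devptr_dims(arg_dims, devptr_rank, offload_index):
    """Return the subscript tuple for the device pointer in the kernel call.

    arg_dims      -- dimensions on the original call argument, e.g. (':', '1')
    devptr_rank   -- rank of the device pointer (field rank incl. block dim)
    offload_index -- name of the block index, e.g. 'IBL'
    """
    if arg_dims:
        # Keep the caller's explicit subscripts, e.g. FIELD_PTR(:,1,IBL)
        return tuple(arg_dims) + (offload_index,)
    # No explicit subscripts: pass the full slice for every leading dimension
    return (':',) * (devptr_rank - 1) + (offload_index,)


print(devptr_dims((), 3, 'IBL'))
print(devptr_dims((':', '1'), 3, 'IBL'))
```

The second call corresponds to the FIELD_PTR(:,1,IBL) case from the review comment, where a 3D field is passed to a lower-rank dummy argument.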

Scope of the created :any:`CallStatement`
"""

procedure_name = 'SYNC_HOST_RDWR'
Contributor

[noaction] we might want to mirror the suffix logic of field_get_device_data here.
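
A sketch of what mirroring the suffix logic might look like: the procedure name is assembled from a transfer-type suffix instead of being hard-coded. Only the name SYNC_HOST_RDWR appears in the diff; the TransferType enum, the 'RDONLY' suffix and the function sync_host_name are assumptions for illustration and do not reproduce Loki's FieldAPITransferType.

```python
from enum import Enum


class TransferType(Enum):
    """Hypothetical transfer types; the values are assumed name suffixes."""
    READ_ONLY = 'RDONLY'
    READ_WRITE = 'RDWR'


def sync_host_name(transfer_type=TransferType.READ_WRITE):
    # Build the procedure name from the suffix rather than hard-coding it
    return 'SYNC_HOST_' + transfer_type.value


print(sync_host_name())
print(sync_host_name(TransferType.READ_ONLY))
```

Defaulting to READ_WRITE keeps the behaviour shown in the diff, where the procedure name is always SYNC_HOST_RDWR.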

@wertysas wertysas force-pushed the je-field-api-offload-index-from-config branch from 45c9e20 to ac0aa75 Compare November 20, 2024 12:56
@wertysas wertysas requested a review from awnawab November 20, 2024 13:45
Contributor

@awnawab awnawab left a comment

Thanks a lot for addressing the changes so promptly 🙏 looks great 👌

@mlange05 mlange05 added the ready for merge This PR has been approved and is ready to be merged label Nov 20, 2024
@mlange05 mlange05 merged commit ebc7b69 into main Nov 20, 2024
13 checks passed
@mlange05 mlange05 deleted the je-field-api-offload-index-from-config branch November 20, 2024 15:46