-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New transformation 'LowerConstantArrayIndices' to allow to … #348
Conversation
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/348/index.html |
3751695
to
10d932f
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #348 +/- ##
==========================================
+ Coverage 95.39% 95.41% +0.02%
==========================================
Files 177 177
Lines 37094 37332 +238
==========================================
+ Hits 35385 35622 +237
- Misses 1709 1710 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks quite promising and useful.
I think a current limitation is that this allows replacing only one index and could be easily extended to an arbitrary number - I've left a suggestion how this could be tackled, hope it's clear.
Otherwise the transformation code itself is a bit Spaghetti-like and could benefit from breaking down into a few logical steps to improve clarity and readability.
subroutine driver(...) | ||
real, intent(inout) :: var(nlon,nlev,5,nb) | ||
do ibl=1,10 | ||
call kernel(var(:, :, :, ibl), var(:, :, :, ibl)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have potential issues with aliasing here, or rather compilers not being able to figure out that there isn't any? I'm wondering wether we could avoid passing the same array multiple times and instead rename the argument then on kernel side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that is possible. However, this makes the transformation more invasive and the question arises whether we want to do that within this transformation or introduce a separate utility/transformation to do that. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point, that could indeed be a separate transformation pass! In that case, disregard this here.
inline_constant_parameters(routine, external_only=self.inline_external_only) | ||
self.process(routine, targets) | ||
|
||
def process(self, routine, targets): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty lengthy code and hard to digest. Can we break this down a bit, e.g., like this?
def process_caller():
dispatched_routines = set()
for call in ...:
for routine_arg, call_arg in ...:
# Create argument mappings
self.update_caller_arg(updated_call_args, ...)
if call.routine.name not in dispatched_routines:
self.update_callee_arg(routine_vmap, ...)
# Apply mapping to callee
if call.routine.name not in dispatched_routines:
dispatched_routines.add(call.routine.name)
self.process_callee(call.routine, routine_vmap)
# Update the call
self.update_call(...)
insert_dims = tuple((i, dim) for i, dim in enumerate(call_arg.dimensions) if self.is_constant_dim(dim)) | ||
if insert_dims: | ||
for insert_dim in insert_dims: | ||
new_dims = list(call_arg.dimensions) | ||
new_dims[insert_dim[0]] = sym.RangeIndex((None, None)) | ||
updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a bit shorter and more readable:
insert_dims = tuple((i, dim) for i, dim in enumerate(call_arg.dimensions) if self.is_constant_dim(dim)) | |
if insert_dims: | |
for insert_dim in insert_dims: | |
new_dims = list(call_arg.dimensions) | |
new_dims[insert_dim[0]] = sym.RangeIndex((None, None)) | |
updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims)) | |
if any(self.is_constant_dim(d) for d in call_arg.dimensions): | |
new_dims = [sym.RangeIndex((None, None)) if self.is_constant_dim(d) else d for d in call_arg.dimensions] | |
updated_call_args[call_arg.name] = call_arg.clone(dimensions=as_tuple(new_dims)) |
and then re-write the bit for the routine argument similarly.
new_dims = list(routine_arg.dimensions) | ||
# dimension is a constant RangeIndex, e.g., '1:3' | ||
if isinstance(insert_dim[1], sym.RangeIndex): | ||
introduce_offset[routine_arg.name] = (insert_dim[0], insert_dim[1].children[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means the transformation only works if there is a single index to be replaced, right? Couldn't we just use a generic map with offsets, i.e., if we have something like
call routine(..., var(:, 1, :, 2:3, ibl), ...)
...
subroutine routine(..., arg, ...)
... :: arg(:, :, :)
that would be transformed to
call routine(..., var(:, :, :, :, ibl), ...)
...
subroutine routine(..., arg, ...)
... :: arg(:, :, :, :)
then have the corresponding offset_map
as something like:
offset_map[arg] = {
0: (0, None), # same index as before, no offset
1: (None, 1), # New index, offset 1
2: (1, None), # Used to be position 1, no offset
3: (2, 2), # Used to be position 2, Offset 2
}
I hope it is clear what I mean here
0d83abd
to
6e779ee
Compare
Refactored and improved. With this we can transform something funny like: subroutine driver(nlon,nlev,nb,var)
use kernel_mod, only: kernel
implicit none
integer, parameter :: param_1 = 1
integer, parameter :: param_2 = 2
integer, parameter :: param_3 = 5
integer, intent(in) :: nlon,nlev,nb
real, intent(inout) :: var(nlon,4,3,nlev,param_3,nb)
integer :: ibl, j
integer :: offset
integer :: some_val
integer :: loop_start, loop_end
loop_start = 2
loop_end = nb
some_val = 0
offset = 1
!$omp test
do ibl=loop_start, loop_end
do j=1,4
call kernel(nlon,nlev,var(:,j,1,:,param_1,ibl), var(:,j,2:3,:,param_2:param_3,ibl), offset, loop_start, loop_end)
call kernel(nlon,nlev,var(:,j,1,:,param_1,ibl), var(:,j,2:3,:,param_2:param_3,ibl), offset, loop_start, loop_end)
end do
enddo
end subroutine driver
subroutine kernel(nlon,nlev,var,another_var,icend,lstart,lend)
use compute_mod, only: compute
implicit none
integer, intent(in) :: nlon,nlev,icend,lstart,lend
real, intent(inout) :: var(nlon,nlev)
real, intent(inout) :: another_var(nlon,2,nlev,4)
integer :: jk, jl, jt
var(:,:) = 0.
do jk = 1,nlev
do jl = 1, nlon
var(jl, jk) = 0.
do jt= 1,4
another_var(jl, 1, jk, jt) = 0.0
end do
end do
end do
call compute(nlon,nlev,var)
call compute(nlon,nlev,var)
end subroutine kernel
subroutine compute(nlon,nlev,var)
implicit none
integer, intent(in) :: nlon,nlev
real, intent(inout) :: var(nlon,nlev)
var(:,:) = 0.
end subroutine compute to SUBROUTINE driver (nlon, nlev, nb, var)
USE kernel_mod, ONLY: kernel
IMPLICIT NONE
INTEGER, INTENT(IN) :: nlon, nlev, nb
REAL, INTENT(INOUT) :: var(nlon, 4, 3, nlev, 5, nb)
INTEGER :: ibl, j
INTEGER :: offset
INTEGER :: some_val
INTEGER :: loop_start, loop_end
loop_start = 2
loop_end = nb
some_val = 0
offset = 1
!$omp test
DO ibl=loop_start,loop_end
DO j=1,4
CALL kernel(nlon, nlev, var(:, j, :, :, :, ibl), var(:, j, :, :, :, ibl), offset, loop_start, loop_end)
CALL kernel(nlon, nlev, var(:, j, :, :, :, ibl), var(:, j, :, :, :, ibl), offset, loop_start, loop_end)
END DO
END DO
END SUBROUTINE driver
SUBROUTINE kernel (nlon, nlev, var, another_var, icend, lstart, lend)
USE compute_mod, ONLY: compute
IMPLICIT NONE
INTEGER, INTENT(IN) :: nlon, nlev, icend, lstart, lend
REAL, INTENT(INOUT) :: var(nlon, 3, nlev, 5)
REAL, INTENT(INOUT) :: another_var(nlon, 3, nlev, 5)
INTEGER :: jk, jl, jt
var(:, 1, :, 1) = 0.
DO jk=1,nlev
DO jl=1,nlon
var(jl, 1, jk, 1) = 0.
DO jt=1,4
another_var(jl, 1 + 2 + -1, jk, jt + 2 + -1) = 0.0
END DO
END DO
END DO
CALL compute(nlon, nlev, var(:, :, :, :))
CALL compute(nlon, nlev, var(:, :, :, :))
END SUBROUTINE kernel
SUBROUTINE compute (nlon, nlev, var)
IMPLICIT NONE
INTEGER, INTENT(IN) :: nlon, nlev
REAL, INTENT(INOUT) :: var(nlon, 3, nlev, 5)
var(:, 1, :, 1) = 0.
END SUBROUTINE compute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many!! thanks for these changes, I think that improved the capabilities of the transform and the structure is also much clearer to me now. Good to go from me!
…pass/lower constant array indices to kernel(s) calls
…ow multiple constant array indices for one array argument)
6e779ee
to
80a1a81
Compare
to allow to pass/lower constant array indices to kernel(s) calls
E.g.,
is transformed to:
This allows to use the "normal" driver and kernel for CLOUDSC for e.g., the CUF transformation, see CLOUDSC branch:
nams-lower-constant-array-indices
.