
velocity to strain rate with gauge length support #399

Merged
merged 13 commits into master from vel_to_sr on Jun 18, 2024

Conversation

ahmadtourei
Collaborator

@ahmadtourei ahmadtourei commented Jun 8, 2024

Description

This PR addresses #396 by adding a new velocity_to_strain_rate patch function that changes data shape and supports higher gauge lengths.

We plan to combine the two velocity_to_strain_rate functions into a single, more general function in the future.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.


codecov bot commented Jun 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.84%. Comparing base (b6edf30) to head (5db19fa).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #399   +/-   ##
=======================================
  Coverage   99.84%   99.84%           
=======================================
  Files         109      109           
  Lines        8788     8843   +55     
=======================================
+ Hits         8774     8829   +55     
  Misses         14       14           
Flag       Coverage Δ
unittests  99.84% <100.00%> (+<0.01%) ⬆️


patch = differentiate.func(patch, dim="distance", order=order)
data = patch.data / distance_step
Collaborator Author

@d-chambers can you please double check if we need to divide by gauge length here?

Contributor

We don't actually need this; numpy.gradient takes a spacing parameter that normalizes the result inside the function call. This makes me feel better, as I was worried the outputs were scaled wrong 😓.
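For reference, a minimal plain-NumPy sketch (not DASCore code; the values are made up) of how the spacing argument already handles normalization:

```python
import numpy as np

dx = 2.0  # channel spacing (made-up value for illustration)
velocity = np.array([8.0, 6.0, 7.0, 5.0])

# np.gradient accepts the sample spacing and normalizes internally,
# so no extra division by dx is needed afterwards.
strain_rate = np.gradient(velocity, dx)

# Dividing the result by dx again would double-normalize it.
```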

@d-chambers
Contributor

I have an idea for making order and gauge_length work together. I will try to find time later this week to think more about it and work on it.

@d-chambers
Contributor

Ok @ahmadtourei,

If you have a chance, take a look at this. It allows Patch.differentiate to use a step parameter, which decimates the array a certain number of times, applies the central difference operator to each decimated array, and re-assembles everything at the end. velocity_to_strain_rate can then simply use the step argument to get multiple gauge lengths.

I think it is probably worth keeping velocity_to_strain_rate_fd since it fundamentally does something a little bit different, but the outputs of both functions should be similar (except for small differences in the array shapes).
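The decimate-and-reassemble idea can be sketched in plain NumPy (an illustration of the approach, not DASCore's actual implementation; the helper name is made up):

```python
import numpy as np

def central_diff_with_step(data, dx=1.0, step=1):
    """Central difference with an effective stencil of 2 * step * dx.

    Splits the array into `step` interleaved sub-arrays, applies the
    order-2 central difference (np.gradient) to each, and re-interleaves
    the results, so the output keeps the input's shape.
    """
    out = np.empty_like(data, dtype=float)
    for i in range(step):
        # Each decimated sub-array has an effective spacing of step * dx.
        out[i::step] = np.gradient(data[i::step], step * dx)
    return out

# Interior points of x**2 sampled at dx = 1 should differentiate to 2 * x.
deriv = central_diff_with_step(np.arange(10.0) ** 2, dx=1.0, step=2)
```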

@d-chambers
Contributor

closes #399

Comment on lines +99 to +101
e.g., an array of [a b c d e] uses b and d to calculate the diff at c when
step = 1 and order = 2. When step = 2, a and e are used to calculate the
diff at c.
Collaborator Author

So, when step=1 and order=2 (the same as how the Patch.velocity_to_strain_rate function used to work), does it technically mean gauge_length = 2 * distance step?

Contributor

Yes, you are right. The gauge length should be 2*dx in the original function.

@ahmadtourei
Collaborator Author

> Ok @ahmadtourei,
>
> If you have a chance, take a look at this. It allows Patch.differentiate to use a step parameter, which decimates the array a certain number of times, applies the central difference operator to each decimated array, and re-assembles everything at the end. velocity_to_strain_rate can then simply use the step argument to get multiple gauge lengths.
>
> I think it is probably worth keeping velocity_to_strain_rate_fd since it fundamentally does something a little bit different, but the outputs of both functions should be similar (except for small differences in the array shapes).

It looks good to me, but it seems the data outputs of the two functions are not similar. I even noticed changes in polarity for the same channels of the resulting array from each method. Here I'm including a test (that currently breaks) for further investigation.

    def test_compare_two_funcs(self, terra15_das_patch):
        g_m = 1
        out1 = terra15_das_patch.velocity_to_strain_rate_cd(gauge_multiple=g_m)
        out2 = terra15_das_patch.velocity_to_strain_rate(gauge_multiple=g_m)
        dist1 = int(np.floor(g_m / 2))
        dist2 = -int(np.ceil(g_m / 2))
        out2_selected = out2.select(distance=(dist1, dist2), samples=True)
        assert np.allclose(out1.data, out2_selected.data, rtol=1e-03, atol=1e-05)

Also, I think we need to change the name of the second function, as it actually does a central difference. The reference below illustrates what I implemented in the second function (eq. 2):
https://doi.org/10.1093/gji/ggae055

Therefore, I agree we need to keep both functions.

@d-chambers
Contributor

> it seems the data outputs of both functions are not similar.

Yes, that is odd. Looking at the second function more closely, it should be doing exactly what the first function does when gauge_multiple = 1 and order = 2, except that the edges are handled differently, as we already discussed. I will look into it more closely.

@d-chambers
Contributor

I have some insight.

Consider this simple grid of numbers turned into a patch with distance_step = 1 where distance corresponds to axis 1.

[[8, 6, 7, 5],
 [3, 0, 9, 8],
 [5, 4, 4, 4],
 [9, 1, 1, 4]]

velocity_to_strain_rate (first function) returns a patch with:

[[ -3.5  -0.5  -0.5  -3.5]
 [ -9.    3.    4.   -6. ]
 [ -1.5  -0.5   0.    0. ]
 [-12.   -4.    1.5   4.5]]

velocity_to_strain_rate_fd (second function) returns a patch with:

[[-2.  1. -2.]
 [-3.  9. -1.]
 [-1.  0.  0.]
 [-8.  0.  3.]]

The first function is estimating the strain at a point x using the neighboring points:

$f'(x) = \frac{f(x + dx) - f(x - dx)}{2dx}$

the second function is estimating the strain between two points:

$f'(x) = \frac{f(x + dx/2) - f(x - dx/2)}{dx}$

so naturally there will be some differences. When given a patch made of a nicely behaved analytical function, like the test I just added, they produce the same (expected) results.
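As a quick numerical check (plain NumPy, not part of the PR), both estimators converge to the analytic derivative on a smooth function, just at different grid locations:

```python
import numpy as np

dx = 0.1
x = np.arange(0.0, 1.0 + dx / 2, dx)
f = np.sin(x)

# Function 1 style: centered on the sample points (interior only).
centered = (f[2:] - f[:-2]) / (2 * dx)    # estimates f'(x_i)

# Function 2 style: staggered, between the sample points.
staggered = (f[1:] - f[:-1]) / dx         # estimates f'(x_i + dx/2)
```

Both differences are second-order accurate against the true derivative cos(x), which is why the two functions agree for well-behaved analytical inputs.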

As for renaming the second function, you are right. Tentatively, I have called it staggered_velocity_to_strain_rate, but feel free to rename if you think of something better.

@d-chambers
Contributor

I also renamed gauge_multiple to step_multiple and deprecated gauge_multiple in the first function (so as not to break any existing code).

@ahmadtourei
Collaborator Author

ahmadtourei commented Jun 13, 2024

> The first function is estimating the strain at a point x using the neighboring points:

Isn't this the same as the case where the gauge length is actually 2 * channel spacing? It's a bit confusing because we have gauge_multiple=1 but are technically averaging strain over 2 * channel spacing. With the current first function, there is no way to get what the second function produces with step_multiple=1.

Also, it seems order has to be an even number so it might be worth updating the docs.

@d-chambers
Contributor

> Isn't this the same as the case where the gauge length is actually 2 * channel spacing? It's a bit confusing because we have gauge_multiple=1 but are technically averaging strain over 2 * channel spacing. With the current first function, there is no way to get what the second function produces with step_multiple=1.

I think what should be happening when step_multiple = 2 is this for function 1:

$f'(x) = \frac{f(x +2dx) - f(x - 2dx)}{4dx}$

and this for function 2:

$f'(x) = \frac{f(x + 3dx/2) - f(x - 3dx/2)}{3dx}$

Consider this evenly spaced array:

[a, b, c, d, e]

When step_multiple = 2, the first function calculates the strain at c via:

(e - a) / 4 dx

The second function doesn't estimate strain at c, it can only estimate strain between c and d via

(e - b) / 3 dx

Does that make sense?

> Also, it seems order has to be an even number, so it might be worth updating the docs.

Yes, good point.

@ahmadtourei
Collaborator Author

> and this for function 2:

I think this for function 2 should be (step_multiple=2):
$f'(x) = \frac{f(x + dx) - f(x - dx)}{2dx}$

So, it actually estimates at c:
(d - b) / (2*dx)

However, when step_multiple=1, it estimates values between a and b, between b and c, etc. So I get what you mean now; for odd step_multiple values, that is the case.
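In code, the odd/even behavior for function 2 looks like this (a plain-NumPy illustration using the [a, b, c, d, e] example; the values are made up):

```python
import numpy as np

dx = 1.0
a, b, c, d, e = np.array([8.0, 6.0, 7.0, 5.0, 3.0])  # made-up samples

# Even step_multiple (2): estimates land on the sample points,
# e.g. the estimate at c uses its neighbors b and d.
est_at_c = (d - b) / (2 * dx)

# Odd step_multiple (1): estimates land between samples,
# e.g. between a and b, between b and c, and so on.
est_between_a_b = (b - a) / dx
```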

> I think what should be happening when step_multiple = 2 is this for function 1:

Also, to me, this looks like a step_multiple of 4, as we are averaging over 4*dx (not considering the effect of order).

@ahmadtourei
Collaborator Author

> I think what should be happening when step_multiple = 2 is this for function 1:

So basically, function 1 with step_multiple=2 and order=2 should give the same result as function 2 with step_multiple=4, right?

@d-chambers
Contributor

> However, when step_multiple=1, it estimates values between a and b, between b and c, etc. So I get what you mean now; for odd step_multiple values, that is the case.

Yes, you are right. Function 2 produces the same output as function 1 when its step_multiple is doubled (and function 1 uses an order of 2), after accounting for differences in edge handling:

    def test_equal_cases(self, terra15_das_patch):
        for mult in range(1, 20):
            # Get function 1's output and trim off the edges.
            out1 = (
                terra15_das_patch
                .velocity_to_strain_rate(step_multiple=mult)
                .select(distance=(mult, -mult), samples=True)
            )
            # Function 2's output should match function 1 when its step is doubled.
            out2 = terra15_das_patch.staggered_velocity_to_strain_rate(step_multiple=2 * mult)
            assert np.allclose(out1.data, out2.data)

Summary

  1. For function 1, step_multiple means the distance from the center point to each stencil point; for function 2, it means the distance between the first and last points used in the estimate.

  2. Function 2 calculates a staggered grid (i.e., the strain at values between points) when step_multiple is odd. Function 1 can't do this.

  3. Function 1 returns a patch of exactly the same shape as the input. Function 2 is always step_multiple samples shorter along the distance axis.

  4. Judging by how many posts it took us to sort this out, the behavior of these two functions would be too confusing for users. We should try to simplify this.
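The shape difference in point 3 can be seen with a plain-NumPy sketch of the two styles (illustrative, not DASCore code):

```python
import numpy as np

step = 3
data = np.arange(10.0)

# Function 1 style: np.gradient keeps the input shape (edges use
# one-sided stencils), so the patch shape is unchanged.
out1 = np.gradient(data)

# Function 2 style: staggered differences over `step` samples are
# `step` samples shorter along the differentiated axis.
out2 = (data[step:] - data[:-step]) / step
```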

@d-chambers
Contributor

Here are my thoughts on how to move this forward. First, reflecting on what users might want and future DASCore plans:

  1. Being able to calculate the strain between points is nice, and probably what most other DAS codes do. When step_multiple == 1, this gives you the smallest possible gauge length, which could be desirable for several reasons. However, the staggered grid is probably less useful in the other cases. For example, the data won't be that different when step_multiple = 7 vs. 6 or 8. It is probably better to just use a gauge length close to an even number of step multiples (in function 2's parlance) so the coords don't shift and the array shape doesn't change.

  2. As you rightly pointed out, an adjustable step_multiple is an important feature and something we didn't have before this PR.

  3. I would like to avoid changing the array shape when possible. Down the road, I hope to make many patch functions JIT-compilable with JAX for potential performance gains. However, any function with dynamically-sized arrays can't be compiled.

So here is what I propose:

  • Clarify in function 1 that the new gauge_length is step_multiple * 2
  • Clarify that order must be a multiple of 2
  • Let's use function 2, currently named staggered_velocity_to_strain_rate, only for the in-between case of step_multiple = 1 by removing the step_multiple parameter. For larger step multiples and orders, we will point users to function 1.
  • Let's add a recipe or note that explores the effect of different step_multiple and order values a bit more. We may also need to add an event stored in velocity format to the example data repo.

What do you think?

@ahmadtourei
Collaborator Author

ahmadtourei commented Jun 17, 2024

> Here are my thoughts on how to move this forward. First, reflecting on what users might want and future DASCore plans:
>
>   1. Being able to calculate the strain between points is nice, and probably what most other DAS codes do. When step_multiple == 1, this gives you the smallest possible gauge length, which could be desirable for several reasons. However, the staggered grid is probably less useful in the other cases. For example, the data won't be that different when step_multiple = 7 vs. 6 or 8. It is probably better to just use a gauge length close to an even number of step multiples (in function 2's parlance) so the coords don't shift and the array shape doesn't change.
>   2. As you rightly pointed out, an adjustable step_multiple is an important feature and something we didn't have before this PR.
>   3. I would like to avoid changing the array shape when possible. Down the road, I hope to make many patch functions JIT-compilable with JAX for potential performance gains. However, any function with dynamically-sized arrays can't be compiled.
>
> So here is what I propose:
>
>   • Clarify in function 1 that the new gauge_length is step_multiple * 2
>   • Clarify that order must be a multiple of 2
>   • Let's use function 2, currently named staggered_velocity_to_strain_rate, only for the in-between case of step_multiple = 1 by removing the step_multiple parameter. For larger step multiples and orders, we will point users to function 1.
>   • Let's add a recipe or note that explores the effect of different step_multiple and order values a bit more. We may also need to add an event stored in velocity format to the example data repo.
>
> What do you think?

Thanks for the discussion, Derrick! As we discussed today, let's keep step_multiple adjustable for function 2 as the functions have different behaviors on the edges. I also agree that it'd be beneficial to add notes/recipes.

@d-chambers
Contributor

Thanks @ahmadtourei.

This findiff documentation page might be helpful for better understanding the edge stencil implementations.

@d-chambers
Contributor

Thanks @ahmadtourei for all your work on this.

@d-chambers d-chambers merged commit 8522827 into master Jun 18, 2024
15 checks passed
@d-chambers d-chambers deleted the vel_to_sr branch June 18, 2024 16:30