
Exposure pipeline #4

Open · wants to merge 11 commits into main
Conversation

t-downing (Collaborator)

Pipeline to calculate flood exposure using Worldpop and Floodscan. Basic methodology is to:

  1. Calculate exposure raster (with src.datasources.floodscan.calculate_recent_flood_exposure_rasters())
  1. Filter Floodscan raster to ≥ 0.05 (i.e. only keep pixels with at least 5% flooding, to reduce noise)
    2. Interpolate Floodscan raster to Worldpop grid
    3. Multiply Floodscan raster by Worldpop raster to get exposure raster
  2. Take raster stats (currently just sum, with src.datasources.floodscan.calculate_recent_flood_exposure_rasterstats())
    1. Iterate over admin2s and clip raster to calculate sum
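The exposure-raster calculation (steps 1.1 to 1.3 above) can be sketched end-to-end with tiny synthetic arrays. The array values and coordinates here are illustrative only; the real pipeline reads the Floodscan and WorldPop rasters from `src.datasources`:

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for the real rasters (values are illustrative only)
flood = xr.DataArray(
    np.array([[0.01, 0.10], [0.50, 0.03]]),  # flooded fraction per pixel
    coords={"y": [0.0, 1.0], "x": [0.0, 1.0]},
    dims=["y", "x"],
)
pop = xr.DataArray(
    np.array([[100.0, 200.0], [300.0, 400.0]]),  # people per pixel
    coords={"y": [0.0, 1.0], "x": [0.0, 1.0]},
    dims=["y", "x"],
)

# 1. Filter: keep only pixels with flooded fraction >= 5% (others become NaN)
flood_filtered = flood.where(flood >= 0.05)
# 2. Interpolate to the population grid (nearest-neighbour, as in the pipeline)
flood_on_pop_grid = flood_filtered.interp_like(pop, method="nearest")
# 3. Multiply flooded fraction by population to get people exposed per pixel
exposure = flood_on_pop_grid * pop
```

Pixels dropped by the 5% filter carry NaN through to the exposure raster, so they are excluded (rather than counted as zero) in any later `sum(skipna=True)`-style aggregation.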

The only things that need looking at are the actual functions used by the pipeline (i.e. what is outlined above). There are a couple of notebooks that may be of interest, whose functionality hasn't yet been integrated into either the pipeline or the app:

  • exposure_plotting: the first two plots have already been integrated into the app, but the admin bounds ones haven't been yet. I think these could be pretty useful for picking out where specifically flooding is high.
  • floodscan_historical: just calculating the 1998-2023 flood exposure using the .nc in the Google Drive (faster than stacking up all the historical COGs), which needs to be done whenever a new country is added (takes about two hours).

@hannahker (Collaborator)

@t-downing what kind of feedback do you think makes the most sense here? There's a convo to be had about what we want our "production" setup to be, but I think that's probably best had elsewhere.

@t-downing (Collaborator, Author)

@hannahker yes, good question; I should have specified. I think here it would be good just to agree on the core methods for:

  1. calculating the exposure rasters
  2. taking the raster stats of the exposure

Let me just point out where exactly I'm talking about.

Comment on lines +97 to +101
# filter to only pixels with flood extent > 5% to reduce noise
ds_recent_filtered = ds_recent.where(ds_recent >= 0.05)
# interpolate to Worldpop grid and
# multiply by population to get exposure
exposure = ds_recent_filtered.interp_like(pop, method="nearest") * pop
t-downing (Collaborator, Author)

Here is where we are actually calculating the exposure rasters. I am unsure whether it makes sense to only take pixels with flood extent ≥ 5%. I think we lose a fair amount of information this way, and I'm not sure we really benefit from the noise reduction.

We also may want to think about whether to multiply by the relevant population raster for that year. Also, which population raster to use? There are several options on WorldPop (of which we're using the 2020_1km_Aggregated_UNadj). There is also GHSL.

Collaborator

I don't have a lot of background in working with the Floodscan data, but I'd lean towards removing that 5% threshold. I think we're already doing a lot of smoothing, so removing noise this way seems a little overcautious. We'd also want to be able to justify why we picked that 5% number, which seems slightly arbitrary.

Comment on lines +222 to +235
for pcode, row in tqdm(
adm.set_index("ADM2_PCODE").iterrows(), total=len(adm)
):
da_clip = ds_exp_recent.rio.clip([row.geometry])
dff = (
da_clip.sum(dim=["x", "y"])
.to_dataframe(name="total_exposed")["total_exposed"]
.astype(int)
.reset_index()
)
dff["ADM2_PCODE"] = pcode
dfs.append(dff)

df_exp_adm_new = pd.concat(dfs, ignore_index=True)
t-downing (Collaborator, Author)

And here is where we're calculating the raster stats (in this case, just the sum). But I guess we should replace this with whatever standard method we are using in ds-raster-stats.
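I don't know the ds-raster-stats internals, but one common alternative to a per-polygon clip loop is to rasterize the admin zones once onto the exposure grid and take a label-based sum. A minimal sketch of the idea, with hypothetical pcodes and a hand-built zone array (in practice the labels would come from rasterizing the ADM2 boundaries, e.g. with `rasterio.features.rasterize`):

```python
import numpy as np

# Hypothetical exposure raster (people exposed per pixel; NaN = filtered out)
exposure = np.array([[10.0, 20.0], [30.0, np.nan]])
# Matching zone-label array: each pixel's index into the pcode list
zones = np.array([[0, 0], [1, 1]])
pcodes = ["AB01", "AB02"]  # illustrative placeholder pcodes

# NaN-safe zonal sum: treat masked (NaN) pixels as zero exposure
weights = np.nan_to_num(exposure).ravel()
sums = np.bincount(zones.ravel(), weights=weights, minlength=len(pcodes))
totals = dict(zip(pcodes, sums.astype(int)))
```

This touches every pixel exactly once instead of clipping the raster per admin2, which tends to scale much better as the number of polygons grows.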

@hannahker (Collaborator) · Nov 20, 2024

Would we also want to upsample the raster as we're taking the raster stats here? I know we're going to the 1km WorldPop grid, but if we're calculating at the admin 2 level we might want to go a bit more granular.
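One thing to watch if we do upsample before taking stats: the exposure raster holds people per pixel (a count, not a density), so a plain nearest-neighbour upsample inflates zonal sums by the square of the resampling factor unless values are rescaled. A numpy-only sketch with synthetic values:

```python
import numpy as np

# Hypothetical coarse exposure raster (people exposed per pixel)
coarse = np.array([[4.0, 8.0], [12.0, 16.0]])
factor = 2

# Nearest-neighbour upsample: each coarse pixel becomes factor**2 fine pixels
fine = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
# Rescale so total exposure is conserved across the grid
fine = fine / factor**2
```

The finer grid then lets small admin2 polygons capture partial coarse pixels without changing the country-wide total.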
