cooperator data munge in support of expansion to MGLP footprint #45

Closed
8 tasks
limnoliver opened this issue Apr 30, 2019 · 7 comments

@limnoliver

limnoliver commented Apr 30, 2019

A list of munging tasks that need to be done to incorporate cooperator data in the expanded footprint (part of #33).

SOUTH DAKOTA

  • Crosswalk between sites in file SD_Lake_temp_export.xlsx and our IDs (in this case, MGLP IDs). There is lat/long and good metadata for each lake to do some quality checking.

  • Track down depth data for SD_Lake_temp_export.xlsx. There are zero values for maximum depth for some lakes in tab Metadata_Max_Depth. I found a few depth resources for SD: a thesis and a USGS report. Alternatively, we might be able to get some site metadata from the org's data portal.
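
As a rough illustration of the lat/long crosswalk described above, the sketch below assigns each cooperator site to the nearest reference lake point within a distance cutoff. It is only a sketch under stated assumptions: the site and lake IDs, the coordinates, and the 500 m cutoff are all hypothetical, and it is written in Python for illustration rather than in the project's own tooling. Matching against lake polygons rather than single points would likely be more robust.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def crosswalk(coop_sites, reference_lakes, max_dist_m=500.0):
    """Map each cooperator site ID to the nearest reference lake ID.

    Both inputs are dicts of id -> (lat, lon). Sites with no reference
    lake within max_dist_m (a hypothetical cutoff) map to None so they
    can be checked by hand against names and counties.
    """
    matches = {}
    for site_id, (lat, lon) in coop_sites.items():
        best_id, best_dist = None, float("inf")
        for lake_id, (ref_lat, ref_lon) in reference_lakes.items():
            dist = haversine_m(lat, lon, ref_lat, ref_lon)
            if dist < best_dist:
                best_id, best_dist = lake_id, dist
        matches[site_id] = best_id if best_dist <= max_dist_m else None
    return matches


# Hypothetical IDs and coordinates, for illustration only.
coop_sites = {"SD_site_A": (44.0000, -97.0000), "SD_site_B": (45.5000, -99.0000)}
reference_lakes = {"MGLP_lake_1": (44.0010, -97.0000), "MGLP_lake_2": (43.0000, -96.0000)}
print(crosswalk(coop_sites, reference_lakes))
```

Unmatched sites are kept as None rather than dropped, so the lake metadata can still be used for manual quality checking.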

INDIANA

  • Crosswalk between sites and our IDs (in this case, MGLP IDs). This is likely best done using lake metadata in file Indiana CLP lake data 1994-2013. In this file, there is lat/long and county info to help ID water bodies. This should cover many lakes in file Indiana_Glacial_Lakes_WQ_IN_DNR and Indiana_GlacialLakes_TempDOprofiles_5.6.13.

  • Verify that lakes with data have depth info. Depth data are included in Indiana CLP lake data 1994-2013.

OTHER

  • Additional cooperator files are currently not being incorporated. These include anything in a sub-folder here. Note that many of these lakes already have data (e.g., the PCA subfolder and Ten Mile Lake subfolder) but in some cases, I know these will add to the record (e.g., Ten Mile missing data in the early 2000s that will be covered by these files).

  • File Water_Temp.accdb has negative depths that have not been resolved. The negative depths, according to the explainer file, are "Water temperature sensor depth. This number is positive if measurement is taken from below the water surface and negative if measurement is taken from lake bottom." I confirmed that we cannot easily find what depth from surface this is. That is, -41m is 41 meters from the bottom, and there are no metadata to support figuring out what the site depth is.

  • The files that end in historicalfiles_manual have some clues about sites in the "notes" column. However, some info is not site related. Keywords for sites include c('basin', 'bay', 'end', 'station', 'WQ') and likely others. Incorporating this info would help to resolve multiple lake-date-depth values. For now, we assume that multiple lake-date-depth values from these data sources are multiple sites.

  • The mendota_temps_long file has site information in column Loc. Additionally, there is an observer column, and in some instances, multiple values per date-depth in Mendota have different observers but NA in the Loc column (I would assume these to be different locations).
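
To make the "notes" keyword idea above concrete, here is a minimal sketch that flags notes containing one of the site keywords and finds lake-date-depth keys with more than one value. The record layout (`lake`, `date`, `depth`, `temp`, `notes`) and the example rows are hypothetical; only the keyword list comes from the bullet above. It is written in Python purely for illustration.

```python
import re
from collections import defaultdict

# Keyword list from the issue text; "and likely others" would be added here.
SITE_KEYWORDS = ["basin", "bay", "end", "station", "WQ"]

# Whole-word, case-insensitive match so "north end" hits but "weekend" does not.
SITE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SITE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def has_site_clue(note):
    """Return True if a note mentions one of the site keywords."""
    return bool(note) and SITE_PATTERN.search(note) is not None


def duplicated_keys(records):
    """Group records by (lake, date, depth) and keep keys with >1 record."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["lake"], rec["date"], rec["depth"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical rows, for illustration only.
records = [
    {"lake": "Lake X", "date": "1995-07-01", "depth": 2.0, "temp": 21.5, "notes": "north end"},
    {"lake": "Lake X", "date": "1995-07-01", "depth": 2.0, "temp": 20.9, "notes": "WQ station 2"},
    {"lake": "Lake X", "date": "1995-07-02", "depth": 2.0, "temp": 21.0, "notes": "windy"},
]

for key, recs in duplicated_keys(records).items():
    clues = [r["notes"] for r in recs if has_site_clue(r["notes"])]
    print(key, clues)
```

Duplicated keys whose notes all carry a site clue are the ones where the "multiple sites" assumption is best supported; the rest would still need a manual look.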

@jordansread

Was there also some unprocessed LAGOS temperature data available for this expansion too @limnoliver ?

@limnoliver

Yes, see this subfolder. Looks like there is Illinois, Indiana, and Dakotas data included.

@jordansread

Indeed. This is likely going to require some oversight to prioritize those files. I took a quick look at the "ChemistryResults" file for SD. Looks like AU_IDs correspond to this set of lake sampling tiers. From that you can get names and counties. From my very light pass, it seems that SD is pretty good about putting these data into the WQP (I looked at SD-BS-L-PICKEREL_01, which is nhdhr_145088870 in our dataset), but there are more recent data from 2014 and 2015 that aren't in the WQP but are in this file.

I'm guessing it is going to be a state-specific and potentially monitoring-program-specific decision to dig into the effort of matching these state IDs with NHDHR IDs, since we'll want to be at least somewhat confident that we'll gain new data we don't already have from WQP pulls.
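
One way to gauge whether a state file is worth the matching effort is to count the records it would add beyond the WQP pull. The sketch below is a minimal anti-join on (lake, date, depth), assuming the state IDs have already been mapped to NHDHR IDs. The field names, the dates, and the depth rounding are hypothetical, and Python is used only for illustration.

```python
def record_key(rec, depth_digits=1):
    """Key a record by lake, date, and rounded depth (rounding is an assumption)."""
    return (rec["lake_id"], rec["date"], round(rec["depth"], depth_digits))


def new_records(state_records, wqp_records):
    """Return state records whose key does not already appear in the WQP pull."""
    wqp_keys = {record_key(r) for r in wqp_records}
    return [r for r in state_records if record_key(r) not in wqp_keys]


# Hypothetical observations for one lake, for illustration only.
wqp = [
    {"lake_id": "nhdhr_145088870", "date": "2013-07-10", "depth": 1.0},
]
state_file = [
    {"lake_id": "nhdhr_145088870", "date": "2013-07-10", "depth": 1.0},
    {"lake_id": "nhdhr_145088870", "date": "2014-07-15", "depth": 1.0},
]

gained = new_records(state_file, wqp)
print(len(gained), "of", len(state_file), "records are not in the WQP pull")
```

A low count for a given state or monitoring program would suggest the ID matching is not worth pursuing for that source.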

@jordansread

From our current "status map", it looks like Iowa is really light on temperature data

[image: status map]

And I don't see any IA data in the LAGOS folders :(

@jordansread

jordansread commented Jun 30, 2021

We now have ~400 lakes in the Iowa temperature data queue for parsing, see here.

@padilla410

padilla410 commented Apr 18, 2022

This comment is a copy/paste of the initial issue. I then respond to each bullet point in italics. At the end of my commentary there are some action items that need review from @lindsayplatt and/or @jread-usgs.

A list of munging tasks that need to be done to incorporate cooperator data in the expanded footprint (part of #33).

SOUTH DAKOTA

  • Crosswalk between sites in file SD_Lake_temp_export.xlsx and our IDs (in this case, MGLP IDs). There is lat/long and good metadata for each lake to do some quality checking.

  • Track down depth data for SD_Lake_temp_export.xlsx. There are zero values for maximum depth for some lakes in tab Metadata_Max_Depth. I found a few depth resources for SD: a thesis and a USGS report. Alternatively, we might be able to get some site metadata from the org's data portal.

    • There are temp and depth data included in the SD_Lake_temp_export.xlsx data set and in 7a_temp_coop_munge/tmp/SD_Lake_temp_export.rds. I would advocate against running down more depth data.

INDIANA

  • Crosswalk between sites and our IDs (in this case, MGLP IDs). This is likely best done using lake metadata in file Indiana CLP lake data 1994-2013. In this file, there is lat/long and county info to help ID water bodies. This should cover many lakes in file Indiana_Glacial_Lakes_WQ_IN_DNR and Indiana_GlacialLakes_TempDOprofiles_5.6.13.

  • Verify that lakes with data have depth info. Depth data are included in Indiana CLP lake data 1994-2013.

    • There are temp and depth data included in the Indiana_Glacial_Lakes_WQ_IN_DNR data set and in 7a_temp_coop_munge/tmp/Indiana_CLP_lakedata_1994_2013.rds. There are also depth data in 7a_temp_coop_munge/tmp/Indiana_Glacial_Lakes_WQ_IN_DNR.rds. I would advocate against running down more depth data.

OTHER

  • Additional cooperator files are currently not being incorporated. These include anything in a sub-folder here. Note that many of these lakes already have data (e.g., the PCA subfolder and Ten Mile Lake subfolder) but in some cases, I know these will add to the record (e.g., Ten Mile missing data in the early 2000s that will be covered by these files).

  • File Water_Temp.accdb has negative depths that have not been resolved. The negative depths, according to the explainer file, are "Water temperature sensor depth. This number is positive if measurement is taken from below the water surface and negative if measurement is taken from lake bottom." I confirmed that we cannot easily find what depth from surface this is. That is, -41m is 41 meters from the bottom, and there are no metadata to support figuring out what the site depth is.

    • This is still unresolved. These values likely drop out of the pipeline downstream of 7a_temp_coop_munge. I can run this down if Lindsay and Jordan would like, but as mentioned above, it does not look like an easy fix.

  • The files that end in historicalfiles_manual have some clues about sites in the "notes" column. However, some info is not site related. Keywords for sites include c('basin', 'bay', 'end', 'station', 'WQ') and likely others. Incorporating this info would help to resolve multiple lake-date-depth values. For now, we assume that multiple lake-date-depth values from these data sources are multiple sites.

    • There doesn't seem to be a clear action to move forward on here. Spot checking these data sets, they do make it into all_coop_dat_linked.feather. I'm inclined to let this one lie.

  • The mendota_temps_long file has site information in column Loc. Additionally, there is an observer column, and in some instances, multiple values per date-depth in Mendota have different observers but NA in the Loc column (I would assume these to be different locations).

    • There doesn't seem to be a clear action to move forward on here. The data set in question does have a parser and does make it into all_coop_dat_linked.feather.
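
For the unresolved negative depths in Water_Temp.accdb, the sketch below shows what the conversion would look like if a site depth ever became available, and how values would stay flagged as unresolved until then. It assumes the sign convention quoted from the explainer file; the function name and the `site_depth` input are hypothetical, since no such metadata currently exist. Python is used only for illustration.

```python
def depth_from_surface(sensor_depth, site_depth=None):
    """Convert a sensor depth to meters below the surface, or None if unresolved.

    Assumed convention (from the explainer file): positive values are
    measured down from the surface, negative values up from the lake bottom.
    site_depth is the water depth at the sensor location, which is
    hypothetical here because the source has no such metadata.
    """
    if sensor_depth >= 0:
        return sensor_depth  # already referenced to the surface
    if site_depth is None:
        return None  # bottom-referenced with unknown site depth: unresolved
    surface_depth = site_depth + sensor_depth  # e.g. 50 + (-41) = 9
    # A sensor "above the surface" means the site depth is inconsistent.
    return surface_depth if surface_depth >= 0 else None


print(depth_from_surface(5.0))          # surface-referenced, returned as-is
print(depth_from_surface(-41.0))        # no site depth, stays unresolved
print(depth_from_surface(-41.0, 50.0))  # 41 m above a 50 m bottom
```

Returning None instead of guessing keeps the bottom-referenced values out of downstream steps without silently assigning them a wrong depth.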

Decisions that I need feedback on:

The decisions outlined above were made as part of the 4/21 Sprint Planning meeting.
