cooperator data munge in support of expansion to MGLP footprint #45
Was there also some unprocessed LAGOS temperature data available for this expansion, @limnoliver?
Yes, see this subfolder. It looks like Illinois, Indiana, and Dakotas data are included.
Indeed. This will likely require some oversight to prioritize those files. I took a quick look at the "ChemistryResults" file for SD. It looks like the AU_IDs correspond to this set of lake sampling tiers, from which you can get lake names and counties. From my very light pass, SD seems pretty good about putting these data into the WQP (I looked at SD-BS-L-PICKEREL_01, which is nhdhr_145088870 in our dataset), but this file has more recent data from 2014 and 2015 that aren't in the WQP. I'm guessing that whether to invest the effort of matching these state IDs with NHDHR IDs will be a state-specific, and potentially monitoring-program-specific, decision, since we'll want to be at least somewhat confident that we'll gain data we don't already have from WQP pulls.
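
The matching and the "new data" check described above can be sketched as two small steps. This is a hypothetical illustration, not an existing project tool: it assumes state sites and our lakes are available as ID to (lat, long) lookups, matches each state site to the nearest lake within a distance tolerance (a simple haversine nearest-neighbour match swapped in for illustration), and then keeps only lake-dates that are absent from the WQP pull. The names `match_sites`, `new_records`, and `max_km`, the 1 km default, and the IDs and coordinates in the usage block are all made up.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_sites(
    state_sites: Dict[str, Tuple[float, float]],
    lakes: Dict[str, Tuple[float, float]],
    max_km: float = 1.0,
) -> Dict[str, Optional[str]]:
    """Map each state site ID to the nearest lake ID, or None if no lake is within max_km.

    Hypothetical helper: distances are point to point (e.g., to a lake reference point),
    so large or irregular lakes may need a looser tolerance or a manual check.
    """
    crosswalk: Dict[str, Optional[str]] = {}
    for site_id, (slat, slon) in state_sites.items():
        best_id: Optional[str] = None
        best_km = max_km
        for lake_id, (llat, llon) in lakes.items():
            d = haversine_km(slat, slon, llat, llon)
            if d <= best_km:
                best_id, best_km = lake_id, d
        crosswalk[site_id] = best_id
    return crosswalk


def new_records(
    coop: List[Tuple[str, str]],  # (state site ID, ISO date) from the cooperator file
    wqp: Set[Tuple[str, str]],  # (lake ID, ISO date) already pulled from the WQP
    crosswalk: Dict[str, Optional[str]],
) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (lake ID, date) pairs that the WQP pull lacks."""
    out = set()
    for site_id, date in coop:
        lake_id = crosswalk.get(site_id)
        if lake_id is not None and (lake_id, date) not in wqp:
            out.add((lake_id, date))
    return sorted(out)


if __name__ == "__main__":
    # Invented lakes and state sites, for illustration only.
    lakes = {"lake_1": (45.00, -97.00), "lake_2": (46.00, -98.00)}
    sites = {"site_A": (45.001, -97.001), "site_B": (44.0, -96.0)}
    cw = match_sites(sites, lakes, max_km=1.0)
    print(cw)  # {'site_A': 'lake_1', 'site_B': None}
    coop = [("site_A", "2014-07-01"), ("site_A", "2015-07-01"), ("site_B", "2014-07-01")]
    wqp = {("lake_1", "2014-07-01")}
    print(new_records(coop, wqp, cw))  # [('lake_1', '2015-07-01')]
```

A nearest-point match like this is only a first pass; the names and counties available from the sampling tiers would still be worth checking by hand where lakes sit close together.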
We now have ~400 lakes in the Iowa temperature data queue for parsing; see here.
**This comment is a copy/paste of the initial issue. I then respond to each bullet point in italics. At the end of my commentary, there are some action items that need review from @lindsayplatt and/or @jread-usgs.**

A list of munging tasks that need to be done to incorporate cooperator data in the expanded footprint (part of #33).

SOUTH DAKOTA
INDIANA
OTHER
Decisions that I need feedback on:
The decisions outlined above were made at the 4/21 Sprint Planning meeting.
A list of munging tasks that need to be done to incorporate cooperator data in the expanded footprint (part of #33).
SOUTH DAKOTA
- Crosswalk between sites in file `SD_Lake_temp_export.xlsx` and our IDs (in this case, MGLP IDs). There are lat/long coordinates and good metadata for each lake to support some quality checking.
- Track down depth data for `SD_Lake_temp_export.xlsx`. There are zero values for maximum depth for some lakes in tab `Metadata_Max_Depth`. I found a few depth resources for SD: a thesis and a USGS report. We might also be able to get some site metadata from the org's data portal.

INDIANA
- Crosswalk between sites and our IDs (in this case, MGLP IDs). This is likely best done using the lake metadata in file `Indiana CLP lake data 1994-2013`. That file has lat/long and county info to help identify water bodies, which should cover many lakes in files `Indiana_Glacial_Lakes_WQ_IN_DNR` and `Indiana_GlacialLakes_TempDOprofiles_5.6.13`.
- Verify that data lakes have depth info. Depth data are included in `Indiana CLP lake data 1994-2013`.

OTHER

Additional cooperator files are currently not being incorporated. These include anything in a subfolder here. Note that many of these lakes already have data (e.g., the PCA subfolder and Ten Mile Lake subfolder), but in some cases I know these files will add to the record (e.g., Ten Mile Lake is missing data from the early 2000s that these files would cover).

File `Water_Temp.accdb` has negative depths that have not been resolved. According to the explainer file, the depth field is "Water temperature sensor depth. This number is positive if measurement is taken from below the water surface and negative if measurement is taken from lake bottom."
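
Once a site depth is known, the sign convention quoted above makes the conversion to depth below the surface mechanical, which also shows exactly why a missing site depth blocks it. A minimal sketch, assuming a hypothetical `site_depth_m` value per record; treating a zero sensor depth as a surface reading and treating missing or non-positive site depths as unknown (compare the zero maximum depths noted for SD) are assumptions, not rules from the explainer file.

```python
from typing import Optional


def depth_below_surface(sensor_depth_m: float, site_depth_m: Optional[float]) -> Optional[float]:
    """Convert a Water_Temp.accdb-style sensor depth to metres below the water surface.

    Hypothetical helper. Positive values are already measured down from the surface.
    Negative values are measured up from the lake bottom, so depth = site_depth + value.
    Returns None when the site depth is unknown (missing or <= 0) or when the result
    would place the sensor above the surface.
    """
    if sensor_depth_m >= 0:
        # Zero is treated as a surface reading (assumption; the explainer does not cover it).
        return sensor_depth_m
    if site_depth_m is None or site_depth_m <= 0:
        return None  # a bottom-referenced reading cannot be placed without a site depth
    depth = site_depth_m + sensor_depth_m
    return depth if depth >= 0 else None


if __name__ == "__main__":
    print(depth_below_surface(-41, 50))    # 9
    print(depth_below_surface(-41, None))  # None
    print(depth_below_surface(3.5, None))  # 3.5
```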

I confirmed that we cannot easily determine the depth from the surface for these readings. That is, -41 m means 41 meters from the bottom, and there is no metadata to support figuring out what the site depth is.

The files that end in `historicalfiles_manual` have some clues about sites in the "notes" column, although some of that info is not site related. Keywords for sites include `c('basin', 'bay', 'end', 'station', 'WQ')` and likely others. Incorporating this info would help resolve multiple lake-date-depth values. For now, we assume that multiple lake-date-depth values from these data sources represent multiple sites.

The `mendota_temps_long` file has site information in column `Loc`. Additionally, there is an observer column, and in some instances, multiple values per date-depth in Mendota have different observers but NA in the `Loc` column (I would assume these to be different locations).
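
The interim rule above (treat multiple lake-date-depth values as separate sites), refined with the `Loc` column and the notes keywords where they exist, might look like the sketch below. It is hypothetical: the record fields (`lake`, `date`, `depth`, `loc`, `notes`) and the function names are assumptions, keyword matching is a rough heuristic that will also catch some notes that are not site related (e.g., a note that happens to contain "end"), and numbered keys such as `site2` are only unique within one lake-date-depth group, not consistent across dates.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

# Site keywords from the notes column; the issue notes there are "likely others".
SITE_KEYWORDS = ["basin", "bay", "end", "station", "WQ"]
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, SITE_KEYWORDS)) + r")\b", re.IGNORECASE
)


def site_hint(record: Record) -> Optional[str]:
    """Prefer an explicit Loc value; otherwise return the note if it mentions a site keyword."""
    loc = record.get("loc")
    if loc:
        return str(loc)
    notes = record.get("notes") or ""
    return notes.strip() if _KEYWORD_RE.search(notes) else None


def assign_site_keys(records: List[Record]) -> List[str]:
    """Give each record a site key so duplicate lake-date-depth rows become separate sites."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[(rec["lake"], rec["date"], rec["depth"])].append(i)

    keys = [""] * len(records)
    for (lake, _date, _depth), idxs in groups.items():
        for n, i in enumerate(idxs, start=1):
            hint = site_hint(records[i])
            if hint:
                keys[i] = f"{lake}:{hint}"
            elif len(idxs) == 1:
                keys[i] = lake  # no duplicate, so no site split is needed
            else:
                keys[i] = f"{lake}:site{n}"  # interim assumption: unlabeled duplicates are distinct sites
    return keys


if __name__ == "__main__":
    recs = [
        {"lake": "mendota", "date": "1990-07-01", "depth": 1.0, "loc": "A", "notes": None},
        {"lake": "mendota", "date": "1990-07-01", "depth": 1.0, "loc": None, "notes": None},
        {"lake": "mendota", "date": "1990-07-01", "depth": 2.0, "loc": None, "notes": None},
        {"lake": "ten_mile", "date": "2001-06-01", "depth": 1.0, "loc": None, "notes": "north basin"},
    ]
    print(assign_site_keys(recs))
    # ['mendota:A', 'mendota:site2', 'mendota', 'ten_mile:north basin']
```

The observer column could be added as a further hint in `site_hint` if different observers turn out to reliably indicate different locations.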