-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Brittle coop #249
Brittle coop #249
Conversation
@@ -1,2 +1,3 @@ | |||
hash: 59795c7efcc7526f0462cdac07698f21 | |||
hash: 7566d6ac6bfe10e47a9a3a90065e827e | |||
freshened: 2021-12-10 05:44:22 -0600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going to scdel()
this and rename from .ind to .tind (time-indicator) to avoid confusion with the extra information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, that isn't possible because the various scipiper
functions check that it has a valid name for an indicator file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified this with an updated commit
Nice fixes to some questionable choices I made 🙈! Just a few comments -- I still don't understand why you need the always stale trigger in both targets? (but I see your note, so trust you've thought it through) |
@@ -15,7 +15,7 @@ parse_mendota_daily_buoy <- function(inind, outind) { | |||
sc_indicate(ind_file = outind, data_file = outfile) | |||
} | |||
|
|||
parse_long_term_ntl <- function(inind, outind) { | |||
parse_ntl29_v10_0 <- function(inind, outind) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will impact #247 open PR if it re-pulls google drive files
|
||
coop_wants: | ||
command: filter_coop_all('6_temp_coop_fetch/out/coop_all_files.rds.ind') | ||
command: filter_coop_all('6_temp_coop_fetch/out/coop_all_files.rds.ind', | ||
trigger_file = '1_crosswalk_fetch/out/always_stale_time.txt') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this trigger when the target is relying on 6_temp_coop_fetch/out/coop_all_files.rds.ind
which also has a always stale trigger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, this is related to the way remake is tracking build changes while things are executing. If we only have one target in the dependency chain that has the trigger file, it oddly is not considered stale the second build (even though you would expect it to, since it has secretly modified the trigger_file as a side-effect of the build). I can't remember exactly what I determined here - perhaps if you don't have two trigger_file-associated targets, you only get rebuilds every other time, which is frustrating and confusing. So my solution is to always pair them when used as this is reliable 100% of the time.
|
||
# do we trigger a rebuild if the file contents changes? not just the names? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I think we were only assuming file name changes (or file adds) would trigger rebuilds.
6_temp_coop_fetch/in/Winnie_temp_2_2009.xlsx.ind: c19ce966b9c5a6072c0cf1d1516dd2f2 | ||
6_temp_coop_fetch/in/DNRdatarequest_Secchi_DO_and_Temp_1083_2016_AllLakes_JL.xlsx.ind: 22c8cc6fc3af2311c2d90edf9940fedb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some of these just change order? I see the same hash. Could add some sort function to the files_in_drive
function that does the GD inventory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created this scipiper issue and I think the ordering that matters here is in sc_indicate
instead of the result from drive_ls
, since this .ind
file is the result of the task table and not of the GD inventory function.
But yes, you are seeing the re-ordering, not a new hash. It is distracting to see diffs when the hashes aren't changing. I'm with ya
(difftime(Sys.time(), as.POSIXct(ind_data$freshened), units = 'mins')) > trigger_wait){ | ||
gd_freshen_dir(out_ind, gd_path) | ||
} else { | ||
message('skipping google drive index because it still smells fresh') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha! love it.
- 6_temp_coop_fetch/log/6_temp_coop_fetch_tasks.ind | ||
|
||
# -- download LTER files and push to GD -- # | ||
coop_file_upload_location: | ||
command: as_id(I('1dutCiFEOoRObXLn6n3BRKhZDDAijK65Q')) | ||
|
||
6_temp_coop_fetch/downloads/long_term_ntl.csv.ind: | ||
command: download_and_indicate(in_ind = target_name, | ||
url = I("https://lter.limnology.wisc.edu/sites/default/files/data/ntl29_v5.csv"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the download URL still being documented somewhere? So now it's on google drive in the "in" folder, so that's our new canonical for this data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was originally going into that in
automatically too, since in the download_and_indicate()
function:
googledrive::drive_upload(media = local_file, path = push_location)
Where push_location
is the google drive location for the in
folder. So this was going into downloads
and into in
. in
is where it became part of the coop parsers and I think the downloads
folder is ignored by this pipeline.
Is the download URL still being documented somewhere?
yes, I added an explainer file for this target.
Merge branch 'main' of github.com:USGS-R/lake-temperature-model-prep into brittle_coop # Conflicts: # 7a_temp_coop_munge/src/parsing_task_fxns.R
Getting this PR up because it has ind/build changes.
Will add more context later this am.But is tied to #231 and things @padilla410 discovered in #248update
The focus of this PR is to make working on new temperature data sources more developer-friendly, by clearing out several of the gotchas in target connections and triggers that in hindsight, explain a number of mystery builds from the past.
coop_wants
target that relies on the index ind. It is necessary to pair targets to use the trigger file, otherwise it doesn't work correctly (this explains whycoop_wants
uses it too)freshened: 2021-12-10 05:44:22 -0600
) and it won't re-index google drive unless the last build was less thantrigger_wait
minutes ago (I set this to 30). This seems a little questionable, but I feel confident it is a major improvement over what was happening in the pastin
folder anyhow and we don't need to re-download them. I did go ahead and update one of the files since a newer version was available, so that explains the file name and parser change in hereNULL
as default on the trigger file targets, so leaving out that arg will skip the trigger stale step in case you want to avoid this when developing.