Registering halfway products #31
Comments
I agree, I think it is definitely an avenue worth exploring, with some examples.
Non-trivial dependency chains -- like BinDeps. Registering the halfway files yourself, on your own server (or Dropbox or something), as a second datadep is an option, but then, as you say, synchronizing them becomes an issue. Also, it should be noted that DataDeps.jl doesn't know anything about files, except between the
DataDeps works best when the dependent data is static. It knows nothing about what generated the data it fetches. I think that such a "limitation" is totally acceptable in this scenario as well. So a dependency on some halfway files doesn't need to know whether the data they were derived from has changed. We can leave that responsibility to the user. We can add a convenience
As to the implementation, maybe I should post a call on Discourse?
A call for what?
A call for help with the actual implementation :) My experience is that sometimes people get very excited by an idea for a package and the package gets quickly built thereafter. Because my Julia-fu is probably not good enough, I won't be able to do much except some pointed help here and there. I assume you are probably very busy...
Not too busy to maintain my own packages, no.
Awesome. As an academic I'm completely convinced this package is super useful. And if you feel you can bake the halfway deps functionality into this one, then awesome! I guess the most common use case is that the halfway files are stored locally: the product of the analysis resides on the computer where the analysis occurs. Maybe not always, but at least in a large proportion of the cases. So it would make sense to register the files as local. I guess we need a test case to see how it would look. And I guess we need that
Just so there is an example of the extent that is currently possible.
That is mega cool. Sorry for being slow.
So I'm using this wonderful package, and that saves me the need to redownload stuff every time I want to reevaluate things. Great. But this got me thinking:
Usually we have this static dataset we want to process. Most often this means there will be some processed data files that result from this initial processing. We then want to do some analysis on those processed files. But it's irritating to have to take care of those halfway processing products. It would be amazing if there could be a way to register these midway processed files, so that next time we need them we won't need to recalculate them.
I think all the facilities are already here (e.g. supplying an alternative download method, one that processes the files), but I would appreciate an example made just for this use case, and I'd argue that many people would love this functionality just as much as the intended use of this awesome package.
One glaring problem is the need for something similar to a makefile, to check whether the source files are newer than the processed files that rely on them...
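
To make the makefile-style check concrete, here is a minimal sketch of the idea. It is written in Python only for brevity and is not part of DataDeps.jl. The names `is_stale` and `ensure_product` are hypothetical, and comparing modification times is an assumption about how "newer" would be decided (a checksum of the sources would be another option).

```python
from pathlib import Path


def is_stale(product, sources):
    """Return True if `product` is missing or older than any file in `sources`."""
    product = Path(product)
    if not product.exists():
        return True
    product_mtime = product.stat().st_mtime
    # Stale as soon as a single source was modified after the product was written.
    return any(Path(src).stat().st_mtime > product_mtime for src in sources)


def ensure_product(product, sources, build):
    """Run the user-supplied `build(sources, product)` only when `product` is stale.

    Returns the path to the (possibly freshly rebuilt) halfway product.
    """
    if is_stale(product, sources):
        build(sources, product)
    return Path(product)
```

With something like this, the analysis code would always ask for the processed file through `ensure_product`, and the expensive processing step would only rerun when the raw files changed or the halfway file was deleted.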