-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable downloading remote dataflow #682
Conversation
…e within the url to get the path avoiding to create confusion on the name of the file.
|
||
fn resolve_dataflow(dataflow: String) -> eyre::Result<PathBuf> { | ||
let dataflow = if source_is_url(&dataflow) { | ||
// try to download the shared library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- // try to download the shared library
+ // try to download the dataflow YAML file
let target_path = Path::new("build") | ||
.join(node_id.to_string()) | ||
.with_extension(EXE_EXTENSION); | ||
download_file(source, &target_path) | ||
let target_dir = Path::new("build"); | ||
download_file(source, &target_dir) | ||
.await | ||
.wrap_err("failed to download custom node")?; | ||
target_path.clone() | ||
.wrap_err("failed to download custom node")? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify the reason for this breaking change?
By renaming the downloaded files we avoided collisions. For example, consider:
nodes:
- id: camera
path: example.com/camera/node
inputs:
tick: dora/timer/millis/20
outputs:
- image
- id: plot
path: example.com/plot/node
inputs:
image: camera/image
With the new logic, the second download overwrites the first, as both nodes are stored as build/node
. It probably still works as expected on Linux because the executables are directly started, but I could imagine that there will be some errors on Windows (since Windows disallows removing executable files that are still running). Also, this collision would prevent us from adding some kind of caching again in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you're reasoning.
But, I really don't think we should rename file on the fly as it makes it very hard to keep track of which file correspond to what.
I think that if people name everything "node.py" and colludes with names, I don't think that changing every name is the right solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would very much prefer people use build for caching, in the likes of something like:
- id: plot
build: wget -O plot.py example.com/plot/node
path: node.py
inputs:
image: camera/image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if people name everything "node.py" and colludes with names, I don't think that changing every name is the right solution.
The thing is that the two node.py
files might come from different sources. For example, they could be part of different git repositories that are not your own. In this case, you have no way to avoid the name conflict.
But, I really don't think we should rename file on the fly as it makes it very hard to keep track of which file correspond to what.
For the above example, you don't have the source for one of the nodes at all (because it's overwritten).
I would very much prefer people use build for caching, in the likes of something like:
Sure, this is a good alternative. But if we support pointing path
to remote URLs, we should be able to deal with potential name collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is a good alternative. But if we support pointing path to remote URLs, we should be able to deal with potential name collisions.
Ok, so I guess, what we could do is fail on pre-existing files. That would make the most sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other alternative would be renaming, either always or only on conflict. If we don't want to do that, I agree that failing is better than silently overwriting the files of other nodes.
let target_path = adjust_shared_library_path( | ||
&Path::new("build") | ||
.join(node_id.to_string()) | ||
.join(operator_id.to_string()), | ||
)?; | ||
let target_path = &Path::new("build"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this adjust_shared_library_path
was that you can provide cross-platform URLs. E.g. example.com/operator
would become example.com/liboperator.so
on Linux and example.com/operator.dll
on Windows. This is a bit hacky, but without this adjust you need different dataflow files for Windows and Linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but I honestly find it super hard to keep track about all the rules that we had rather than having simple link.
I would rather prefer user implement their own url resolver for cross platform than choosing a url definition that people might need to remember.
this might resolve some issues as you might also want to deal with architecture (x86 or arm) and I don't think we should expect people to build library for all platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let say I want to use the current status quo for something like:
-> libopencv_plot.so
-> I need to push into a server and need to be referenced at name url.com/libopencv_plot.so
any other link will not work.
-> I need to remember to put it in my node as: url.com/opencv_plot
which is not obvious
-> And if there is a bug from me or someone else it's going to be: Something went wrong with build/libplot.so
for example. Which I need to remember links to url.com/opencv_plot
which is my libopencv_plot.so
Knowing that none of this is going to be properly documented in the near future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being cross platform could be as simple as:
- id: plot
build: }
wget -O plot.zip example.com/plot/$OSTYPE/node/plot.zip
unzip plot.zip
path: libplot.so
inputs:
image: camera/image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree that the status quo solution was hacky and not ideal. Of course, things get simpler if we simply ignore the platform differences. However, if we want dora to be cross-platform, we should provide some way to use this feature for cross-platform dataflows.
Being cross platform could be as simple as:
[...] path: libplot.so [...]
The issue is that the file would need to be named plot.dll
if you want to run it on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we decided to remove operator support in the near future, so we might not have to deal with this anymore. However, the same things also applies to executables, which need to have an .exe
suffix on Windows.
Would you be fine with example.com/path/node
expanding to example.com/path/node.exe
when run on Windows? Or would you prefer some solution with explicit placeholders, e.g. example.com/path/node{EXE_SUFFIX}
? Or do you want to make this a "one-platform-only" feature that does not do any path adjustments at all (so that you need a separate dataflow_Windows.yml
with adjusted paths)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in windows you can run an exe without the .exe
suffix so we can always call something like target/release/node
without the exe extension. So I would prefer defaulting to this.
See: https://stackoverflow.com/questions/44441265/calling-an-exe-without-the-extension-but-with-i
FYI, i hope that most binary that we ship for dora can be packaged in a way that is pip
or cargo
installable so that everything is contained in PATH by default and easy to find and deploy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's useful for local paths, but unfortunately not for URLs. We need different binaries on Windows and Linux, so downloading the same file would not help. Also, the downloaded file would probably need an .exe
extension to be recognized as an executable file on Windows, no?
However, even if we added a .exe
extension automatically, we would still have the issue that there are different CPU architectures etc. So placeholders is probably the way to go, e.g. example.com/path/{target}/node
, which would expand to example.com/path/x86_64-unknown-linux-gnu/node
on an x86_64
Linux system. Alternatively, we could of course support separate URLs for each target, e.g.:
path:
x86_64-unknown-linux-gnu: https://example.com/path/x86/linux/node
x86_64-pc-windows-msvc: https://example.com/path/x86/win/node.exe
(Of course, this is not a high priority right now.)
This PR makes it possible to start a dataflow from scratch without having to git clone a repository.
Example
Video
Screencast.from.2024-10-09.12-11-57.webm
Small breaking change
download_file
function to use the predefined downloaded file name instead of changing it when downloading.download_file
as there is now no way to know if the filename and path is going to be the same before hand. I think that if users want to have caching, they should link the path before hand instead of using the url. This can be done in conjunction of a buildwget
orcurl
command.