Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to send stdout as node/operator output #388

Merged
merged 8 commits into from
Feb 29, 2024
Merged

Conversation

haixuanTao
Copy link
Collaborator

This PR makes it possibles for nodes to receive the logs of other nodes.

This makes it possible to see logs within the plotting window for example as shown in the example.

This is aim at potentially simplify debugging when using hot-reloading features or even in a further future auto-debugger.

@haixuanTao haixuanTao self-assigned this Dec 5, 2023
@haixuanTao haixuanTao marked this pull request as ready for review December 5, 2023 16:14
Copy link
Collaborator

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea that you can subscribe to log outputs of other operators/nodes, but I'm not sure if the special op/logs output is the best approach for this. How about we introduce a node-level capture_logs switch to enable the sending of the DoraEvent::Logs messages? This way, it would be more obvious that some dora-provided functionality is used.

For subscribers we could make the logs available through a new built-in input stream, e.g. named dora/logs/<node_id>. This would make it clear that this is some data provided by dora, not a normal node/operator output.

What do you think? I can implement the above if you like.

binaries/daemon/src/spawn.rs Outdated Show resolved Hide resolved
binaries/daemon/src/spawn.rs Outdated Show resolved Hide resolved
examples/python-operator-dataflow/dataflow.yml Outdated Show resolved Hide resolved
@haixuanTao
Copy link
Collaborator Author

Commenting on dora/logs/<node_id>, I think that this might create some hidden connections between node that can later be hard to manage. I would prefer if they were only one way of creating links between nodes. Maybe we can name the output: captured_logs or stdout?

Sure thing if you want to do the implementation

@phil-opp
Copy link
Collaborator

Commenting on dora/logs/<node_id>, I think that this might create some hidden connections between node that can later be hard to manage. I would prefer if they were only one way of creating links between nodes.

Good point! How about we introduce a capture_logs_to: output_name config key, which allows users to specify the output that they want to use for logs? This way, we have all outputs still listed under the outputs key without a "magic" output name.


Apart from the syntax that we want to use, there is a second challenge. Right now we capture the output per node, not per operator. So the operator logs would also contain the outputs of other operators on that node.

Ideally we would split the output properly, but this is difficult sind stdout is set per process. Maybe we can make the dora runtime print special separator messages before and after calling into an operator, which the daemon could then use to split the split the output by operator... I'll look into it.

@haixuanTao
Copy link
Collaborator Author

Okay! I think that this can also be useful for Opentelemetry logs in the future to tag log at the operator level as well.

Shall we maybe merge a first version at custom nodes level first?

We can then prioritize operator level in the future?

@phil-opp
Copy link
Collaborator

Shall we maybe merge a first version at custom nodes level first?

Sure, I'm fine with that.

@phil-opp
Copy link
Collaborator

How about we introduce a capture_logs_to: output_name config key, which allows users to specify the output that they want to use for logs? This way, we have all outputs still listed under the outputs key without a "magic" output name.

I opened a draft PR at #392 to add a new send_stdout_as: <output_name> to custom nodes.

I decided for stdout instead of log in the name because I would like to implement a custom logger for dora at some point. For Rust code, this would mean that we provide set a custom log logger, which gives us more context than the normal println (e.g. proper start/end for multi-line messages, log level, source location, etc). For Python, we could provide a dora.log function that should be used instead of calling print. This function would also accept an optional log level (info, warn, err, etc.). Using this approach, we could easily assign log messages to operators, even in runtimes with multiple operators. What do you think, @haixuanTao?

@haixuanTao
Copy link
Collaborator Author

Sure! I think we can separate stdout and log.

FYI, for python, we can do something in the likes of tracing_for_pyo3_logging https://docs.rs/tracing-for-pyo3-logging/latest/src/tracing_for_pyo3_logging/lib.rs.html#1-63

@phil-opp
Copy link
Collaborator

FYI, for python, we can do something in the likes of tracing_for_pyo3_logging https://docs.rs/tracing-for-pyo3-logging/latest/src/tracing_for_pyo3_logging/lib.rs.html#1-63

Ah, that's nice!

@haixuanTao
Copy link
Collaborator Author

@haixuanTao
Copy link
Collaborator Author

Maybe we can also create a features feature flag configuration input for nodes in a similar way that Cargo.toml has feature flags for crates and stdout or logs can be one of them. In the future we can make it configurable to put more configuration flags.

@haixuanTao
Copy link
Collaborator Author

Or as an environment variable as well for the specific node

@haixuanTao
Copy link
Collaborator Author

So, I think that we can merge this. I have added the possibility to send_stdout_as for operators and added warnings that we do not split stdout between operators.

@phil-opp phil-opp changed the title Output logs Add option to send stdout as node/operator output Feb 28, 2024
Copy link
Collaborator

@phil-opp phil-opp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some unused import warnings that we should fix, otherwise this looks good to me. Thanks!

libraries/core/src/descriptor/mod.rs Outdated Show resolved Hide resolved
@haixuanTao haixuanTao enabled auto-merge February 29, 2024 13:47
@haixuanTao haixuanTao merged commit 2615b04 into main Feb 29, 2024
17 checks passed
@haixuanTao haixuanTao deleted the output-logs branch February 29, 2024 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants