Overlay execution info messages in timeline view #3501

hamersaw · 2023-03-20T16:33:18Z

Discussed in #3429

Originally posted by hamersaw March 8, 2023

Motivation

The timeline UI view is marginally useful for debugging performance, but has a lot of room for improvement. Integrating the runtime metrics breakdown proposed in the performance observability RFC is a step in the right direction, partitioning node executions into a collection of categorized time-series. This representation helps with the "what" but misses a lot of the "why". For example, if a particular execution has a large amount of frontend plugin overhead, this means that Flyte started the Task but the backend service has not yet indicated that it has started. K8s gurus will be quick to identify that there may be scheduling contention, large image pull times, or a few other likely scenarios. However, this is not easily available to the user even though FlytePropeller has this information. We currently store a single "reason" for the current execution status, but may be better off tracking a time-series of reasons to better explain the execution.

Proposal

This proposal outlines a solution for overlaying a collection of human-readable messages in the timeline view. The exact representation is VERY open for debate, but I envision something similar to Jaeger (time-series telemetry data with events), which uses a single tick mark that displays a message on hover. This solution supplies the "why", an explanation of the reported execution status, to complement the "what" in the runtime breakdown of the execution time-series. The goal will be to balance utility with simplicity, displaying a "useful" number of messages to improve context.

Implementation

Currently, FlyteAdmin maintains a single "reason" within the task execution metadata. This is updated in-place on each event from FlytePropeller, meaning the old "reasons" are not persisted. At the risk of oversimplifying, we will need to transition to maintaining a collection of "reasons" with associated timestamps. This will require updates in the following repositories:

FlyteIDL: update TaskExecutionClosure to have repeated reasons with associated timestamps.
FlyteAdmin: append to the "reason" list rather than overwriting the existing single "reason".
FlyteConsole: correctly parse the "reason" list to annotate the timeline UI view.
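
The change described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual FlyteIDL, FlyteAdmin, or FlyteConsole code: the `Reason` fields, the `record_event` and `tick_offsets` helpers, and the example messages are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Reason:
    # Hypothetical shape: a human-readable message plus when it was reported.
    occurred_at: datetime
    message: str


@dataclass
class TaskExecutionClosure:
    # Stand-in for the closure: a list of reasons instead of one overwritten value.
    reasons: List[Reason] = field(default_factory=list)

    def record_event(self, occurred_at: datetime, message: str) -> None:
        # Append rather than overwrite, so earlier reasons are preserved.
        self.reasons.append(Reason(occurred_at, message))


def tick_offsets(closure: TaskExecutionClosure, start: datetime) -> List[Tuple[float, str]]:
    # Convert each reason into (seconds since execution start, message),
    # which a timeline view could render as tick marks with hover text.
    return [((r.occurred_at - start).total_seconds(), r.message) for r in closure.reasons]


def ts(second: int) -> datetime:
    # Helper for building example timestamps within a single minute.
    return datetime(2023, 3, 20, 16, 33, second, tzinfo=timezone.utc)


closure = TaskExecutionClosure()
closure.record_event(ts(0), "task submitted to K8s")   # made-up message
closure.record_event(ts(12), "pulling image")          # made-up message
closure.record_event(ts(45), "container started")      # made-up message

ticks = tick_offsets(closure, ts(0))
```

With a single overwritten "reason", only the last of these three messages would survive; with the list, each one remains available to place on the timeline.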

Open Questions

How should this be visualized? I will leave this discussion for more UI / UX oriented personnel.
Should we add this information to node executions / workflow executions? Currently the "reason" is only tracked for the task-level execution.
Do we need to be able to send multiple reasons in a single task event?
    It is currently possible to skip phases if the execution progresses before FlytePropeller detects and processes the intermediate stage.
    We could use event buffers to send multiple events instead, which is probably the better solution.
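
To make the event-buffer option concrete, here is a minimal sketch in Python. It is an assumption about how such a buffer might behave, not how FlytePropeller works: `TaskEvent`, `EventBuffer`, `observe`, and `flush` are hypothetical names, and the phases and reasons are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass(frozen=True)
class TaskEvent:
    # Hypothetical event: one phase, one reason, one timestamp.
    phase: str
    reason: str
    occurred_at: float  # seconds since epoch


class EventBuffer:
    """Queues events so that several can be sent, in order, on a single flush."""

    def __init__(self, send: Callable[[TaskEvent], None]) -> None:
        self._pending: Deque[TaskEvent] = deque()
        self._send = send

    def observe(self, event: TaskEvent) -> None:
        # Record every observed state, including intermediate ones.
        self._pending.append(event)

    def flush(self) -> int:
        # Send each buffered event separately, oldest first; return how many were sent.
        sent_count = 0
        while self._pending:
            self._send(self._pending.popleft())
            sent_count += 1
        return sent_count


sent: List[TaskEvent] = []
buffer = EventBuffer(sent.append)  # stand-in for the real event sink
buffer.observe(TaskEvent("QUEUED", "waiting for scheduling", 100.0))
buffer.observe(TaskEvent("INITIALIZING", "pulling image", 112.0))
buffer.observe(TaskEvent("RUNNING", "container started", 145.0))
flushed = buffer.flush()
```

Each event still carries a single reason, so the event schema does not need to change; the buffer only ensures that intermediate states are delivered rather than dropped.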

hamersaw · 2023-03-22T13:35:46Z

similar to #3357

hamersaw self-assigned this Mar 20, 2023

hamersaw added this to the 1.5.0 milestone Mar 20, 2023

This was referenced Mar 20, 2023

Add Reasons field to TaskExecutionClosure to track time-series of reasons flyteorg/flyteidl#382

Merged

Tracking reasons time-series flyteorg/flyteadmin#540

Merged

eapolinario assigned jsonporter Mar 27, 2023

jsonporter assigned james-union Mar 27, 2023

eapolinario unassigned hamersaw Apr 3, 2023

jsonporter mentioned this issue Apr 13, 2023

[UI Feature] Add full-list log output to execution detail panel #3592

Closed

2 tasks

cosmicBboy modified the milestones: 1.5.0, 1.6.0 Apr 20, 2023

eapolinario added the ui Admin console user interface label Dec 22, 2023
