Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Support gRPC config for agent-service plugin #368

Merged
merged 5 commits into from
Jul 31, 2023

Conversation

honnix
Copy link
Member

@honnix honnix commented Jul 3, 2023

TL;DR

Support customized gRPC config for agent-service plugin.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Support configuring gRPC endpoint with:

  • insecure
  • default service config
  • timeouts

Details can be found in flyteorg/flyte#3823

Note that this is a breaking change! I'm not sure how mature the agent-service is and how many users are using it, but if backward compatibility is important in this case, I can go an extra mile to achieve that.

Tracking Issue

Closes flyteorg/flyte#3823

Follow-up issue

NA

@codecov
Copy link

codecov bot commented Jul 3, 2023

Codecov Report

Merging #368 (52d9f7e) into master (3405f89) will increase coverage by 1.36%.
The diff coverage is 72.34%.

❗ Current head 52d9f7e differs from pull request most recent head 3c9957d. Consider uploading reports for the commit 3c9957d to get more accurate results

@@            Coverage Diff             @@
##           master     #368      +/-   ##
==========================================
+ Coverage   63.00%   64.36%   +1.36%     
==========================================
  Files         154      154              
  Lines       13030    10605    -2425     
==========================================
- Hits         8209     6826    -1383     
+ Misses       4208     3163    -1045     
- Partials      613      616       +3     
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
go/tasks/plugins/webapi/agent/config.go 100.00% <ø> (ø)
go/tasks/plugins/webapi/agent/plugin.go 71.14% <72.34%> (+3.05%) ⬆️

... and 134 files with indirect coverage changes

@honnix honnix force-pushed the grpc-config branch 2 times, most recently from 67ef7a7 to f3662b9 Compare July 3, 2023 10:01
@honnix honnix marked this pull request as ready for review July 3, 2023 10:07
@honnix honnix requested a review from pingsutw July 3, 2023 10:07
Copy link
Contributor

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@honnix can you elaborate on what is not backwards compatible?

@honnix
Copy link
Member Author

honnix commented Jul 5, 2023

@honnix can you elaborate on what is not backwards compatible?

Both EndpointForTaskTypes and DefaultGrpcEndpoint are now completely different types so if a user has configured either of those, upgrading to this version will crash their application I guess.

@hamersaw
Copy link
Contributor

hamersaw commented Jul 5, 2023

upgrading to this version will crash their application I guess

@pingsutw we released flyte agents as an experimental feature right? so no guarantees of backwards compatibility. IMO fixing this grpc configuration issue warrants a backwards incompatible merge. thoughts?

@eapolinario
Copy link
Contributor

upgrading to this version will crash their application I guess

@pingsutw we released flyte agents as an experimental feature right? so no guarantees of backwards compatibility. IMO fixing this grpc configuration issue warrants a backwards incompatible merge. thoughts?

Agreed. This feature is tagged as experimental, so let's use this time to break backwards compatibility if needed (which seems like this particular instance is one of such times).

@hamersaw
Copy link
Contributor

hamersaw commented Jul 7, 2023

@honnix thinking about this it will be unable to cover the scenario where there are two flyte agent deployments where each satisfies multiple task types right? I'm thinking something like:

grpcConnections:
    - endpoint: x.com:8088
      taskTypes:
      - foo
      - bar
    - endpoint: y.com:8088
      taskTypes:
      - baz
      - bat

the current approach would create a separate gRPC connection for each task type right? whereas it would probably be better to reuse a single connection for each endpoint. Thoughts?

@honnix
Copy link
Member Author

honnix commented Jul 7, 2023

I don't think so. The task type here is only for routing and there should be only one grpc connection per each endpoint. This PR shouldn't change how that part works I think.

@hamersaw
Copy link
Contributor

I don't think so. The task type here is only for routing and there should be only one grpc connection per each endpoint. This PR shouldn't change how that part works I think.

So the EndpointForTaskTypes map is keyed on the task type, but the connectionCache is keyed on the endpoint. So you can't have two endpoints configured differently for different task types. For example, differing timeouts will result in unintended behavior:

endpointForTaskTypes:
  foo: 
    endpoint: localhost:8088
    timeout: 20
  bar: 
    endpoint: localhost:8088
    timeout: 40

IIUC the above would only create a single gRPC connection in the background (with timeout of whatever agent was called first) and reuse it for calls to both task types. I don't think this is a huge issue, but I don't like the unintended behavior for future added gRPC params that might be more important.

@honnix
Copy link
Member Author

honnix commented Jul 10, 2023

@hamersaw I see what you mean. I thought about that but I was not sure per task type endpoint configuration is a real use case or a bit of overthinking. My interpretation of this config is mostly about grouping, e.g. task type foo1 and foo2 go to endpoint A, bar1 and bar2 go to endpoint B. It is entirely possible that different task types might need different timeout values, but I was not very sure of the need. If we want to support this, we could make the connection cache keyed on GrpcEndpoint pointer instead of endpoint string. What do you think?

@honnix
Copy link
Member Author

honnix commented Jul 16, 2023

@hamersaw I did some experiment in d67a015 and now the config is like:

endpointForTaskTypes:
  foo: agent
  bar: bar_agent
  foo_bar: agent

grpcEndpoints:
  agent:
    endpoint: localhost:8088
    timeouts:
      CreateTask: 1s
  bar_agent:
    endpoint: localhost:8088
    timeouts:
      CreateTask: 2s

And the cache is keyed on the whole endpoint config (as a pointer).

Copy link
Contributor

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to update the naming here? Ex grpcEndpoints -> agentEndpoints and endpointForTaskTypes -> agentsForTaskTypes or taskTypeAgents?

Also, just want to make sure this doesn't introduce additional confusion. We still need to add all supported plugins to the supportedTaskTypes configuration right?

@honnix
Copy link
Member Author

honnix commented Jul 19, 2023

Does it make sense to update the naming here? Ex grpcEndpoints -> agentEndpoints and endpointForTaskTypes -> agentsForTaskTypes or taskTypeAgents?

SGTM but maybe it's better double check with the original author of the config section.

Also, just want to make sure this doesn't introduce additional confusion. We still need to add all supported plugins to the supportedTaskTypes configuration right?

Yes that is correct. This PR does not make any change to the semantics of supportedTaskTypes.

honnix added 4 commits July 27, 2023 08:43
@honnix
Copy link
Member Author

honnix commented Jul 27, 2023

Does it make sense to update the naming here? Ex grpcEndpoints -> agentEndpoints and endpointForTaskTypes -> agentsForTaskTypes or taskTypeAgents?

Also, just want to make sure this doesn't introduce additional confusion. We still need to add all supported plugins to the supportedTaskTypes configuration right?

@hamersaw I did some experiments in the latest commit. PTAL, thanks!

@hamersaw
Copy link
Contributor

@honnix I think this looks great! Thanks for humoring my nits.

@pingsutw would be great to get a final look from you before merging.

Copy link
Member

@pingsutw pingsutw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@honnix Thank you so much

@hamersaw hamersaw merged commit 22d2635 into flyteorg:master Jul 31, 2023
@honnix honnix deleted the grpc-config branch July 31, 2023 14:25
@honnix
Copy link
Member Author

honnix commented Jul 31, 2023

Thank you all for reviewing and providing great comments! No worries about nits at all. If we are gonna ship a breaking change, it's better we do it right.

@pingsutw
Copy link
Member

pingsutw commented Aug 1, 2023

@honnix btw, would you mind helping us update the doc here.

@honnix
Copy link
Member Author

honnix commented Aug 1, 2023

@honnix btw, would you mind helping us update the doc here.

Sure. I did not know this. I can do it tomorrow.

@honnix honnix mentioned this pull request Aug 2, 2023
8 tasks
@honnix
Copy link
Member Author

honnix commented Aug 2, 2023

@honnix btw, would you mind helping us update the doc here.

@pingsutw I tried that in flyteorg/flytesnacks#1073. Also I propose a small fix in #381 for better naming alignment. PTAL. Thank you.

eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* Support gRPC config for agent-service plugin

Signed-off-by: Hongxin Liang <[email protected]>

* Address comments

Signed-off-by: Hongxin Liang <[email protected]>

* No deadline if timeout is 0

Signed-off-by: Hongxin Liang <[email protected]>

* Per task type grpc endpoint config

Signed-off-by: Hongxin Liang <[email protected]>

* Rename config items according to comments

Signed-off-by: Hongxin Liang <[email protected]>

---------

Signed-off-by: Hongxin Liang <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core feature] Support more gRPC configuration in agent-service plugin
4 participants