This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Add BigQuery plugin #161

Merged
merged 3 commits into from
Jun 4, 2021

Conversation

kanterov
Contributor

@kanterov kanterov commented Mar 11, 2021

TL;DR

Implements WebAPI plugin for BigQuery.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Adds an implementation of the token exchange mechanism to get Google credentials using workload identity. Implements the BigQuery plugin, which for now only creates QUERY jobs; COPY/EXTRACT/LOAD jobs will be added in subsequent pull requests.

Tracking Issue

flyteorg/flyte#817

Follow-up issue

NA

@kanterov kanterov requested a review from EngHabu March 11, 2021 13:29
@kanterov kanterov changed the title Add BigQuery plugin [WIP] Add BigQuery plugin Mar 11, 2021
@codecov

codecov bot commented Mar 11, 2021

Codecov Report

Merging #161 (7a72837) into master (bc5350f) will increase coverage by 0.36%.
The diff coverage is 68.29%.

❗ Current head 7a72837 differs from pull request most recent head ce858be. Consider uploading reports for the commit ce858be to get more accurate results

@@            Coverage Diff             @@
##           master     #161      +/-   ##
==========================================
+ Coverage   60.00%   60.36%   +0.36%     
==========================================
  Files         131      135       +4     
  Lines        6966     7292     +326     
==========================================
+ Hits         4180     4402     +222     
- Misses       2361     2447      +86     
- Partials      425      443      +18     
Flag Coverage Δ
unittests ?


Impacted Files Coverage Δ
go/tasks/pluginmachinery/core/phase.go 26.92% <0.00%> (-1.65%) ⬇️
go/tasks/plugins/webapi/bigquery/config_flags.go 50.00% <50.00%> (ø)
go/tasks/plugins/webapi/bigquery/plugin.go 63.87% <63.87%> (ø)
go/tasks/plugins/webapi/bigquery/query_job.go 89.70% <89.70%> (ø)
go/tasks/plugins/array/k8s/transformer.go 74.41% <100.00%> (+1.24%) ⬆️
go/tasks/plugins/webapi/bigquery/config.go 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bc5350f...ce858be.

@kanterov kanterov force-pushed the bigquery branch 5 times, most recently from 650beec to 02d23f7 Compare March 12, 2021 16:10
@kanterov kanterov changed the title [WIP] Add BigQuery plugin Add BigQuery plugin Mar 12, 2021
@kanterov kanterov marked this pull request as ready for review March 12, 2021 16:22
@kanterov kanterov force-pushed the bigquery branch 2 times, most recently from b626f25 to 4b94f6f Compare March 12, 2021 17:14
// For standard SQL queries, this flag is ignored and large results are
// always allowed. However, you must still set destinationTable when
// result size exceeds the allowed maximum response size.
AllowLargeResults bool `json:"allowLargeResults,omitempty"`
Contributor

should this be input to the execution or task definition?

Contributor

ohh nevermind, i see you are using this as a json input

Contributor Author

@kanterov kanterov Mar 17, 2021

It's a great question :). There is no way for task inputs to interact with these fields yet. One option would be to support Go templates in some of the fields, for instance: custom: { query: "{{ .inputs.query }}" }
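
The Go-template idea floated above could look roughly like the following. This is only a sketch using the standard `text/template` package; `renderField` and the `inputs` map layout are hypothetical names chosen for illustration and are not part of this pull request.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderField is a hypothetical helper: it expands a Go template stored in a
// task's custom field and exposes the task inputs under the `.inputs` key.
func renderField(field string, inputs map[string]interface{}) (string, error) {
	// missingkey=error makes a reference to an unknown input fail loudly
	// instead of rendering "<no value>" into the query.
	tmpl, err := template.New("field").Option("missingkey=error").Parse(field)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	data := map[string]interface{}{"inputs": inputs}
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderField("{{ .inputs.query }}", map[string]interface{}{"query": "SELECT 1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Note that `text/template` does no SQL escaping, so a real implementation would need to decide which inputs are safe to splice into a query.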

Contributor

do we want to model this as a protobuf (plugins) so that we can get the protobufs built automatically?

}

if createError.Code >= http.StatusInternalServerError {
return core.PhaseInfoFailed(pluginsCore.PhaseRetryableFailure, systemExecutionError, taskInfo)
Contributor

func PhaseInfoSystemRetryableFailure(code, reason string, info *TaskInfo) PhaseInfo {

Contributor Author

I can change, but there is no similar method for non-retryable failures, that loses symmetry in the code and makes it less readable (in my opinion).

Contributor

we do have a method for non-retryable errors. Just the word retryable is dropped:

func PhaseInfoSystemFailure(code, reason string, info *TaskInfo) PhaseInfo {
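
The retryable versus non-retryable split discussed in this thread can be sketched as below. The `phase` type and `classifyCreateError` are local stand-ins invented for this example, not the plugin machinery's API, and treating every non-5xx status as permanent is an assumption: the snippet under review only shows the 5xx branch.

```go
package main

import (
	"fmt"
	"net/http"
)

// phase is a local stand-in for the plugin machinery's phase type.
type phase string

const (
	retryableFailure phase = "RETRYABLE_FAILURE"
	permanentFailure phase = "PERMANENT_FAILURE"
)

// classifyCreateError maps the HTTP status of a failed job creation to a
// phase. 5xx responses are treated as retryable; everything else is assumed
// to be permanent for the purposes of this sketch.
func classifyCreateError(statusCode int) phase {
	if statusCode >= http.StatusInternalServerError {
		return retryableFailure
	}
	return permanentFailure
}

func main() {
	for _, code := range []int{http.StatusBadRequest, http.StatusServiceUnavailable} {
		fmt.Printf("%d -> %s\n", code, classifyCreateError(code))
	}
}
```

In the real plugin, the two branches would presumably call `PhaseInfoSystemRetryableFailure` and `PhaseInfoSystemFailure` respectively, as suggested in the review comments.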

options := []option.ClientOption{
option.WithScopes("https://www.googleapis.com/auth/bigquery"),
// FIXME how do I access current version?
option.WithUserAgent(fmt.Sprintf("%s/%s", "flytepropeller", "LATEST")),
Contributor

We should probably put this in the context...

@kanterov
Contributor Author

kanterov commented Apr 7, 2021

Didn't have much time to work on it. I'm going through the comments. There is an issue with switching to the Kubernetes client from controller-runtime: it doesn't support sub-resources, and TokenRequest is a sub-resource of ServiceAccount. See kubernetes-sigs/controller-runtime#172.

We have a REST client that was originally used in the top-level initialization code in flytepropeller, so we could propagate it all the way through, or we can keep things as they are. For now, the second option seems more feasible, so as not to bloat this pull request.

@EngHabu @kumare3 what do you think?

@kumare3
Contributor

kumare3 commented May 11, 2021

Shall we discuss this today?

@kumare3
Contributor

kumare3 commented May 27, 2021

@kanterov Would love to help push this over the boundary. How should we do this?

@kanterov kanterov force-pushed the bigquery branch 3 times, most recently from 5b37cd8 to b5ebb0f Compare May 28, 2021 11:11
@kanterov
Contributor Author

@kumare3 @EngHabu as we discussed, I've removed the token exchange part (which needs the k8s REST client) and addressed the comments on the rest of the code.

@@ -10,7 +10,7 @@ download_tooling: #download dependencies (including test deps) for the package

.PHONY: lint
lint: download_tooling #lints the package for common code smells
-	GL_DEBUG=linters_output,env golangci-lint run --deadline=5m --exclude deprecated -v
+	GL_DEBUG=linters_output,env golangci-lint run --deadline=7m --exclude deprecated -v
Contributor

this needs an update in boilerplate and it will automatically percolate.
But do you need this?

Contributor Author

I did that because linting was timing out. I think it's because of added dependencies. Let me try again.

Contributor

@kumare3 kumare3 left a comment

A few comments, more like nits.
Just one major thing: the config for the plugin. Do we want that to be a protobuf? Or how is the Java SDK sending this object today?

@kanterov
Contributor Author

kanterov commented Jun 3, 2021

@kumare3 we have implemented a dataclass for Python and an auto-value class for Java/Scala. I can also write a protobuf for it, but it isn't clear whether it is worth it. The SDK code is already typed, and on the wire it becomes a proto struct.
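
To illustrate what "on the wire it becomes a proto struct" means on the receiving side, the sketch below decodes a generic key/value map into a typed Go struct via a JSON round trip. It uses only the standard library; `queryJobConfig` is a trimmed-down, hypothetical stand-in for the plugin's config type (only `allowLargeResults` and `query` appear in this conversation), and `decodeCustom` is an illustrative name rather than the plugin's actual function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryJobConfig is a hypothetical, trimmed-down config type. The JSON tags
// use camelCase, matching the underlying Google API naming.
type queryJobConfig struct {
	Query             string `json:"query,omitempty"`
	AllowLargeResults bool   `json:"allowLargeResults,omitempty"`
}

// decodeCustom converts an untyped map (the shape a generic struct takes
// after decoding) into the typed config by marshalling it back to JSON.
func decodeCustom(custom map[string]interface{}) (queryJobConfig, error) {
	var cfg queryJobConfig
	raw, err := json.Marshal(custom)
	if err != nil {
		return cfg, err
	}
	err = json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	cfg, err := decodeCustom(map[string]interface{}{
		"query":             "SELECT 1",
		"allowLargeResults": true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

The trade-off raised in the thread is visible here: the typing lives in the SDKs and in the Go struct, while the wire format itself carries no schema.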

kanterov added 3 commits June 4, 2021 13:39
Add default implementation for token exchange for GCP.

Signed-off-by: Gleb Kanterov <[email protected]>
Signed-off-by: Gleb Kanterov <[email protected]>
@kanterov
Contributor Author

kanterov commented Jun 4, 2021

@kumare3 for protobuf, I started to write it, but it will take some time because many nested objects are imported from dependencies.

One thing that I noticed is that currently go structs use camelCase for naming, while protobuf messages use snake_case. I'm not sure it's possible to change how protobuf serializes into structs. Underlying Google API uses camelCase, while proto convention is snake_case. Do you think we can use camelCase in proto?
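
For context on the naming question: proto3's canonical JSON mapping derives a lowerCamelCase JSON name from each snake_case field name by default, so snake_case in the `.proto` file does not necessarily mean snake_case in JSON. Whether that also applies to the struct conversion used here is not settled in this thread. The helper below is a hand-written sketch of that name conversion, not the protobuf library's own function.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// snakeToLowerCamel drops each underscore and upper-cases the character that
// follows it, e.g. "allow_large_results" becomes "allowLargeResults".
func snakeToLowerCamel(name string) string {
	var b strings.Builder
	upperNext := false
	for _, r := range name {
		if r == '_' {
			upperNext = true
			continue
		}
		if upperNext {
			b.WriteRune(unicode.ToUpper(r))
			upperNext = false
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	for _, field := range []string{"allow_large_results", "destination_table", "query"} {
		fmt.Printf("%s -> %s\n", field, snakeToLowerCamel(field))
	}
}
```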

@kumare3
Contributor

kumare3 commented Jun 4, 2021

@kanterov that's ok, task template has versioning, so that you can easily choose the implementation.

I see that there is too much work. Let's go ahead with what you have. This might be a different example.

I also have another idea: we can generate the proto file from the Go struct, or use OpenAPI.

@kanterov kanterov merged commit fea5d84 into master Jun 4, 2021
@welcome

welcome bot commented Jun 4, 2021

Congrats on merging your first pull request! 🎉

4 participants