Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Openapi new auth #4086

Merged
merged 9 commits into from
Feb 25, 2022
Merged

Openapi new auth #4086

merged 9 commits into from
Feb 25, 2022

Conversation

vlavorini
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Context

In order to interrogate the endpoints, an authentication via token is required.

Before this PR, only a POST request with username/password in the body was available. Now also a GET method, with username/password directly wrote in the url, is available.

@xiphl
Copy link
Contributor

xiphl commented Feb 8, 2022

Hi @vlavorini , may I ask if there are any plans to expand the connector to return information about POST endpoints as well?
Currently I only see GET endpoints when I ingest openapi sources.

@vlavorini
Copy link
Contributor Author

Hi @xiphl, in my idea you can obtain datasets only with 'get' methods, but i acknowledge that also POST requests can return data.

There are two theoretical issues with POST:

From the practical point of view, even if you are sure about your endpoints, you need to fill the request Body, and that mean that you have to know what to insert in there.

I'm pretty sure that even faced to the same set of APIs, different POST endpoints need different content in their body. I don't know how to model this.

@github-actions
Copy link

github-actions bot commented Feb 8, 2022

Unit Test Results (build & test)

  70 files  +  2    70 suites  +2   15m 40s ⏱️ + 4m 47s
611 tests +15  552 ✔️ +15  59 💤 ±0  0 ±0 

Results for commit 80a557e. ± Comparison against base commit 60c17a2.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Feb 8, 2022

Unit Test Results (metadata ingestion)

       5 files  +    2         5 suites  +2   43m 25s ⏱️ +28s
   341 tests +  37     341 ✔️ +  37    0 💤 ±  0  0 ±0 
1 552 runs  +680  1 514 ✔️ +662  38 💤 +18  0 ±0 

Results for commit 80a557e. ± Comparison against base commit 60c17a2.

This pull request removes 6 and adds 43 tests. Note that renamed tests count towards both.
tests.unit.test_pipeline.PipelineTest ‑ test_configure
tests.unit.test_pipeline.PipelineTest ‑ test_run_including_fake_transformation
tests.unit.test_pipeline.PipelineTest ‑ test_run_including_registered_transformation
tests.unit.test_plugin_system ‑ test_list_all[False]
tests.unit.test_plugin_system ‑ test_list_all[True]
tests.unit.test_utilities ‑ test_groupby_unsorted
tests.integration.clickhouse.test_clickhouse ‑ test_clickhouse_ingest
tests.integration.kafka.test_kafka_state ‑ test_kafka_ingest_with_stateful
tests.integration.tableau.test_tableau ‑ test_tableau_ingest
tests.unit.stateful_ingestion.test_kafka_state ‑ test_kafka_common_state
tests.unit.test_athena_source ‑ test_athena_get_table_properties
tests.unit.test_athena_source ‑ test_athena_uri
tests.unit.test_bigquery_source ‑ test_simple_upstream_table_generation
tests.unit.test_bigquery_source ‑ test_upstream_table_generation_with_temporary_table_with_multiple_temp_upstream
tests.unit.test_bigquery_source ‑ test_upstream_table_generation_with_temporary_table_with_temp_upstream
tests.unit.test_bigquery_source ‑ test_upstream_table_generation_with_temporary_table_without_temp_upstream
…

♻️ This comment has been updated with latest results.

@xiphl
Copy link
Contributor

xiphl commented Feb 8, 2022

I see... So these were the considerations behind it. Thanks for sharing!

@RyanHolstien
Copy link
Collaborator

I think we should make note that using user/pass in the URL should be done cautiously. Depending on set-up, this could log the username & password combo in the webserver logs as a part of the request URL. Otherwise LGTM cc: @dexter-mh-lee

@vlavorini
Copy link
Contributor Author

Hi @RyanHolstien , do you think it will be good to add few lines in the docs? Cheers

@RyanHolstien
Copy link
Collaborator

After discussing with the team, we think that this should be done through an auth token rather than directly supplying the username and password. This adds an extra step of reaching out to the auth provider separately from the script to get the auth token, but we believe it to be a more secure pattern to follow.

@vlavorini
Copy link
Contributor Author

I understand, and agree with you. The problem is that there are applications that leave you no choice, like the one I faced, for which I had to implement such a poor method.

@RyanHolstien
Copy link
Collaborator

Yeah in the case that your auth provider does not provide an option to produce a token, I think this is the only option that makes sense.

We should still add some docs on the drawbacks of this approach and if possible include another option to support a token in addition to the user/pass combo.

@@ -269,7 +297,7 @@ def close(self):

class OpenApiSource(APISource):
def __init__(self, config: OpenApiConfig, ctx: PipelineContext):
super().__init__(config, ctx, "OpenApi")
Copy link
Contributor

@shirshanka shirshanka Feb 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Is this (changing OpenApi to openapi) intentional?
  2. what happens to previously ingested urns? when you re-run ingestion they get new urns, so the old entities will continue to exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's intentional. I spotted that there was issues with finding the icon, so I uniformed all to the lowercase format, as in other ingestion plugins.
The older urns will continue to exist as different entities.

@shirshanka shirshanka added the pending-submitter-response Issue/request has been reviewed but requires a response from the submitter label Feb 20, 2022
Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shirshanka shirshanka merged commit a113e43 into datahub-project:master Feb 25, 2022
@vlavorini vlavorini deleted the openapi_new_auth branch February 25, 2022 08:13
anshbansal added a commit to anshbansal/datahub that referenced this pull request Feb 28, 2022
@anshbansal
Copy link
Collaborator

@vlavorini I am sending a PR to revert this after discussing with @shirshanka. This is backward incompatible and it will cause issues for companies using OpenApi already in production.

Problem in finding logo -> Please fix here to change the case so logo is found correctly https://github.com/linkedin/datahub/blob/master/metadata-service/war/src/main/resources/boot/data_platforms.json#L171

get_token: True changed to be a dict. Please add another variable so existing recipes do not break.

@vlavorini vlavorini restored the openapi_new_auth branch February 28, 2022 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending-submitter-response Issue/request has been reviewed but requires a response from the submitter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants