Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-604] get_columns_in_relation when database is missing #180

Closed
jtcohen6 opened this issue May 4, 2022 · 0 comments
Closed

[CT-604] get_columns_in_relation when database is missing #180

jtcohen6 opened this issue May 4, 2022 · 0 comments
Assignees
Labels
type:bug Something isn't working type:regression

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 4, 2022

Slack thread

A highly specific edge case for code that worked in v1.0, but not in v1.1.

Reproduction case

-- macros/test_get_cols.sql

{% test get_cols_in(model) %}

  {# The step which causes the issue #}
  {%- set relation = api.Relation.create(schema=model.schema, identifier=model.table) if execute -%}

  {% set columns = adapter.get_columns_in_relation(relation) %}
  
  select
    {% for col in columns %}
      {{ col.name }} {{ "," if not loop.last }}
    {% endfor %}
  
  from {{ model }}
  limit 0

{% endtest %}
-- models/my_model.sql
select 1 as id, 'text' as another_col
# models/whatever.yml
version: 2
models:
  - name: my_model
    tests:
      - get_cols_in

When I run this code locally, I see this error in the debug logs (logs/dbt.log):

BigQuery adapter: get_columns_in_relation error: Pass a string for project

Indeed, it looks like this error is because the project/database is empty when relation is created. When I update that line of code to be:

{%- set relation = api.Relation.create(database=model.database, schema=model.schema, identifier=model.table) if execute -%}

It works just fine.

So the question is, how was this working previously?

Root cause

The origin of this change: #98

get_columns_in_relation calls get_bq_table:

@staticmethod
def table_ref(database, schema, table_name):
dataset_ref = google.cloud.bigquery.DatasetReference(database, schema)
return google.cloud.bigquery.TableReference(dataset_ref, table_name)
def get_bq_table(self, database, schema, identifier):
"""Get a bigquery table for a schema/model."""
conn = self.get_thread_connection()
table_ref = self.table_ref(database, schema, identifier)
return conn.handle.get_table(table_ref)

Starting in v1.1, these use a new mechanism (DatasetReference + TableReference), which expect all of database/project, schema/dataset, and table_name to be present.

Previously, we were using a legacy / deprecated conn.handle.dataset endpoint instead. We passed the whole connection into that endpoint, so if the project was missing, we'd use the default project from the connection in its place:

@staticmethod
def dataset(database, schema, conn):
dataset_ref = conn.handle.dataset(schema, database)
return google.cloud.bigquery.Dataset(dataset_ref)
@staticmethod
def dataset_from_id(dataset_id):
return google.cloud.bigquery.Dataset.from_string(dataset_id)
def table_ref(self, database, schema, table_name, conn):
dataset = self.dataset(database, schema, conn)
return dataset.table(table_name)
def get_bq_table(self, database, schema, identifier):
"""Get a bigquery table for a schema/model."""
conn = self.get_thread_connection()
table_ref = self.table_ref(database, schema, identifier, conn)
return conn.handle.get_table(table_ref)

Resolution

I definitely don't want to go back to the legacy endpoint! I think it should be quite simple for us to restore the previous behavior. In the case where database is None, let's fill in with the configured database (from credentials or GCloud default).

    def get_bq_table(self, database, schema, identifier):
        """Get a bigquery table for a schema/model."""
        conn = self.get_thread_connection()
        # backwards compatibility: fill in with defaults if not specified
        database = database or conn.credentials.database
        schema = schema or conn.credentials.schema
        table_ref = self.table_ref(database, schema, identifier)
        return conn.handle.get_table(table_ref)
@jtcohen6 jtcohen6 added type:bug Something isn't working type:regression labels May 4, 2022
@github-actions github-actions bot changed the title get_columns_in_relation when database is missing [CT-604] get_columns_in_relation when database is missing May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working type:regression
Projects
None yet
Development

No branches or pull requests

2 participants