Change of schema for ML.FEATURE_INFO #33

Open
hardtke opened this issue Jul 6, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@hardtke
Contributor

hardtke commented Jul 6, 2022

Our model audit post-hook started failing recently. As far as I can tell, BigQuery ML removed the median column from ML.FEATURE_INFO. Does anyone have a fix that preserves our historical model data?

{% macro _audit_table_columns() %}

{% do return ({
'model': 'string',
'schema': 'string',
'created_at': dbt_utils.type_timestamp(),
'training_info': 'array<struct<training_run int64, iteration int64, loss float64, eval_loss float64, learning_rate float64, duration_ms int64, cluster_info array<struct<centroid_id int64, cluster_radius float64, cluster_size int64>>>>',
'feature_info': 'array<struct<input string, min float64, max float64, mean float64, median float64, stddev float64, category_count int64, null_count int64>>',
'weights': 'array<struct<processed_input string, weight float64, category_weights array<struct<category string, weight float64>>>>',
}) %}

{% endmacro %}


@rbjerrum
Collaborator

Thanks for reporting this, @hardtke. Do I understand correctly that you'd like to preserve all data that you've collected thus far, and thus, it's not sufficient to remove the median column from the audit table?

I think a viable backward-compatible change would be to keep the median column in the audit table but, going forward, write a null into that column instead of selecting it from the ML.FEATURE_INFO output, which now fails.

Changing line 28 in https://github.com/kristeligt-dagblad/dbt_ml/blob/master/macros/hooks/model_audit.sql#L28:

feature_info: &default_feature_info ['*']

to

feature_info: &default_feature_info array(select as struct input, min, max, mean, cast(null as float64) as median, stddev, category_count, null_count)

should do the trick I believe.
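
For illustration, the query this override is meant to produce might look roughly like the sketch below. It is untested and rests on assumptions: the model path `my_project.my_dataset.my_model` is a placeholder, and ML.FEATURE_INFO is assumed to still return the other seven columns under the same names.

```sql
-- Sketch only: build a feature_info value that still matches the audit
-- table's existing array<struct<...>> type, with median filled in as NULL.
-- `my_project.my_dataset.my_model` is a hypothetical model path.
select
  array(
    select as struct
      input,
      min,
      max,
      mean,
      cast(null as float64) as median,  -- no longer selected from ML.FEATURE_INFO
      stddev,
      category_count,
      null_count
    from ml.feature_info(model `my_project.my_dataset.my_model`)
  ) as feature_info
```

The struct fields are listed in the same order as in the audit table's feature_info column, since the fields need to line up with the existing column type for the insert to succeed. Historical rows keep their median values, and new rows carry a null there.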

@rbjerrum rbjerrum added the bug Something isn't working label Jul 11, 2022