-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(classifier): Add support for excluding list of exact column names #9472
feat(classifier): Add support for excluding list of exact column names #9472
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes suggested. Otherwise looks good.
@@ -66,6 +83,9 @@ class Config: | |||
description="Factors and their weights to consider when predicting info types", | |||
alias="prediction_factors_and_weights", | |||
) | |||
ExcludeName: Optional[ExcludeNameFactorConfig] = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExcludeName: Optional[ExcludeNameFactorConfig] = Field( | |
ExcludeName: Optional[List[str]] = Field(default=None, description="List of exact column names to exclude from classification for this info type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need separate class as it can be represented as list of str directly..
}, | ||
} | ||
).config | ||
if config.info_types_config["Email_Address"].ExcludeName is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can remove this if/else by asserting presence of ExcludeName directly.
if config.info_types_config["Email_Address"].ExcludeName is not None: | |
assert config.info_types_config["Email_Address"].ExcludeName is not None |
}, | ||
} | ||
).config | ||
assert config.info_types_config["Email_Address"].ExcludeName is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding tests.
2716727
to
d7d0430
Compare
@@ -73,6 +73,14 @@ class Config: | |||
description="Factors and their weights to consider when predicting info types", | |||
alias="prediction_factors_and_weights", | |||
) | |||
StripExclusionFormatting: bool = Field( | |||
default=True, alias="strip_exclusion_formatting" | |||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StripExclusionFormatting
is not an infotype level config. As per equivalent datahub-classify PR, its at global level and is set only once across all infotypes. Did you intend this global behavior ? In that case -> strip_exclusion_formatting
should be in DataHubClassifierConfig
class in same file . Also, need not use StripExclusionFormatting
with alias strip_exclusion_formatting
here . You can directly name the field as strip_exclusion_formatting
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
StripExclusionFormatting: bool = Field( | ||
default=True, alias="strip_exclusion_formatting" | ||
) | ||
ExcludeName: Optional[List[str]] = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethan-cartwright could you also please update classification.md
with these changes ? Unfortunately the configs are not automatically updated yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, except one small comment regarding log levels on datahub-classify PR
done in this PR: acryldata/datahub-classify#21 |
Description
Extends the classification module's config to allow specification of an
ExcludeName
list that is used in this datahub-classify PR to exclude a list of exact column namesChecklist