-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WP-10970 : add source_type to tap-s3-csv #21
Conversation
if column_updates and len(column_updates) > 0: | ||
updates = list(column_updates.values())[0] | ||
for update in updates: | ||
column = update['column'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to check modify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source_types_for_updatecols
@@ -27,8 +27,12 @@ def load_metadata(table_spec, schema): | |||
if table_spec.get('key_properties', []) and field_name in table_spec.get('key_properties', []): | |||
mdata = metadata.write( | |||
mdata, ('properties', field_name), 'inclusion', 'automatic') | |||
mdata = metadata.write( | |||
mdata, ('properties', field_name), 'source_type', 'string') | |||
else: | |||
mdata = metadata.write( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always do this, no if else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to revert setting string as [number, stirng, null]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this provided by singer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@woody-feng @yi-varicent meant in conversion.py datatype_schema() we append string as type, we shouldn't need to anymore right? for example discovery should not return [null, 'number', 'string'] for a numeric column?
tap_s3_csv/transform.py
Outdated
self.integer_datetime_fmt = integer_datetime_fmt | ||
self.pre_hook = pre_hook | ||
self.removed = set() | ||
self.filtered = set() | ||
self.errors = [] | ||
self.column_updates_map = column_updates_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename this as well
else: | ||
return False, None | ||
|
||
def _transform(self, data, typ, schema, path, source_type=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True/False, None/data
doesSchemaMatch, value
if source_type == 'string': | ||
return True, str(data) | ||
else: | ||
return False, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elseif type === number: returen true, float(data)
tap_s3_csv/transform.py
Outdated
num = data.replace(',', '') if isinstance(data, str) else data | ||
float(num) | ||
return True, str(data) | ||
return True, float(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getType(type)
if type==="humber"+: return true float(data)
tap_s3_csv/sync.py
Outdated
@@ -230,15 +233,16 @@ def sync_csv_file(config, file_handle, s3_path, table_spec, stream): | |||
auto_fields, filter_fields, source_type_map = transform.resolve_filter_fields( | |||
mdata) | |||
|
|||
column_updates_map = get_column_update_map(config, source_type_map) | |||
source_type_for_updatecol_map = get_column_update_map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change method name
@@ -27,8 +27,12 @@ def load_metadata(table_spec, schema): | |||
if table_spec.get('key_properties', []) and field_name in table_spec.get('key_properties', []): | |||
mdata = metadata.write( | |||
mdata, ('properties', field_name), 'inclusion', 'automatic') | |||
mdata = metadata.write( | |||
mdata, ('properties', field_name), 'source_type', 'string') | |||
else: | |||
mdata = metadata.write( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@woody-feng @yi-varicent meant in conversion.py datatype_schema() we append string as type, we shouldn't need to anymore right? for example discovery should not return [null, 'number', 'string'] for a numeric column?
@@ -193,6 +193,22 @@ def sync_compressed_file(config, s3_path, table_spec, stream): | |||
return records_streamed | |||
|
|||
|
|||
def get_source_type_for_updatecol_map(config, source_type_map): | |||
column_updates = config['columns_to_update'] if 'columns_to_update' in config else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do config.get('columns_to_update')
instead.
|
||
source_type_for_updatecol_map = {} | ||
if column_updates and len(column_updates) > 0: | ||
updates = list(column_updates.values())[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the key to column_updates stream? why don't we pass in stream and access column_updates properly instead of converting it to a list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emmmm for csv it has only one sheet, so I just take the default one but u r right, passing the stream should be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://varicent.atlassian.net/browse/WP-11496
I will fix all above issues and remove string
in this ticket
https://varicent.atlassian.net/browse/WP-10970