
WP-10970: add source_type to tap-s3-csv #21

Merged: 9 commits merged on Aug 3, 2022

Conversation

woody-feng

```python
if column_updates and len(column_updates) > 0:
    updates = list(column_updates.values())[0]
    for update in updates:
        column = update['column']
```

woody-feng (author):

need to check modify

Reviewer:

source_types_for_updatecols

@@ -27,8 +27,12 @@ def load_metadata(table_spec, schema):
if table_spec.get('key_properties', []) and field_name in table_spec.get('key_properties', []):
mdata = metadata.write(
mdata, ('properties', field_name), 'inclusion', 'automatic')
mdata = metadata.write(
mdata, ('properties', field_name), 'source_type', 'string')
else:
mdata = metadata.write(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always do this; no if/else.
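A minimal sketch of writing `source_type` for every field, outside the key-property branch. It assumes metadata is a dict keyed by breadcrumb tuples, like the one singer's `metadata.write` builds; `write_metadata` is a hypothetical local stand-in so the example runs without singer installed. It also assumes the truncated `else` branch marks non-key fields `available`, and `source_type_map` is an assumed input.

```python
from typing import Any, Dict, List, Tuple

Breadcrumb = Tuple[str, ...]
Metadata = Dict[Breadcrumb, Dict[str, Any]]


def write_metadata(mdata: Metadata, breadcrumb: Breadcrumb, key: str, value: Any) -> Metadata:
    """Hypothetical stand-in for singer's metadata.write: set key=value at breadcrumb."""
    mdata.setdefault(breadcrumb, {})[key] = value
    return mdata


def load_metadata_sketch(key_properties: List[str], field_names: List[str],
                         source_type_map: Dict[str, str]) -> Metadata:
    mdata: Metadata = {}
    for field_name in field_names:
        breadcrumb = ('properties', field_name)
        # Inclusion still depends on whether the field is a key property
        # (assumption: non-key fields are 'available').
        inclusion = 'automatic' if field_name in key_properties else 'available'
        mdata = write_metadata(mdata, breadcrumb, 'inclusion', inclusion)
        # source_type is written for every field, with no if/else.
        # Assumption: fall back to 'string' when no source type is known.
        source_type = source_type_map.get(field_name, 'string')
        mdata = write_metadata(mdata, breadcrumb, 'source_type', source_type)
    return mdata
```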

Reviewer:

Do we need to revert setting string as `[number, string, null]`?

woody-feng (author):

This is provided by Singer.

Reviewer:

@woody-feng @yi-varicent I meant: in `conversion.py`, `datatype_schema()` appends string as a type, and we shouldn't need to do that anymore, right? For example, discovery should not return `[null, 'number', 'string']` for a numeric column.
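To illustrate the reviewer's point, here is a hypothetical sketch of a `datatype_schema()` that no longer appends `'string'`. The real function in `conversion.py` is not shown in this PR; the signature (inferred datatype name in, JSON Schema fragment out) and the `date-time` handling are assumptions.

```python
from typing import Any, Dict


def datatype_schema(datatype: str) -> Dict[str, Any]:
    """Hypothetical sketch: map an inferred CSV datatype to a JSON Schema fragment.

    'string' is NOT appended to non-string types, so a numeric column is
    discovered as ['null', 'number'] rather than ['null', 'number', 'string'].
    """
    if datatype == 'date-time':
        # Assumption: datetimes remain strings annotated with a format.
        return {'type': ['null', 'string'], 'format': 'date-time'}
    return {'type': ['null', datatype]}
```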

```python
self.integer_datetime_fmt = integer_datetime_fmt
self.pre_hook = pre_hook
self.removed = set()
self.filtered = set()
self.errors = []
self.column_updates_map = column_updates_map
```

Reviewer:

rename this as well

```python
    else:
        return False, None

def _transform(self, data, typ, schema, path, source_type=None):
```

Reviewer:

Return `(True/False, None/data)`, i.e. `(doesSchemaMatch, value)`.

```python
if source_type == 'string':
    return True, str(data)
else:
    return False, None
```

Reviewer:

`elif type == 'number': return True, float(data)`
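The string branch plus the reviewer's `elif` can be sketched as a standalone function that follows the `(doesSchemaMatch, value)` convention from the earlier comment. The name `transform_by_source_type` is hypothetical and this is not the tap's actual `_transform`; the comma stripping mirrors the `num = data.replace(',', '')` line in the PR. A non-numeric string still raises `ValueError` from `float()`.

```python
from typing import Any, Optional, Tuple


def transform_by_source_type(data: Any, source_type: Optional[str]) -> Tuple[bool, Any]:
    """Hypothetical sketch of the (doesSchemaMatch, value) contract.

    Returns (True, converted_value) when the source type is handled,
    otherwise (False, None) so the caller can fall back to normal handling.
    """
    if source_type == 'string':
        return True, str(data)
    elif source_type == 'number':
        # Strip thousands separators before converting, e.g. "1,234.5".
        num = data.replace(',', '') if isinstance(data, str) else data
        return True, float(num)
    else:
        return False, None
```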

```diff
 num = data.replace(',', '') if isinstance(data, str) else data
 float(num)
-return True, str(data)
+return True, float(data)
```

Reviewer:

`getType(type)`: `if type == 'number': return True, float(data)`
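The `getType(type)` idea can also be written as a lookup table from source type to converter, so supporting a new type means adding one entry instead of another branch. `CONVERTERS`, `to_number`, and `convert` are hypothetical names. One detail worth checking: the hunk above validates `float(num)` but returns `float(data)`, which would raise `ValueError` for a value like `'1,234'`; this sketch converts the comma-stripped value instead.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def to_number(data: Any) -> float:
    # Convert the comma-stripped value: float('1,234') would raise ValueError.
    num = data.replace(',', '') if isinstance(data, str) else data
    return float(num)


# Hypothetical mapping from a column's source_type to its converter.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    'string': str,
    'number': to_number,
}


def convert(data: Any, source_type: Optional[str]) -> Tuple[bool, Any]:
    """Return (doesSchemaMatch, value) using the converter table."""
    converter = CONVERTERS.get(source_type) if source_type else None
    if converter is None:
        return False, None
    return True, converter(data)
```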

woody-feng requested a review from yi-varicent on August 3, 2022, 04:00
```diff
@@ -230,15 +233,16 @@ def sync_csv_file(config, file_handle, s3_path, table_spec, stream):
 auto_fields, filter_fields, source_type_map = transform.resolve_filter_fields(
     mdata)

-column_updates_map = get_column_update_map(config, source_type_map)
+source_type_for_updatecol_map = get_column_update_map(
```

Reviewer:

change method name

woody-feng merged commit 3c52b28 into master on Aug 3, 2022
@@ -27,8 +27,12 @@ def load_metadata(table_spec, schema):
if table_spec.get('key_properties', []) and field_name in table_spec.get('key_properties', []):
mdata = metadata.write(
mdata, ('properties', field_name), 'inclusion', 'automatic')
mdata = metadata.write(
mdata, ('properties', field_name), 'source_type', 'string')
else:
mdata = metadata.write(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@woody-feng @yi-varicent meant in conversion.py datatype_schema() we append string as type, we shouldn't need to anymore right? for example discovery should not return [null, 'number', 'string'] for a numeric column?

```diff
@@ -193,6 +193,22 @@ def sync_compressed_file(config, s3_path, table_spec, stream):
     return records_streamed


 def get_source_type_for_updatecol_map(config, source_type_map):
     column_updates = config['columns_to_update'] if 'columns_to_update' in config else None
```

Reviewer:

You can use `config.get('columns_to_update')` instead.
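`dict.get` returns `None` for a missing key by default, so the conditional expression and the `get` call produce the same value. A quick check, using sample config dicts made up for illustration:

```python
config_with = {'columns_to_update': {'my_stream': [{'column': 'amount'}]}}
config_without = {}

for config in (config_with, config_without):
    original = config['columns_to_update'] if 'columns_to_update' in config else None
    suggested = config.get('columns_to_update')
    # Both forms yield the same value whether or not the key is present.
    assert original == suggested
```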


```python
source_type_for_updatecol_map = {}
if column_updates and len(column_updates) > 0:
    updates = list(column_updates.values())[0]
```

Reviewer:

Is the key of `column_updates` the stream? Why don't we pass in the stream and access `column_updates` directly instead of converting it to a list?

woody-feng (author):

Hmm, a CSV has only one sheet, so I just take the default one. But you're right, passing the stream would be better.
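A minimal sketch of the reviewer's suggestion: pass the stream name in and look up its updates by key instead of taking `list(column_updates.values())[0]`. The `stream_name` parameter is hypothetical, and both the config shape (stream name mapped to a list of `{'column': ...}` dicts) and what the returned map holds (each updated column mapped to its source type) are assumptions, since the rest of the function body is not shown in the PR.

```python
from typing import Any, Dict, List


def get_source_type_for_updatecol_map(config: Dict[str, Any], stream_name: str,
                                      source_type_map: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical sketch: look up this stream's column updates directly by key."""
    column_updates: Dict[str, List[Dict[str, Any]]] = config.get('columns_to_update') or {}
    updates = column_updates.get(stream_name, [])
    source_type_for_updatecol_map: Dict[str, str] = {}
    for update in updates:
        column = update['column']
        # Assumption: only columns with a known source type are recorded.
        if column in source_type_map:
            source_type_for_updatecol_map[column] = source_type_map[column]
    return source_type_for_updatecol_map
```

Unlike indexing into `values()`, this does not rely on the config holding exactly one stream.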

woody-feng (author), Aug 9, 2022:

https://varicent.atlassian.net/browse/WP-11496
I will fix all of the above issues and remove `string` in this ticket.


3 participants