WP-14838 data pre-process configuration: header/footer rows and column header [API] #48

Merged
merged 9 commits into from
Apr 12, 2023

@jayma91 jayma91 commented Apr 10, 2023

WP-14838 data pre-process configuration: header/footer rows and column header [API]
https://varicent.atlassian.net/browse/WP-14838

notes:

  • To handle first_row as either the header or the first record, preprocessStream skips empty lines when grabbing the first row, matching csv.DictReader's behaviour.
  • skip_header_row is handled in the first thread and skip_footer_row in the last thread. With the current implementation in the symon backend, the minimum split size is 5MB, and the UI restricts skipped header and footer rows to a maximum of 100.
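
As a rough illustration of the second note, here is a minimal sketch of trimming header rows only in the first chunk and footer rows only in the last one. The function name `trim_chunk`, its parameters, and the sample chunks are hypothetical and are not taken from this PR; the sketch also assumes each chunk is already split into whole lines and that all skipped rows fit inside a single chunk, which the 5MB minimum split size and 100-row limit are presumably meant to guarantee.

```python
from typing import List


def trim_chunk(lines: List[str], is_first: bool, is_last: bool,
               skip_header_rows: int = 0, skip_footer_rows: int = 0) -> List[str]:
    """Drop leading rows from the first chunk and trailing rows from the last chunk.

    Hypothetical helper: middle chunks are returned unchanged.
    """
    start = skip_header_rows if is_first else 0
    end = len(lines) - skip_footer_rows if is_last else len(lines)
    # Guard against skipping more rows than the chunk contains.
    return lines[start:max(start, end)]


# Three chunks standing in for the per-thread splits of one file.
chunks = [["junk", "id,name", "1,a"], ["2,b", "3,c"], ["4,d", "total: 4"]]
trimmed = [
    trim_chunk(chunk, i == 0, i == len(chunks) - 1,
               skip_header_rows=1, skip_footer_rows=1)
    for i, chunk in enumerate(chunks)
]
```

Only the first and last chunks change here; the middle chunk passes through untouched.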

# Only modifying for imports from csv connector for now as imports from s3 connector might have reasons for skipping empty rows.
# Could look into using csv.reader instead for cleaner code if s3 connector could also keep empty rows.
if options.get('is_csv_connector_import', False):
    csv.DictReader.__next__ = next_without_skip
Author

This is no longer an issue. Both the original csv.DictReader.__next__ and next_without_skip skip empty lines (lines with no commas) but keep lines that contain only commas.
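
The standard-library behaviour described here can be checked with a small standalone snippet. The helper name `read_rows` and the sample text are made up for illustration, and next_without_skip from this PR is not reproduced.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text with the stock csv.DictReader and return all data rows."""
    return list(csv.DictReader(io.StringIO(text)))


# Header, then a fully empty line, then a comma-only line, then a normal row.
sample = "a,b\n\n,\n1,2\n"
rows = read_rows(sample)
# The empty line is dropped; the comma-only line is kept as a row of empty strings.
```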

@jayma91 jayma91 marked this pull request as ready for review April 11, 2023 16:44
ChrisLing1
ChrisLing1 previously approved these changes Apr 11, 2023
'type': 'object',
'properties': {}
}
raise Exception('File is empty.')
Is this thrown only when there are no data rows in the file (since we already throw another Exception('File is empty.') from csv_iterator.get_row_iterator if the file doesn't even have a header row)? If so, do we need it? There are already complaints about the ICM connector failing when the data is empty, when users would rather it just import the empty data successfully.
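
The two cases in this question (no header row at all versus a header with no data rows) can be told apart as in the sketch below. `classify_csv` and its return labels are hypothetical names for illustration only; this is not how csv_iterator.get_row_iterator is implemented.

```python
import csv
import io


def classify_csv(text: str) -> str:
    """Classify CSV text as 'no_header', 'header_only', or 'has_data'.

    Hypothetical helper; fully empty lines are ignored before counting rows.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return "no_header"
    if len(rows) == 1:
        return "header_only"
    return "has_data"
```

Under this split, a caller could keep raising for 'no_header' while treating 'header_only' as a valid empty table, if empty imports are later allowed.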

@mjdoor mjdoor self-requested a review April 12, 2023 13:32
mjdoor
mjdoor previously approved these changes Apr 12, 2023
@jayma91 jayma91 dismissed stale reviews from mjdoor, rsantos-varicent, and ChrisLing1 via 07cbc00 April 12, 2023 13:45
jayma91 commented Apr 12, 2023

Removed the exception for empty files, since we will have to revisit this to allow empty table imports for ICM.

Update: the import fails for an empty file during sync because the schema is empty. It seems better to keep the error handling for now and remove it later when fixing empty-file import.

@jayma91 jayma91 requested a review from mjdoor April 12, 2023 13:52
@jayma91 jayma91 merged commit 227e0eb into master Apr 12, 2023
5 participants