Validate gzip AND txt files found in conc/raw #157

Merged: jsteverman merged 2 commits into main from validate-and-delta-txt-and-gzip on Jan 10, 2022

Conversation

jsteverman
Contributor

No description provided.

@@ -25,9 +25,9 @@ def review_logs(log_file)

conc_dir = Settings.concordance_path

- raw_gzip_files = Dir.glob("#{conc_dir}/raw/*txt.gz").sort
+ raw_files = Dir.glob("#{conc_dir}/raw/*txt*").sort

Member

Any concerns about this matching e.g. "this_is_not_a_txt_file"? I figure we control what goes in that directory, so it shouldn't be a big issue?

Contributor Author

No more than it matching "junk_that_shouldnt_be_in_this_directory.txt.gz". If it gets garbage it will end up erroring out anyway.
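
To make the trade-off in this exchange concrete, here is a minimal sketch comparing the loose `*txt*` pattern with a stricter alternative. The file names and the stricter pattern `*.{txt,txt.gz}` are assumptions for illustration only and are not part of this PR. `File.fnmatch?` is used so the example runs without touching the filesystem; it needs `File::FNM_EXTGLOB` for brace patterns, whereas `Dir.glob` expands braces by default.

```ruby
# Hypothetical file names, chosen only to illustrate the matching behavior.
names = [
  "20220101_concordance.txt",
  "20220101_concordance.txt.gz",
  "this_is_not_a_txt_file",
  "junk_that_shouldnt_be_in_this_directory.txt.gz"
]

loose  = "*txt*"           # pattern used in this PR
strict = "*.{txt,txt.gz}"  # assumed stricter alternative, not from the PR

# The loose pattern accepts any name containing "txt" anywhere.
loose_matches = names.select { |name| File.fnmatch?(loose, name) }

# The strict pattern requires a .txt or .txt.gz extension.
strict_matches = names.select { |name| File.fnmatch?(strict, name, File::FNM_EXTGLOB) }

puts "loose:  #{loose_matches.inspect}"
puts "strict: #{strict_matches.inspect}"
```

The stricter pattern rejects `this_is_not_a_txt_file` but still accepts the `junk_..._.txt.gz` name, which matches the author's point: a glob alone cannot keep garbage out, so the later date check does the real validation.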

@jsteverman jsteverman requested a review from mwarin January 10, 2022 18:03

- file_names_split = raw_gzip_files.map {|fname| fname.split("/").last.split("_").first }
+ file_names_split = raw_files.map {|fname| fname.split("/").last.split("_").first }
  raw_dates = file_names_split.select {|d| d =~ /^\d+$/ }

Member

It might be clearer to use a regex here. If nothing else, it would be worth adding a comment on what filename format this expects.

Contributor Author

Commented expected file format.

Contributor Author

And now with regex instead of split("_").first
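
The final regex is not shown in this conversation, so the following is only a sketch of what a regex-based version could look like. It assumes a filename format of leading digits (the date), an underscore, and a name ending in `.txt` or `.txt.gz`. The constant `RAW_FILE_PATTERN`, the helper `extract_raw_dates`, and the sample paths are hypothetical names introduced for illustration.

```ruby
# Assumed filename format (not confirmed by the PR): <digits>_<anything>.txt or .txt.gz
# \A and \z anchor the whole string; ^ and $ in Ruby only anchor individual lines.
RAW_FILE_PATTERN = /\A(?<date>\d+)_.*\.txt(?:\.gz)?\z/

# Hypothetical helper: returns the leading date string of every path whose
# basename matches the assumed format, skipping everything else.
def extract_raw_dates(paths)
  paths.map { |path|
    match = RAW_FILE_PATTERN.match(File.basename(path))
    match && match[:date]
  }.compact
end

# Hypothetical paths for illustration.
sample_paths = [
  "/conc/raw/20220101_concordance.txt.gz",
  "/conc/raw/20220102_concordance.txt",
  "/conc/raw/this_is_not_a_txt_file",
  "/conc/raw/notes_20220101.txt"
]

puts extract_raw_dates(sample_paths).inspect
```

Compared with `split("_").first` followed by a digit check, a single pattern documents the expected format in one place and also rejects names that contain `txt` without having a `.txt` or `.txt.gz` extension.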

@jsteverman jsteverman force-pushed the validate-and-delta-txt-and-gzip branch from 41fa652 to 4c6dfe9 Compare January 10, 2022 18:29
@jsteverman jsteverman force-pushed the validate-and-delta-txt-and-gzip branch from 4c6dfe9 to 665645d Compare January 10, 2022 20:07
@jsteverman jsteverman merged commit dadd8d7 into main Jan 10, 2022
@aelkiss aelkiss deleted the validate-and-delta-txt-and-gzip branch February 8, 2022 14:59
3 participants