Commit

collapse whitespace for all langs
josh committed Jan 30, 2019
1 parent b0cbeb8 commit ba2b52e
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions src/corporacreator/preprocessors/common.py
@@ -75,5 +75,7 @@ def common(sentence):
     sentence = _strip_tags(sentence)
     # Remove non-printable characters
     sentence = _strip_string(sentence)
+    # collapse all whitespace and replace with single space
+    sentence = (' ').join(sentence.split())
     # TODO: Clean up data in a language independent manner
     return sentence
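
The effect of the added line can be checked in isolation. The sketch below is illustrative only: `collapse_whitespace` is a hypothetical helper name that does not exist in the repository, and the sample strings are made up. It relies on the documented behaviour of `str.split()` with no arguments, which splits on runs of whitespace and drops leading and trailing whitespace.

```python
def collapse_whitespace(sentence: str) -> str:
    """Hypothetical stand-in for the line added to common() in this commit."""
    # split() with no separator splits on any run of whitespace
    # (spaces, tabs, newlines) and discards empty leading/trailing pieces,
    # so re-joining with ' ' leaves exactly one space between tokens.
    return " ".join(sentence.split())


if __name__ == "__main__":
    # Made-up inputs, not taken from any corpus.
    samples = ["  hello   world ", "tab\tseparated\nlines", ""]
    for text in samples:
        print(repr(text), "->", repr(collapse_whitespace(text)))
```

Because the call is idempotent, running it on an already-clean sentence returns the sentence unchanged, which is presumably why it is safe to apply to every language.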
