Convert text segments (from YouTube ASR) into full sentences.
Developed for Temporal Alignment Networks for Long-term Video (CVPR 2022). Details are described in Appendix A of the paper.
This module takes the ASR text as input and fixes three problems:
- Incorrect language translation (example (a); the original language is Thai).
- Incorrect line breaks, resulting in repetitive text segments (example (b)).
- Incorrect sentence partitioning: text segments are cut to a fixed window size (example (b)).
This module does five things:
- filter out non-English text.
- remove duplicate text segments.
- combine text segments into a paragraph, then add punctuation.
- cut the paragraph into sentences.
- interpolate ASR timestamps.
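The de-duplication step can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation; the function name `dedup_segments` and the exact matching rule (dropping only consecutive exact repeats) are assumptions.

```python
def dedup_segments(segments):
    """Drop consecutive text segments that exactly repeat the previous
    one -- a common artifact of YouTube ASR line-breaking.
    Illustrative sketch only."""
    out = []
    for seg in segments:
        seg = seg.strip()
        # skip a segment that repeats the one just kept
        if out and seg == out[-1]:
            continue
        out.append(seg)
    return out

segments = ["lay the tomatoes on the tray",
            "lay the tomatoes on the tray",  # duplicated by ASR line breaks
            "and these go into the oven"]
print(dedup_segments(segments))
```

The real module also handles partial overlaps between neighbouring segments, which a plain equality check like this does not cover.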
langdetect (https://pypi.org/project/langdetect/)
transformers (huggingface)
joblib
pandas
torch (v1.10, earlier versions might work)
Download the punctuation model [link]
cd sentencify_text
git lfs install
git clone https://huggingface.co/felflare/bert-restore-punctuation
Use as a standalone Python function:
from filters.sentencify import Sentencify
processor = Sentencify() # it will use cuda if torch.cuda.is_available()
# example input data
text_segments = [
"lay the tomatoes on the tray",
"and these go into the oven for about 90",
"minutes if you turn the oven down really",
"low you can leave them in overnight"]
start_timestamps = [33.45, 36.84, 39.69, 41.67]
end_timestamps = [39.69, 41.67, 43.77, 45.59]
# transform text segments into complete sentences
sentences, _, _ = processor.punctuate_and_cut(text_segments)
print(sentences)
# >>> ['lay the tomatoes on the tray and these go into the oven for about 90 minutes', 'if you turn the oven down really low you can leave them in overnight']
# can also interpolate the start/end timestamps for output sentences
sentences, new_start, new_end = processor.punctuate_and_cut(text_segments, start_timestamps, end_timestamps)
print(list(zip(sentences, new_start, new_end)))
# >>> [('lay the tomatoes on the tray and these go into the oven for about 90 minutes', 33.45, 40.20), ('if you turn the oven down really low you can leave them in overnight', 40.20, 45.59)]
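The interpolated timestamps above are consistent with a simple linear scheme: the overall time span of the input segments is distributed across the output sentences in proportion to their length. A minimal sketch, assuming proportional-by-character interpolation (the module's actual scheme may differ; `interpolate_timestamps` is an illustrative name, not part of the API):

```python
def interpolate_timestamps(sentences, global_start, global_end):
    """Assign start/end times to each sentence by splitting the span
    [global_start, global_end] proportionally to sentence length in
    characters. Illustrative sketch only."""
    total = sum(len(s) for s in sentences)
    span = global_end - global_start
    starts, ends, t = [], [], global_start
    for s in sentences:
        starts.append(round(t, 2))
        t += span * len(s) / total
        ends.append(round(t, 2))
    ends[-1] = global_end  # pin the last boundary to avoid rounding drift
    return starts, ends

sentences = [
    "lay the tomatoes on the tray and these go into the oven for about 90 minutes",
    "if you turn the oven down really low you can leave them in overnight",
]
starts, ends = interpolate_timestamps(sentences, 33.45, 45.59)
```

Note that each sentence's end time equals the next sentence's start time, matching the contiguous boundaries in the example output above.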
Use to pre-process the HowTo100M dataset:
python process_htm.py # check process_htm.py for details
See https://www.robots.ox.ac.uk/~vgg/research/tan/index.html#htm-sentencify for details.
If you find this module useful for your project, please consider citing our paper:
@InProceedings{Han2022TAN,
author = "Tengda Han and Weidi Xie and Andrew Zisserman",
title = "Temporal Alignment Networks for Long-term Video",
booktitle = "CVPR",
year = "2022",
}