"Some languages have not been added to ZIM metadata due to missing "
f"ISO639-3 code: {ignored_ted_codes}"
)
if not self.disable_metadata_checks:
    # Validate ZIM languages
    validate_language("Language", self.zim_languages)
In order to properly sort the languages in the list, we attribute 10 points to audio and 1 point to subtitles.
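As a rough illustration of this scoring, here is a minimal sketch. It is not the actual `compute_zim_languages` code: the shape of a video record (a dict with an `audio` code and a `subtitles` list) and the function name `rank_languages` are assumptions made for the example, as is the choice to also count a subtitle in the same language as the audio.

```python
from collections import Counter

AUDIO_POINTS = 10    # weight given to the audio language of a video
SUBTITLE_POINTS = 1  # weight given to each subtitle language of a video


def rank_languages(videos):
    """Return language codes sorted by descending score.

    `videos` is a hypothetical structure: a list of dicts with an
    "audio" language code and a "subtitles" list of language codes.
    """
    scores = Counter()
    for video in videos:
        if video.get("audio"):
            scores[video["audio"]] += AUDIO_POINTS
        for lang in video.get("subtitles", []):
            scores[lang] += SUBTITLE_POINTS
    # Highest score first; ties are broken alphabetically for stable output.
    return [lang for lang, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


sample = [
    {"audio": "eng", "subtitles": ["fra", "eng"]},
    {"audio": "eng", "subtitles": ["fra"]},
    {"audio": "fra", "subtitles": []},
]
ranked = rank_languages(sample)
```

With this sample, `eng` collects 21 points and `fra` 12, so `eng` is listed first.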
We want to revisit this to only consider languages which are present in at least 30% or 50% of the videos. We are not yet sure which percentage is appropriate.
The plan is hence:
- add a new CLI argument which is a float between 0 and 1 (0.5 meaning 50%, etc.); the default value should be 0.5 (50%)
- use this threshold in compute_zim_languages so that only languages present in at least that share of the videos (50% by default) are considered when computing the list of languages; a language is considered present in a given video if it is available as audio or as a subtitle
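The two steps of the plan could look roughly like the sketch below. It is only an illustration under assumptions: the flag name `--language-threshold`, the helper names, and the video record shape (a dict with an `audio` code and a `subtitles` list) are hypothetical and do not come from the ted2zim code base.

```python
import argparse


def threshold_type(value: str) -> float:
    """argparse type: accept only floats in the closed range [0, 1]."""
    number = float(value)  # a ValueError here is reported by argparse as a usage error
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"{value} is not between 0 and 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the issue does not specify one.
    parser.add_argument(
        "--language-threshold",
        type=threshold_type,
        default=0.5,
        help="minimum share of videos (0 to 1) a language must appear in",
    )
    return parser


def languages_above_threshold(videos, threshold):
    """Return the languages present in at least `threshold` of the videos.

    A language is present in a video if it is the audio language or one
    of the subtitle languages; each video counts at most once per language.
    """
    if not videos:
        return set()
    counts = {}
    for video in videos:
        present = set(video.get("subtitles", []))
        if video.get("audio"):
            present.add(video["audio"])
        for lang in present:
            counts[lang] = counts.get(lang, 0) + 1
    minimum = threshold * len(videos)
    return {lang for lang, count in counts.items() if count >= minimum}


args = build_parser().parse_args(["--language-threshold", "0.5"])
sample = [
    {"audio": "eng", "subtitles": ["fra"]},
    {"audio": "eng", "subtitles": []},
    {"audio": "eng", "subtitles": ["fra", "deu"]},
    {"audio": "fra", "subtitles": []},
]
kept = languages_above_threshold(sample, args.language_threshold)
```

Here `eng` and `fra` each appear in 3 of 4 videos and are kept, while `deu` appears in only 1 of 4 and is dropped. The surviving languages could then be passed to the existing point-based sort.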
Languages advertised in a TED ZIM are now based on whether a video has audio or subtitles in a given language.
This computation is done in the compute_zim_languages function at ted/src/ted2zim/scraper.py, lines 343 to 397 in 9fc26d4.