A recognizable word with one or more apostrophes added anywhere in it is treated as a distinct word, but it gets the same closeness value as the word without the apostrophes.
For example, all of the following words were accepted as distinct words, and they all had the exact same closeness value:
צבע
צבע'
'צבע
צ'בע
צב'ע
צב'ע'
צ''''בע
A more correct behavior would probably be to either reject such words or not count them as distinct from the original.
Thanks!
It seems that gensim.corpora.wikicorpus, which we use to sanitize the w2v input, removes apostrophes by default. That may be avoidable, but it requires some investigation. To allow words like ז'בוטינסקי, we delete the apostrophes on the server side.
A possible solution to this bug (suggested by @Iddoyadlin) is to delete the apostrophe on the client side, at least until we figure out how to sanitize the data correctly.
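To make the suggestion concrete, here is a minimal sketch of the normalization step: strip apostrophes from a guess, then use the stripped form as the key for duplicate detection. It is written in Python for illustration only; the names `normalize_guess`, `is_new_guess`, and `APOSTROPHES` are hypothetical and not taken from the project, and treating the Hebrew geresh (U+05F3) and the typographic apostrophe (U+2019) like the ASCII apostrophe is an assumption. The same logic would have to be ported to whatever language the client uses.

```python
# Characters treated as apostrophes. Only the ASCII apostrophe appears in the
# report above; the geresh (U+05F3) and typographic apostrophe (U+2019) are
# assumptions added for this sketch.
APOSTROPHES = "'\u05f3\u2019"

# Translation table that deletes every character listed in APOSTROPHES.
_STRIP_TABLE = str.maketrans("", "", APOSTROPHES)


def normalize_guess(word: str) -> str:
    """Return the word with every apostrophe-like character removed."""
    return word.translate(_STRIP_TABLE)


def is_new_guess(word: str, seen: set[str]) -> bool:
    """Record the normalized guess; return False if it was already made."""
    key = normalize_guess(word)
    if key in seen:
        return False
    seen.add(key)
    return True


# The variants from the report all collapse to the same key.
seen: set[str] = set()
variants = ["צבע", "צבע'", "'צבע", "צ'בע", "צב'ע", "צב'ע'", "צ''''בע"]
results = [is_new_guess(w, seen) for w in variants]
```

With this in place only the first variant counts as a new guess and the rest are recognized as repeats. Rejecting such words outright instead would be a matter of comparing `normalize_guess(word)` with `word` and refusing the guess when they differ, at the cost of also refusing legitimate words like ז'בוטינסקי.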