You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If we assumed a hypothetical 18,000 word list that was just 9,000 words and their plurals, I think the odds of getting at least one "awkward double" in a 4-word passphrase is (1/18000) * (2/18000) * (3/18000), which is a really small number. But check my math!
Although your conclusion is correct ("the odds...is a really small number"), the odds of this happening is over 600 million times more probable than what you have estimated.
The correct probability is 1/9000 + 2/9000 + 3/9000 - 11/9000**2 + 6/9000**3.
To prove this for a word list containing N words and their plurals (2 N words total), if P1 is the probability of getting at least one "awkward double", and if P0 is the probability of getting no awkward doubles, then
P1 =1 - P0
The probability if getting no awkward doubles (P0) is the number of passphrases containing only unique stems (i.e., once a word has been selected, it cannot be reselected itself, and neither can its conjugate -- the plural or singular form, whichever was not picked in the previous selection), divided by the total number of possible passphrases. For a passphrase consisting of k words, the total number of passphrases is
Ntotal = (2 N)k
To compute the number of passphrases containing only unique stems, the size of the word pool is reduced by 2 each time a word is selected (because the word itself is eliminated from further consideration, as is the plural/singular form of that word):
Nunique = (2 N)(2 N - 2)(2 N - 4) ... (2 N - 2 k +2)
Therefore, the probability of getting only unique stems is
Thanks! Let me know if you want me to try to clarify any particular steps of the derivation -- I did skip over some algebra steps in a few of the equations above.
In The Similar-Words Problem in your Readme, you wrote:
Although your conclusion is correct ("the odds...is a really small number"), the odds of this happening is over 600 million times more probable than what you have estimated.
The correct probability is
1/9000 + 2/9000 + 3/9000 - 11/9000**2 + 6/9000**3
.To prove this for a word list containing N words and their plurals (2 N words total), if P1 is the probability of getting at least one "awkward double", and if P0 is the probability of getting no awkward doubles, then
P1 =1 - P0
The probability if getting no awkward doubles (P0) is the number of passphrases containing only unique stems (i.e., once a word has been selected, it cannot be reselected itself, and neither can its conjugate -- the plural or singular form, whichever was not picked in the previous selection), divided by the total number of possible passphrases. For a passphrase consisting of k words, the total number of passphrases is
Ntotal = (2 N)k
To compute the number of passphrases containing only unique stems, the size of the word pool is reduced by 2 each time a word is selected (because the word itself is eliminated from further consideration, as is the plural/singular form of that word):
Nunique = (2 N)(2 N - 2)(2 N - 4) ... (2 N - 2 k +2)
Therefore, the probability of getting only unique stems is
P0 = Nunique/Ntotal = (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)
Therefore, the general solution for the probability of getting at least one "awkward double" is
P1 = 1 - (1 - (1/N))(1 - (2/N))...(1 - (k-1)/N)
For k=4, the math works out to the following result:
P1 = 6/N - 11/N2 + 6/N3
If, in the general solution, one neglects higher-order terms (N-2, N-3, etc.), the following approximate solution is obtained:
P1 ≈ 1/N + 2/N + ... + (k - 1)/N = (k(k - 1))/(2 N)
The text was updated successfully, but these errors were encountered: