Language Negotiation #5
RFC4647 - https://tools.ietf.org/html/rfc4647
Looking at https://tools.ietf.org/html/rfc4647#section-3.3.2, that's mostly what we want, aside from the problem that, AFAICT, I think the right way to address that problem is to treat the available languages as language ranges. Which is what we intend to do: For For siblings, conceptually, we amend the requested language range with a more generic range, if applicable? We have two choices here,

The numerical weights are hard to get right, I think. In particular in the code right now, the weights mix as soon as you have 10 or so requested languages, as you mix order within the requested language array and weight. I'd much rather just use different output lists, and concat and unique them afterwards.

Last but not least, the idea that this algorithm always returns a non-empty result is OK, but not something that can be implemented generically. It's a project setup question which localization to fall back to in the end. Take the BN, maybe? Site developed in Polish, with a German localization? You'd need to configure your l10n setup to fall back to Polish in this case. I think the correct way to implement this is to post-fix the requested languages with the site default language, and only then to call into the language negotiation code. By doing that, you guarantee a non-empty result, but that's not part of the algorithm, it's part of the setup.
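A minimal sketch of the two ideas in the comment above: building separate output lists that are concatenated and de-duplicated instead of mixing numerical weights, and appending the site default to the requested list before negotiating. Both `toyNegotiate` and `negotiateWithDefault` are hypothetical names made up for illustration; the matching is deliberately naive (exact tag, then same language) and is not the algorithm under review.

```javascript
// Hypothetical toy matcher: exact matches go into one list, same-language
// matches into another; the lists are concatenated and de-duplicated.
function toyNegotiate(requested, available) {
  const exact = [];
  const byLanguage = [];
  for (const req of requested) {
    const reqLower = req.toLowerCase();
    const reqLang = reqLower.split('-')[0];
    for (const avail of available) {
      const availLower = avail.toLowerCase();
      if (availLower === reqLower) {
        exact.push(avail);
      } else if (availLower.split('-')[0] === reqLang) {
        byLanguage.push(avail);
      }
    }
  }
  // A Set keeps insertion order, so the first occurrence of each tag wins.
  return [...new Set([...exact, ...byLanguage])];
}

// Hypothetical setup-level wrapper: post-fix the requested languages with
// the site default, and only then call into the negotiation code.
function negotiateWithDefault(negotiate, requested, available, siteDefault) {
  const requestedWithDefault = requested.includes(siteDefault)
    ? requested
    : [...requested, siteDefault];
  return negotiate(requestedWithDefault, available);
}
```

With a site developed in Polish and localized into German, `negotiateWithDefault(toyNegotiate, ['fr'], ['pl', 'de'], 'pl')` falls back to Polish, while the matcher itself remains free to return an empty list.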
Thanks Pike! I started implementing it and the only problem I encountered so far is the one related to:
Since the second list is not ordered, there's no way for me to predict which one it will pick. ICU introduces Do you think it's ok to use that? I may also have to expand ranges with scripts (so, "en" is not becoming "en-", but "en-Latn-" in order to handle sr, zh etc.).
I think I'm implementing this: http://www.unicode.org/reports/tr35/#LanguageMatching Is that ok, @Pike ?
The whole weights stuff in tr35 is a bit bewildering, their math is close to using complex numbers? We'd run the match algorithm as a filter, though, right? Also, in the gecko version, we can use addLikelySubtags, but we'd still need languageInfo.xml data? And for a vanilla js impl, we'd need that and likelySubtags.xml, too?
yes
For JS, I'd like to limit the amount of data we carry. I'm thinking about trying to filter the data based on all-locales.
On 17/02/2017 at 19:47, Zibi Braniecki wrote:
> but we'd still need languageInfo.xml data? And for a vanilla js impl, we'd need that and likelySubtags.xml, too?
>
> For JS, I'd like to limit the amount of data we carry. I'm thinking about trying to filter the data based on all-locales.

In the context of fluent, reducing the dataset in general to the needs of mozilla doesn't seem to be the right approach. I'm not sure if there's a way to drill down on the dataset for a general-use low-level library.
@Pike - I updated the patch to mostly follow RFC4647, with a few bits from LDML likelySubtags and languageMatching. I think the code is actually pretty good for how small it is (less than 100 lines of code). It doesn't do everything that I'd like it to do (including ability for each available locale to provide its own fallback chain), but I think it's a good start. Let me know what you think.
I'm a bit unsure on what to focus on in this review. I've ended up being weirdly close to the code at big-picture things, so I nag about details where the big picture is involved.
With the bugs I see, I don't yet see the outcome of the algorithm as it'd impact the big picture. Seems the English translation of the German saying is "I can't see the wood for the trees".
Hope this is helpful regardless.
fluent/src/langneg.js
Outdated
if (loc[2] === '*') {
  loc[2] = loc[0].toUpperCase();
  return loc.join('-');
}

We should make this fully data-driven. Assuming that a language tag in upper case was a region code is just wrong. Let's not introduce `ca-CA` or `lij-LIJ` as likely subtags?
So, here's the thing. LikelySubtags is a 50kb uncompressed JSON table [0].
If we have it (Gecko does), or in case where the user will want to pay the price (give me fluent with likelySubtags), then we don't need a dummy code like that.
But when the user doesn't want to pay the price, and 50kb is a pretty hefty price to pay on the client side, lack of this dummy code will make us not be able to catch any kind of 'ab-AB' extrapolation from 'ab'.
Since it's unlikely that anyone will provide 'ca-CA' or 'lij-LIJ' in available locales, there's no consequence of testing against it. But in the simplest form, they'll at least match all 'fr' to 'fr-FR', and 'it' to 'it-IT' and 'de' to 'de-DE' etc.
My idea was to allow for three "modes":
- Full CLDR likelySubtags (50kb price for excellent matching)
- Limited CLDR likelySubtags (say, 10-20 most common ones) + dummy
- Just dummy
I can see us curating this limited CLDR likelySubtags just to catch the most common ones like 'en' into 'en-US', and 'fr' into 'fr-FR' etc.
I believe that if we remove the dummy altogether, the algorithm will stop working for the most common cases and will basically require likelySubtags.
What do you think?
[0] https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/likelySubtags.json
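A rough sketch of what the "limited likelySubtags" mode could look like when it stays data-driven and has no casing-based guess. The table mirrors the shape of the CLDR JSON linked above (bare tag mapped to a maximized tag); the choice of entries and the names `likelySubtagsMin` and `addLikelySubtags` are assumptions made for this example.

```javascript
// Illustrative subset only. The values follow CLDR likelySubtags, but which
// entries belong in a "min" table is an open question in this thread.
const likelySubtagsMin = {
  'en': 'en-Latn-US',
  'fr': 'fr-Latn-FR',
  'de': 'de-Latn-DE',
  'it': 'it-Latn-IT',
  'ca': 'ca-Latn-ES',
};

// Hypothetical helper: expands a bare language tag using the given table and
// returns the input unchanged when the table has no entry for it.
function addLikelySubtags(locale, table = likelySubtagsMin) {
  const key = locale.toLowerCase();
  return Object.prototype.hasOwnProperty.call(table, key)
    ? table[key]
    : locale;
}
```

Passing the full CLDR table as the second argument would switch to the "full" mode without changing the code path; a tag missing from the table, such as `lij` here, is simply left alone rather than turned into `lij-LIJ`.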
I find the line about "a language tag is a script tag with bad casing" pretty revolting, tbh. Let's not set precedent for that, people look at mozilla for how to handle standards, and this is just not like it.
I also start to realize that I actually disagree with the way you use likelySubtags. In that context, I think most of the unfiltered likelySubtags data goes away. I'll take that top-level.
fluent/src/langneg.js
Outdated
 * Replaces the region position with a range to allow for matches
 * with the same language/script but from different region.
 */
function getRegionRange(locale) {

I'd name this something else, maybe `regionRangeFor`? `getRegionRange` sounds like a getter of information, not a transformation of the passed in data.
fluent/src/langneg.js
Outdated
 * It can also accept a range `*` character on any position.
 */
const localeRe = new RegExp(
  `^${languageCodeRe}${scriptCodeRe}?${regionCodeRe}?${variantCodeRe}?$`);

The regular expression isn't implementing the spec, I'd recommend to just parse a script tag as a {4}, and make it optional. Like
/^([a-z]{2,3})(?:-([a-z]{4}|\*))?/i
I'm not sure if we should allow for trailing junk?
Also, we should probably special-case `-mac` as something that violates well-formed locale tags, as long as we still have it.
I'm following BCP47 in language-script-region-variant.
I'll update the regexp, but I think it makes sense to keep the 4 pieces. I may accept and cut off any extension keys (`-t-`, `-x-` and `-u-`).
My point was really about
const scriptCodeRe = '(?:-([a-zA-Z]{2,4}|\\*))';
which should be exactly 4,
const scriptCodeRe = '(?:-([a-z]{4}|\\*))';
Also, I'd remove (did here) the Upper vs lower case regex ranges, and create the regex with a flag='i'.
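To make the suggestion concrete, here is a sketch of a four-piece parser with a script subtag of exactly four letters and a case-insensitive regex. Only `scriptCodeRe` is taken from the comment above; the other three sub-patterns and the casing rules in `parseLocale` are assumptions for illustration, not the code from the patch.

```javascript
// Assumed sub-patterns, except scriptCodeRe which is quoted from the review.
const languageCodeRe = '([a-z]{2,3}|\\*)';
const scriptCodeRe = '(?:-([a-z]{4}|\\*))';
const regionCodeRe = '(?:-([a-z]{2}|[0-9]{3}|\\*))';
const variantCodeRe = '(?:-([a-z0-9]{5,8}|[0-9][a-z0-9]{3}|\\*))';

// The 'i' flag replaces the explicit upper/lower case ranges.
const localeRe = new RegExp(
  `^${languageCodeRe}${scriptCodeRe}?${regionCodeRe}?${variantCodeRe}?$`, 'i');

// Returns [language, script, region, variant] with conventional casing,
// or null when the input does not match the pattern.
function parseLocale(locale) {
  const match = localeRe.exec(locale);
  if (match === null) {
    return null;
  }
  const [, language, script, region, variant] = match;
  return [
    language.toLowerCase(),
    script ? script[0].toUpperCase() + script.slice(1).toLowerCase() : undefined,
    region ? region.toUpperCase() : undefined,
    variant ? variant.toLowerCase() : undefined,
  ];
}
```

Normalizing the casing at parse time means later comparisons can use plain string equality, which is the point raised about case-independent comparison further down the thread.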
fluent/src/langneg.js
Outdated
  }

  return [language, script, territory, variant];
}

... this code would become easier, if the regular expression was more standards-conformant, I think.
Also, we should probably normalize casing here, so that we can compare strings later on. 'cause language tags are compared case-independently.
Yeah, I'll normalize case, either by going all lower-case or running it via Canonicalize.
what do you mean by easier?
fluent/src/langneg.js
Outdated
function filterMatches(requestedLocales, availableLocales) {
  const supportedLocales = [];

  outer:

This is probably up to @stasm, but GOTOs?
We can improve the exact code later if we agree on how it's supposed to work and what kind of matches it produces.
I don't have a strong preference about labelled for-loops.
fluent/src/langneg.js
Outdated
  }

  for (const availableLocale of availableLocales) {
    if (compareLocales(requestedLocale, availableLocale)) {

This parses the language tags over and over in a loop, I'd refactor this code to not do that.
agree. I'll improve the exact code once we agree on what the code is supposed to do :)
fluent/src/langneg.js
Outdated
for (const requestedLocale of requestedLocales) {

  for (const availableLocale of availableLocales) {
    if (requestedLocale === availableLocale) {

... needs to be independent of casing.
fluent/src/langneg.js
Outdated

export function negotiateLanguages(requestedLocales,
                                   availableLocales,
                                   defaultLocale) {

defaultLocale isn't covered in tests, otherwise the bug below would have been caught.
fluent/src/langneg.js
Outdated
const supportedLocales = filterMatches(requestedLocales, availableLocales);
if (supportedLocales.includes(defaultLocale)) {
  supportedLocales.push(defaultLocale);
}

... if undefined is in list, add it to the list.
You want that the other way around.
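A sketch of the check "the other way around": append the default only when one was given and it is not in the list yet. The helper name `appendDefault` is hypothetical, and returning a new array instead of pushing in place is a choice made for this example.

```javascript
// Hypothetical helper showing the corrected condition.
function appendDefault(supportedLocales, defaultLocale) {
  // Skip when no default was configured, so `undefined` never ends up
  // in the result.
  if (defaultLocale === undefined) {
    return supportedLocales;
  }
  // Add the default only if negotiation has not already selected it.
  if (!supportedLocales.includes(defaultLocale)) {
    return [...supportedLocales, defaultLocale];
  }
  return supportedLocales;
}
```

A test that passes a `defaultLocale` missing from the negotiated list would have caught the inverted condition, which is the coverage gap noted in the previous comment.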
fluent/src/langneg.js
Outdated
  return supportedLocales;
}

export function negotiateLanguages(requestedLocales,

This is the only public method of this module, and the only without documentation :-/
@stasm, what's the intent for the generated documentation? It seems to be done, but I haven't found it having a prominent role yet, so maybe it doesn't matter. Your call.
Thanks for bringing this up. For now, each package in the `fluent.js` repository has its docstrings extracted using the `documentation.js` package. The extracted markdown files (one per package) land in the top-level `docs/` directory.
It might be better to move the docs into each package's directory as `$(PACKAGE)/docs/api.md`, in case we want to add more hand-written docs later on.
> It might be better to move the docs into each package's directory as `$(PACKAGE)/docs/api.md`, in case we want to add more hand-written docs later on.
Implemented in 6c46100.
Thank you Like! That's precisely the level of detail I want to iron things out on right now. Let's get to an algorithm we both like and then I'll add detailed tests for each function and step. I'll work on applying your feedback today.
*Pike (autocorrect snafu)
I've spent quite some time trying to find out what data backs up likelySubtags in CLDR, and much of it is thin air.
I think it's OK to use the data to order the matching languages in the absence of anything better, but really, we shouldn't exclude a match for anything in that data set.
In particular for cases where the requested language doesn't specify a script and we have multiple scripts, we should just return all of the matching languages.
Also, I think that when we lack the likelySubtags data, all that happens is that the returned match data might not be deterministically ordered. I think that's a fair outcome, and we don't need to write any kind of funky code to avoid that.
On the topic of modes, how is that supposed to work in practice? As in, I `npm install fluent`, and then?
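One way to read the proposal above, as a sketch: a request without a script matches every available script for that language, and likely-subtags data, when present, only influences the order. `matchAllScripts` and its `likelyScript` parameter are hypothetical; the sketch assumes tags shaped like `language-Script` and relies on `Array.prototype.sort` being stable, as it is in current JavaScript engines.

```javascript
// Hypothetical helper: returns every available locale for the requested
// language, optionally moving the likely script to the front.
function matchAllScripts(requestedLanguage, available, likelyScript) {
  const lang = requestedLanguage.toLowerCase();
  // Nothing is excluded based on likelySubtags; only the language must match.
  const matches = available.filter(
    locale => locale.toLowerCase().split('-')[0] === lang);
  // Without likelyScript every entry gets the same rank, so the stable sort
  // keeps the order of `available`.
  const rank = locale => (locale.split('-')[1] === likelyScript ? 0 : 1);
  return matches.sort((a, b) => rank(a) - rank(b));
}
```

For the Serbian case discussed below, `matchAllScripts('sr', ['sr-Latn', 'sr-Cyrl'], 'Cyrl')` would list both scripts with `sr-Cyrl` first, and leaving out the third argument would keep both in their original order.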
fluent/test/langneg_test.js
Outdated

assert.deepEqual(
  negotiateLanguages(['sr'], ['sr-Cyrl', 'sr-Latn']),
  ['sr-Cyrl']);

I spent half a day trying to find out why cldr says these things about Serbian, and to the best of my findings, this is just made up. Notably, sr-Latn-RS is given a writingPopulation of 5%, because
<reference type="R1017">For languages not customarily written, the writing population is artificially set to 5% in the absence of better information.</reference>
I'd like the language negotiation module to be a separate package. You would then do:
And:
import negotiateLanguages from 'fluent-langneg';
negotiateLanguages(requested, available);
Any additional data can be published in a separate package or as a submodule:
import negotiateLanguages from 'fluent-langneg';
import withLikelySubtags from 'fluent-langneg/subtags';
withLikelySubtags(negotiateLanguages)(requested, available);
For this to work,
const negotiator = LanguageNegotiation({
  available: […],
  default: […],
  likelySubtags: […]
});
And then you
Just a couple of small changes to align this with the rest of the repo structure.
If you're on the latest `master`, you can run `make html` to have jsdoc extract the API docs into `html/fluent-langneg` in the root of the repo, and verify that everything looks OK.
fluent-langneg/.gitignore
Outdated
@@ -0,0 +1,2 @@
fluent-intl-polyfill.js

You'll need to change this to `fluent-langneg.js`.
fluent-langneg/README.md
Outdated
# fluent-langneg

`fluent-langneg` is an API for language negotiation API that is recommended
by the Fluent Team for all language selection and matching.

I started including a one-sentence summary of what Fluent is at the beginning of all READMEs. Here's how I'd phrase it:
`fluent-langneg` is an API for negotiating languages. It's part of
Project Fluent, a localization framework designed to unleash
the expressive power of the natural language.
fluent-langneg/README.md
Outdated

## How to use

Simply `import` or `require` the package somewhere in your code.

Remove the line? It was written for `fluent-intl-polyfill` which doesn't expose any importable API.
fluent-langneg/README.md
Outdated
Simply `import` or `require` the package somewhere in your code.

```javascript
import { negotiateLanguages } from 'fluent-langneg';

With `fluent-langneg` being so purpose-specific, I think we could make `negotiateLanguages` the default export:
import negotiateLanguages from 'fluent-langneg';
fluent-langneg/docs/api.md
Outdated
@@ -0,0 +1 @@

Please remove the whole `docs` directory.
fluent-langneg/fluent-langneg.js
Outdated
@@ -0,0 +1,267 @@
(function (global, factory) {

Please remove this file.
fluent-langneg/src/index.js
Outdated
@@ -0,0 +1,282 @@
/*
 * @module

I learned last night that the `@module` tag takes the module's name. What we probably want here is:
@module fluent-langneg
@overview
`fluent-intl-polyfill` provides...
fluent-langneg/src/index.js
Outdated
  return fallback;
}

export function negotiateLanguages(requestedLocales,

I'd consider making this `export default`.
fluent-langneg/src/index.js
Outdated
 * This means that if `ab` locale is present in the available locales,
 * it is treated as matching `ab-*-*-*`.
 */
function compareLocales(loc1, loc2) {

Perhaps rename this function to `localesEqual` to clearly indicate that it returns a `bool`?
fluent-langneg/test/langneg_test.js
Outdated
@@ -0,0 +1,144 @@

Why are there two blank lines here and in `test/setup.js`? Do you know if the `use strict` pragma is still required in node? Can we skip it?
I think we can skip it.
How can we make the additional JSON data available in browsers? I think a safer approach would be to store the data in a JS file, e.g. `data/subtags.js`, which could be imported if needed.
Or, it looks like there's https://github.com/rollup/rollup-plugin-json but I haven't tried it.
fluent-langneg/README.md
Outdated
```

The API reference is available at
http://projectfluent.io/fluent.js/fluent-syntax.
fluent-langneg/README.md
Outdated
```javascript
import negotiateLanguages from 'fluent-langneg';

const supported = negotiateLanguages(requested, available, default);

`default` is a reserved keyword in JS, so I'd suggest changing this to:
const supportedLanguages = negotiateLanguages(
  requestedLanguages, availableLanguages, defaultLanguage
);
Also, are these languages or locales?
Those are locales. We negotiate between the language parts of those locales, so we are negotiating languages, but we pass in locales and the returned ids are locales as well. I know it's complex :(
fluent-langneg/src/index.js
Outdated

export default function negotiateLanguages(requestedLocales,
                                           availableLocales,
                                           options = {}) {

Weird indentation here. You can just indent one level like so:
export default function negotiateLanguages(
  requestedLocales,
  availableLocales,
  options = {}
) {
fluent-langneg/src/index.js
Outdated
  availableLocales,
  options = {}) {

  const defaultLocale = GetOption(options, 'defaultLocale', 'string');

Isn't `defaultLocale` common enough that it should get its own positional argument?
I don't think it is. Since the algorithm works perfectly fine without it, I'd prefer to keep only the necessary arguments as positional - requestedLocales and availableLocales.
Perhaps I'm missing something but I don't see it in tests. What happens when there are no common languages between `requested` and `available` and no default has been specified?
We return an empty list, as per the language matching spec in RFC4647. Added to tests.
fluent-langneg/src/index.js
Outdated
    options, 'likelySubtags', 'object', undefined, {});

  const supportedLocales =
    filterMatches(requestedLocales, availableLocales, likelySubtags);

nit: can you change this to:
const supportedLocales = filterMatches(
  requestedLocales, availableLocales, likelySubtags
);
@@ -0,0 +1,18 @@
{

This is so small that I think we should include this in the main package. Can rollup bundle JSON files? If not, I'd just put this in `src/subtags.js`.
I don't know how small it will be yet. I'll need to generate the data for the `min` variant, and so far it's just a stub. I agree we'll want to bundle it, but I'd like to make the bundling conditional, i.e. when you're building fluent-langneg I'd like you to be able to choose whether you want the `min` or the full likelySubtags.
@@ -0,0 +1,266 @@
/*
 * @module fluent-langneg

Please add the `@overview` tag here.
@Pike, @stasm - I think this is ready for another round of reviews. There are three known things I'd like to add once we agree on the logic:
But first, I'd like to iron out the first version of the algorithm to be stable.
This looks great. r+.
I'd suggest using default exports more, especially with well-scoped modules. From https://esdiscuss.org/topic/moduleimport#content-0: ES6 favors the single/default export style.
> ability to build fluent-langneg with either 'min' or full likelySubtags

My recommendation right now would be to ship 0.0.1 with 10-20 most likely subtags hardcoded in the JS source and with a dummy algorithmic way of guessing more. It's okay if it makes mistakes at this stage.
I'd also put the 50KB JSON in the `data/` folder so that anyone can use it if they wish. They'll need to figure out how to load it themselves, but at least it's there.
I'd also prefer if the JSON file name was all lowercase, as this is the file naming scheme we've been using so far. So perhaps: `data/likely-subtags.json`. I see that the camelcase name comes directly from the CLDR repo, so keeping it that way is okay, too.
> a script to generate the 'likelySubtagsMin' based on our own threshold.

Let's move this to a future milestone, together with finding out how to bundle JSON files using our build system.
fluent-langneg/package.json
Outdated
"engine": {
  "node": ">=6"
},
"eslintConfig": {

Feel free to remove this. I followed your suggestion and used a global `.eslintrc.json` file in the root of the repo (0f1bc1f).

if (strategy === 'lookup') {
  if (supportedLocales.length === 0) {
    supportedLocales.push(defaultLocale);

What if the `defaultLocale` is undefined? Are you okay pushing it to `supportedLocales` too?
If it's undefined it'll throw earlier.
Ah, I didn't notice that. Thanks.
fluent-langneg/src/locale.js
Outdated
 * It also allows skipping the script section of the id, so `en-US` is properly
 * parsed as `en-*-US-*`.
 */
export function parseLocale(locale, range = false) {

`export default`?
fluent-langneg/src/matches.js
Outdated
 * ['en-AU'] * ['en-US'] = ['en-US']
 * ['sr-RU'] * ['sr-Latn-RO'] = ['sr-Latn-RO'] // sr-RU -> sr-Latn-RU
 */
export function filterMatches(

`export default`?
fluent-langneg/src/matches.js
Outdated
) {
  const supportedLocales = new Set();

  const availableLocalesCache = {};

Use `Map`? That way you won't have to check with `hasOwnProperty` in `getOrParseLocale`.
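A small sketch of the `Map`-based cache suggested here, which also addresses the earlier remark about parsing the same tags over and over inside the loop. `makeCachedParser` is a hypothetical wrapper; `getOrParseLocale` is the name mentioned in the comment, but its real signature is not shown in the thread, so the one below is an assumption.

```javascript
// Hypothetical factory: wraps any parse function with a Map-backed cache so
// each distinct locale string is parsed at most once.
function makeCachedParser(parse) {
  const cache = new Map();
  return function getOrParseLocale(locale) {
    // Map#has works for any key, so no hasOwnProperty guard is needed.
    if (!cache.has(locale)) {
      cache.set(locale, parse(locale));
    }
    return cache.get(locale);
  };
}
```

Creating the cached parser once per `filterMatches` call keeps the cache scoped to a single negotiation, which avoids holding on to locale strings between calls.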
The challenge here is how we select the 10-20? Is 'it' more important than 'in'?
@Pike would you be ok with this? Alternatively, I could add a list, curated by us, of locales that are ok to expand:
const localesThatHasDefaultThatMatchRegion = ['it', 'fr', 'ru', 'cs', 'pl', ...];
instead?
Would you prefer that over external dependency or just a link to cldr-json npm package?
My hope was to keep a direct link to CLDR and they use camel case.
The reason I didn't do that is so that I can expose more internal methods for tests. If each file can only expose one function, I can't expose others for testing.
SGTM.
Ah, if that's an option, then let's go for it: let's link the CLDR repo in the README.
You sure can :)
imports the default export as
I want to ❤️️ es6 right now.
fluent-langneg/CHANGELOG.md
Outdated

## fluent-langneg 0.0.1

- (05d2487c) fluent-langneg 0.0.1

This is my bad: I shouldn't have included the SHAs in other changelogs. I've since stopped doing it. Changelogs should describe changes between named versions and they don't have to correspond to commits. Let's remove the SHA of the commit here and squash everything into a single commit.
fluent-langneg/README.md
Outdated
|
||
The API supports three negotiation strategies: | ||
|
||
* filtering (defualt) |
Typo in `default`.
Also, please either use a `###` header or indent the text below by 4 spaces to make it part of the bullet point.
fluent-langneg/README.md
Outdated
user can load into `fluent-langneg` to replace the minimal version. | ||
|
||
```javascript | ||
let data = require('./data/likelySubtags.json'); |
We're not shipping this file with the rest of the package (which is good) so maybe link to the CLDR repo and put the following here:
const data = require('cldr-core/supplemental/likelySubtags.json');
I've got some comments on details, some on code, and some on defaults.
Overall, I wonder how to document this right, too. We have three lengthy blurbs: one in the README, one in index.js, and one in matches.js. I'm not sure why which is where, and the generated docs seem rather confused, too. I ran `make html`, and the important comments don't seem to make it into the generated docs.
|
||
The API supports three negotiation strategies: | ||
|
||
### filtering (default) |
Wouldn't the general use-case within fluent be to use `matching` instead of `filtering`? In that case, I'd make that the default.
That's up to @stasm I guess.
Implemented it in C++ as the default, I'll update js version once I'm done with C++.
['filtering', 'matching', 'lookup'], 'filtering'); | ||
|
||
if (strategy === 'lookup' && defaultLocale === undefined) { | ||
throw new Error('defaultLocale cannot be undefined for strategy `lookup`'); |
This condition should be in the docs.
function variantRangeFor(locale) { | ||
locale.variant = '*'; | ||
return locale; | ||
} |
`regionRangeFor` and `variantRangeFor` sound like they'd create new Locale objects, IMHO. Not sure if it's better to actually do a copy, or make this a setter. I could even go as far as to not have this be a function. Or make it a setter on the object?
@stasm - what do you think? Is it worth creating a new object over modifying the existing one?
I'm working on the C++ port right now for Gecko. Once I'm done with it, I'll update the JS implementation to match it. It does have this change as well.
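For reference, a copying version of `variantRangeFor` might look roughly like the sketch below. It assumes a locale is a plain object with `language`, `script`, `region` and `variant` fields; the actual Locale type in the patch may differ, and a class instance would need its own copy logic because `Object.assign({}, ...)` does not preserve the prototype.

```javascript
// Hypothetical plain-object locale, used only for illustration.
const locale = {
  language: 'en',
  script: undefined,
  region: 'US',
  variant: 'macos',
};

// Returns a new object whose variant is the wildcard and leaves the
// input untouched, which is what the "...For" name suggests to a reader.
function variantRangeFor(locale) {
  return Object.assign({}, locale, { variant: '*' });
}

const range = variantRangeFor(locale);
```

The trade-off is one extra allocation per call in exchange for callers being able to reuse the original locale, for example when it comes from a cache.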
// Attempt to match against the available range | ||
// This turns `en` into `en-*-*-*` and `en-US` into `en-*-US-*` | ||
// Example: ['en-US'] * ['en'] = ['en'] | ||
for (const availableLocale of availableLocales) { |
It took me quite a while to untangle this code, mostly because the patterns in the code aren't factored in.
Is there a way to make this a second inner loop, between this and the inner
if (matches) {
add_and_bail_if_not_matching
}
? Might make the code shorter, and easier to digest, I think
How would I bail out of the outer loop if I factored it out to a separate function?
I was thinking of just loops, not functions.
You could also do functions, and instead of continue :outer, do early returns?
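A rough sketch of the "functions with early returns" variant. The two match steps, the string-based comparison and the names `steps`, `findMatch` and `negotiate` are simplifications assumed for illustration; the patch works on parsed locales, has more steps, and its filtering strategy may collect more than one match per requested locale.

```javascript
// Hypothetical match steps, tried in order. Comparisons are case-sensitive
// here to keep the sketch short.
const steps = [
  // 1. Exact match.
  (requested, available) => requested === available,
  // 2. Treat the available locale as a range: 'en' matches 'en-US'.
  (requested, available) => requested.startsWith(available + '-'),
];

// Returns the first available locale that matches `requested`, or undefined.
// The early return takes the place of a labelled `continue` on an outer loop.
function findMatch(requested, availableLocales) {
  for (const step of steps) {
    for (const available of availableLocales) {
      if (step(requested, available)) {
        return available;
      }
    }
  }
  return undefined;
}

function negotiate(requestedLocales, availableLocales) {
  const supportedLocales = new Set();
  for (const requested of requestedLocales) {
    const match = findMatch(requested, availableLocales);
    if (match !== undefined) {
      supportedLocales.add(match);
    }
  }
  return Array.from(supportedLocales);
}
```

With this shape, adding another step means appending one function to `steps` rather than repeating the loop body.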
'zh': 'zh-hans-cn', | ||
'zh-gb': 'zh-hant-gb', | ||
'zh-us': 'zh-hant-us', | ||
}; |
It'd be good to add to the comment on how you got to this particular sublist.
I'm ashamed to admit that I went through likelySubtags and selected locales that I consider to be more significant and likely to have someone specify the minimized version (like `ab` in requested instead of `ab-CD`) where there is more than one region/script.
Not sure how to document it and I believe we should eventually do this in a more formal and algorithmic way.
I'd go for a comment saying that then. You could even file an issue for the follow-up and link to that from the comment?
Will do!
describe('Basic Language Negotiation without likelySubtags', () => { | ||
const nl = negotiateLanguages; | ||
|
||
it('exact match', () => { |
Should these `it()` statements read as full sentences starting with "it ..."? The Mocha docs say yes, but I'm not sure if there's a special style among the fluent repos.
I filed #14.
I'd like to start discussing our intended language negotiation strategy (I'd also like to share it between Fluent and Gecko).
I looked at rfc4647 and some of the ICU implementations, but I don't think that what they suggest (matching vs filtering) applies to us perfectly.
Thus, I started working on my own algorithm and the JS implementation is imho a good starting point to discuss it.
The patch here is not finalized but it uses two data-driven algorithms:
The algorithm currently has four parts. It tries, in order:
1.0
0.9
0.8
0.6
It stops on the best match for the candidate.
I'd like to get feedback and any suggestions at this point from @Pike and @stasm.