
Language Negotiation #5

Merged (3 commits) on Feb 27, 2017

Conversation

zbraniecki
Collaborator

I'd like to start discussing our intended language negotiation strategy (I'd also like to share it between Fluent and Gecko).

I looked at rfc4647 and some of the ICU implementations, but I don't think that what they suggest (matching vs filtering) applies to us perfectly.

Thus, I started working on my own algorithm and the JS implementation is imho a good starting point to discuss it.

The patch here is not finalized but it uses two data-driven algorithms:

  • getParent - which in its simplest form cuts off the last subtag ("en-US" -> "en") but will eventually have some exceptions to prevent things like "zh-Hant" -> "zh", and hopefully will land in ECMA402 (see Intl.getParentLocales tc39/ecma402#87)
  • getBestSubtags - which does not have a good equivalent in ICU, so it will likely stay curated by us. It allows us to easily go from more generic to more detailed subtags.

The algorithm currently has four parts. It tries, in order:

  • exact match at weight 1.0
  • likely subtags (tests "en-US" for "en") at weight 0.9
  • parents (tests "en" for "en-US") at weight 0.8
  • siblings (tests "en-GB" for "en-US") at weight 0.6

It stops on the best match for the candidate.
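
For illustration, the scoring could be sketched roughly like this (not the actual patch; `scoreFor` is a made-up name and the `getParent`/`getBestSubtags` stand-ins are naive placeholders for the data-driven helpers above):

```javascript
// Naive stand-ins, just so the sketch runs; the real helpers are data-driven
// and carry exception data.
const getParent = tag => tag.split('-').slice(0, -1).join('-');
const getBestSubtags = tag => ({ en: 'en-US', fr: 'fr-FR' }[tag] || tag);

function scoreFor(requested, available) {
  if (requested === available) {
    return 1.0;                     // exact match
  }
  if (getBestSubtags(requested) === available) {
    return 0.9;                     // likely subtags: tests "en-US" for "en"
  }
  if (getParent(requested) === available) {
    return 0.8;                     // parent: tests "en" for "en-US"
  }
  const parent = getParent(requested);
  if (parent && parent === getParent(available)) {
    return 0.6;                     // sibling: tests "en-GB" for "en-US"
  }
  return 0;
}

scoreFor('en-US', 'en');  // 0.8
scoreFor('en', 'en-US');  // 0.9
```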

I'd like to get feedback and any suggestions at this point from @Pike and @stasm.

@zbraniecki
Collaborator Author

RFC4647 - https://tools.ietf.org/html/rfc4647

@Pike
Contributor

Pike commented Feb 15, 2017

Looking at https://tools.ietf.org/html/rfc4647#section-3.3.2, that's mostly what we want, aside from the problem that, AFAICT, langneg(["de-DE"], ["de"]) === [], and we want that to be ["de"], right?

I think the right way to address that problem is to treat the available languages as language ranges. Which is what we intend to do: fr is used for French spoken across the globe. Not French spoken nowhere. I think that getBestSubtags is the wrong concept.

For langneg(["en"], ["en-US", "en-GB"]) that does leave us with some ambiguity, and I think that's accurate. On our UX side, we should discourage the situation. We can build in a deterministic ordering of sibling language tags, which would do the thing you built getBestSubtags for, in this use case?

For siblings, conceptually, we amend the requested language range with a more generic range, if applicable? ["de-DE"] becomes ["de-DE", "de-*"]? I think that's sound if we keep the "don't drop script ranges for locales where that matters" rule. Are there more constraints to that?

We have two choices here, ["de-DE", "fr-CA"] can either become ["de-DE", "de-*", "fr-CA", "fr-*"] or ["de-DE", "fr-CA", "de-*", "fr-*"]. I think it's the first we want?
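
A sketch of the first ordering (illustrative only; `expandRequested` is a made-up name and the script-preserving exceptions are left out):

```javascript
// Append a region wildcard per requested tag, keeping the per-language
// grouping. Locales where the script matters would need the "don't drop
// script ranges" exception mentioned above.
function expandRequested(requested) {
  const expanded = [];
  for (const tag of requested) {
    expanded.push(tag);
    expanded.push(`${tag.split('-')[0]}-*`);
  }
  return expanded;
}

expandRequested(['de-DE', 'fr-CA']);
// → ['de-DE', 'de-*', 'fr-CA', 'fr-*']
```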

The numerical weights are hard to get right, I think. In particular in the code right now, the weights mix as soon as you have 10 or so requested languages, as you mix order within the requested language array and weight. I'd much rather just use different output lists, and concat and unique them afterwards.

Last but not least, the idea that this algorithm always returns an non-empty result is OK, but not something that can be implemented generically. It's a project setup question on which localization to fall back to in the end. Take the BN, maybe? Site developed in Polish, with a German localization? You'd need to configure your l10n setup to fall back to Polish in this case.

I think the correct way to implement this is to post-fix the requested languages with the site default language, and only then to call into the language negotiation code. By doing that, you guarantee a non-empty result, but that's not part of the algorithm, but part of the setup.
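
In code, that setup could look roughly like this (sketch; `siteDefaultLocale` and `availableLocales` are assumed names, only `negotiateLanguages` comes from the patch):

```javascript
// Guarantee a non-empty result at the setup level, not in the algorithm:
// the site's default locale is appended to the request before negotiating.
const requested = [...navigator.languages, siteDefaultLocale];
const supported = negotiateLanguages(requested, availableLocales);
```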

@zbraniecki
Collaborator Author

Thanks Pike!

I started implementing it and the only problem I encountered so far is the one related to:

langneg(["en"], ["en-US", "en-GB"])

Since the second list is not ordered, there's no way for me to predict which one it will pick.

ICU introduces addLikelySubtags, which I intended to use - http://www.icu-project.org/apiref/icu4c/uloc_8h.html#a0cb2dcd65f745e7a966a729395499770

Do you think it's ok to use that?

I may also have to expand ranges with scripts (so "en" becomes not "en-*" but "en-Latn-*", in order to handle sr, zh etc.).

@zbraniecki
Collaborator Author

I think I'm implementing this: http://www.unicode.org/reports/tr35/#LanguageMatching
so I'll have weights there.

Is that ok, @Pike ?

@Pike
Contributor

Pike commented Feb 17, 2017

The whole weights stuff in tr35 is a bit bewildering, their math is close to using complex numbers?

We'd run the match algorithm as a filter, though, right?

Also, in the gecko version, we can use addLikelySubtags, but we'd still need languageInfo.xml data? And for a vanilla js impl, we'd need that and likelySubtags.xml, too?

@zbraniecki
Collaborator Author

We'd run the match algorithm as a filter, though, right?

yes

but we'd still need languageInfo.xml data? And for a vanilla js impl, we'd need that and likelySubtags.xml, too?

For JS, I'd like to limit the amount of data we carry. I'm thinking about trying to filter the data based on all-locales.

@Pike
Contributor

Pike commented Feb 17, 2017 via email

@zbraniecki
Collaborator Author

@Pike - I updated the patch to mostly follow the RFC4647, with a few bits from LDML likelySubtags and languageMatching.

I think the code is actually pretty good for how small it is (less than 100 lines of code). It doesn't do everything that I'd like it to do (including the ability for each available locale to provide its own fallback chain), but I think it's a good start.

Let me know what you think.

@zbraniecki zbraniecki requested a review from Pike February 19, 2017 10:13
Contributor

@Pike Pike left a comment

I'm a bit unsure what to focus on in this review. I've ended up being weirdly close to the code on big-picture things, so I nag about details where the big picture is involved.

With the bugs I see, I don't yet see the outcome of the algorithm as it'd impact the big picture. Seems the English translation of the German saying is "I can't see the wood for the trees".

Hope this is helpful regardless.

if (loc[2] === '*') {
  loc[2] = loc[0].toUpperCase();
  return loc.join('-');
}
Contributor

We should make this fully data-driven. Assuming that a language tag in upper case was a region code is just wrong. Let's not introduce ca-CA or lij-LIJ as likely subtags?

Collaborator Author

So, here's the thing. LikelySubtags is a 50kb uncompressed JSON table [0].
If we have it (Gecko does), or in cases where the user is willing to pay the price (give me fluent with likelySubtags), then we don't need dummy code like that.

But when the user doesn't want to pay the price, and 50kb is a pretty hefty price to pay on the client side, the lack of this dummy code will leave us unable to catch any kind of 'ab-AB' extrapolation from 'ab'.
Since it's unlikely that anyone will provide 'ca-CA' or 'lij-LIJ' in available locales, there's no harm in testing against them. But in its simplest form, the dummy will at least match 'fr' to 'fr-FR', 'it' to 'it-IT', 'de' to 'de-DE', etc.

My idea was to allow for three "modes":

  • Full CLDR likelySubtags (50kb price for excellent matching)
  • Limited CLDR likelySubtags (say, 10-20 most common ones) + dummy
  • Just dummy

I can see us curating this limited CLDR likelySubtags just to catch the most common ones like 'en' into 'en-US', and 'fr' into 'fr-FR' etc.

I believe that if we remove the dummy altogether, we'll make the algorithm not work for the most common cases and basically require the likelySubtags data.

What do you think?

[0] https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/likelySubtags.json
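
To make the modes concrete, all three could share one lookup path (sketch; `extendToLikelySubtags` and `likelySubtagsData` are made-up names standing in for either the full or the trimmed CLDR table):

```javascript
// Look the locale up in whatever likelySubtags table was bundled, and fall
// back to the "dummy" ab -> ab-AB extrapolation when there's no entry.
function extendToLikelySubtags(locale, likelySubtagsData = {}) {
  const key = locale.toLowerCase();
  if (key in likelySubtagsData) {
    return likelySubtagsData[key];
  }
  // "Just dummy" mode: extrapolate the region from the language code itself.
  return `${locale}-${locale.toUpperCase()}`;
}

extendToLikelySubtags('fr');                        // "fr-FR"
extendToLikelySubtags('zh', { zh: 'zh-Hans-CN' });  // "zh-Hans-CN"
```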

Contributor

I find the line about "a language tag is a region code with bad casing" pretty revolting, tbh. Let's not set a precedent for that; people look at Mozilla for how to handle standards, and this is just not it.

I also start to realize that I actually disagree with the way you use likelySubtags. In that context, I think most of the unfiltered likelySubtags data goes away. I'll take that top-level.

* Replaces the region position with a range to allow for matches
* with the same language/script but from different region.
*/
function getRegionRange(locale) {
Contributor

I'd name this something else, maybe regionRangeFor? getRegionRange sounds like a getter of information, not a transformation of the passed-in data.

* It can also accept a range `*` character on any position.
*/
const localeRe = new RegExp(
`^${languageCodeRe}${scriptCodeRe}?${regionCodeRe}?${variantCodeRe}?$`);
Contributor

The regular expression isn't implementing the spec; I'd recommend just parsing the script tag as a {4} and making it optional. Like

/^([a-z]{2,3})(?:-([a-z]{4}|\*))?/i

I'm not sure if we should allow for trailing junk?

Also, we should probably special-case -mac as something that violates well-formed locale tags, as long as we still have it.

Collaborator Author

I'm following BCP47 in language-script-region-variant.

Collaborator Author

I'll update the regexp, but I think it makes sense to keep the 4 pieces. I may accept and cut off any extension keys (-t-, -x- and -u-).

Contributor

My point was really about

const scriptCodeRe = '(?:-([a-zA-Z]{2,4}|\\*))';

which should be exactly 4,

const scriptCodeRe = '(?:-([a-z]{4}|\\*))';

Also, I'd remove (did here) the upper vs lower case regex ranges and create the regex with the 'i' flag.
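
Put together, the pattern could look like this (illustrative; only the script piece is what's suggested here, the other sub-patterns aren't shown in the diff, so their shapes are guesses):

```javascript
// Case handling comes from the 'i' flag instead of [a-zA-Z] ranges.
const languageCodeRe = '([a-z]{2,3}|\\*)';
const scriptCodeRe = '(?:-([a-z]{4}|\\*))';
const regionCodeRe = '(?:-([a-z]{2}|\\*))';
const variantCodeRe = '(?:-([0-9a-z]{3,8}|\\*))';

const localeRe = new RegExp(
  `^${languageCodeRe}${scriptCodeRe}?${regionCodeRe}?${variantCodeRe}?$`, 'i');

localeRe.exec('sr-Latn-RS');  // matches regardless of casing
```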

}

return [language, script, territory, variant];
}
Contributor

... this code would become easier if the regular expression were more standards-conformant, I think.

Also, we should probably normalize casing here so that we can compare strings later on, because language tags are compared case-insensitively.
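
For instance (illustrative):

```javascript
// Normalize once at parse time so later comparisons are plain string
// equality, since BCP 47 tags are case-insensitive.
const normalize = tag => tag.toLowerCase();

normalize('en-Latn-US') === normalize('EN-latn-us');  // true
```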

Collaborator Author

Yeah, I'll normalize case, either by going all lower-case or running it via Canonicalize.
What do you mean by easier?

function filterMatches(requestedLocales, availableLocales) {
  const supportedLocales = [];

  outer:
Contributor

This is probably up to @stasm, but GOTOs?

Collaborator Author

We can improve the exact code later once we agree on how it's supposed to work and what kind of matches it should produce.

Contributor

I don't have a strong preference about labelled for-loops.

}

for (const availableLocale of availableLocales) {
if (compareLocales(requestedLocale, availableLocale)) {
Contributor

This parses the language tags over and over in a loop, I'd refactor this code to not do that.

Collaborator Author

Agree. I'll improve the exact code once we agree on what the code is supposed to do :)

for (const requestedLocale of requestedLocales) {

for (const availableLocale of availableLocales) {
if (requestedLocale === availableLocale) {
Contributor

... needs to be independent of casing.


export function negotiateLanguages(requestedLocales,
availableLocales,
defaultLocale) {
Contributor

defaultLocale isn't covered in tests, otherwise the bug below would have been caught.

const supportedLocales = filterMatches(requestedLocales, availableLocales);
if (supportedLocales.includes(defaultLocale)) {
  supportedLocales.push(defaultLocale);
}
Contributor

... if undefined is in list, add it to the list.

You want that the other way around.

return supportedLocales;
}

export function negotiateLanguages(requestedLocales,
Contributor

This is the only public method of this module, and the only one without documentation :-/

@stasm, what's the intent for the generated documentation? It seems to be done, but I haven't found it having a prominent role yet, so maybe it doesn't matter. Your call.

Contributor

Thanks for bringing this up. For now, each package in the fluent.js repository has its docstrings extracted using the documentation.js package. The extracted markdown files (one per package) land in the top-level docs/ directory.

It might be better to move the docs into each package's directory as $(PACKAGE)/docs/api.md, in case we want to add more hand-written docs later on.

Contributor

It might be better to move the docs into each package's directory as $(PACKAGE)/docs/api.md, in case we want to add more hand-written docs later on.

Implemented in 6c46100.

@zbraniecki
Collaborator Author

Thank you Like! That's precisely the level of detail I want to iron things out on right now.

Let's get to an algorithm we both like and then I'll add detailed tests for each function and step.

I'll work on applying your feedback today.

@zbraniecki
Collaborator Author

*Pike (autocorrect snafu)

Contributor

@Pike Pike left a comment

I've spent quite some time trying to find out what data backs up likelySubtags in CLDR, and much of it is thin air.

I think it's OK to use the data to order the matching languages in the absence of anything better, but really, we shouldn't exclude a match for anything in that data set.

In particular, for cases where the requested language doesn't specify a script and we have multiple scripts, we should just return all of the matching languages.

Also, I think that when we lack the likelySubtags data, all that happens is that the returned match data might not be deterministically ordered. I think that's a fair outcome, and we don't need to write any kind of funky code to avoid that.

On the topic of modes, how is that supposed to work in practice? As in, I npm install fluent, and then?

assert.deepEqual(
  negotiateLanguages(['sr'], ['sr-Cyrl', 'sr-Latn']),
  ['sr-Cyrl']);
Contributor

I spent half a day trying to find out why cldr says these things about Serbian, and to the best of my findings, this is just made up. Notably, sr-Latn-RS is given a writingPopulation of 5%, because

<reference type="R1017">For languages not customarily written, the writing population is artificially set to 5% in the absence of better information.</reference>

@stasm
Contributor

stasm commented Feb 20, 2017

On the topic of modes, how is that supposed to work in practice? As in, I npm install fluent, and then?

I'd like the language negotiation module to be a separate package. You would then do:

npm install fluent-langneg

And:

import negotiateLanguages from 'fluent-langneg';
negotiateLanguages(requested, available);

Any additional data can be published in a separate package or as a submodule:

import negotiateLanguages from 'fluent-langneg';
import withLikelySubtags from 'fluent-langneg/subtags';
withLikelySubtags(negotiateLanguages)(requested, available);

For this to work, negotiateLanguages should accept likelySubtags as a parameter. Perhaps a better approach overall would be to mimic other Intl objects:

const negotiator = LanguageNegotiation({
  available: [],
  default: [],
  likelySubtags: []
});

And then you import likelySubtags from 'fluent-langneg/subtags';
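
One possible shape for that wrapper (sketch; the data path and the options-based plumbing are assumptions):

```javascript
// fluent-langneg/subtags (illustrative): wrap the negotiator so it always
// receives the bundled likelySubtags table via its options.
import likelySubtags from './data/likely-subtags';

export default function withLikelySubtags(negotiate) {
  return (requested, available, options = {}) =>
    negotiate(requested, available,
              Object.assign({ likelySubtags }, options));
}
```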

Contributor

@stasm stasm left a comment

Just a couple of small changes to align this with the rest of the repo structure.

If you're on the latest master, you can run make html to have jsdoc extract the API docs into html/fluent-langneg in the root of the repo, and verify that everything looks OK.

@@ -0,0 +1,2 @@
fluent-intl-polyfill.js
Contributor

You'll need to change this to fluent-langneg.js.

# fluent-langneg

`fluent-langneg` is an API for language negotiation that is recommended
by the Fluent Team for all language selection and matching.
Contributor

@stasm stasm Feb 22, 2017

I started including a one-sentence summary of what Fluent is at the beginning of all READMEs. Here's how I'd phrase it:

`fluent-langneg` is an API for negotiating languages. It's part of
Project Fluent, a localization framework designed to unleash 
the expressive power of the natural language.


## How to use

Simply `import` or `require` the package somewhere in your code.
Contributor

Remove the line? It was written for fluent-intl-polyfill which doesn't expose any importable API.

Simply `import` or `require` the package somewhere in your code.

```javascript
import { negotiateLanguages } from 'fluent-langneg';
Contributor

With fluent-langneg being so purpose-specific, I think we could make negotiateLanguages the default export:

import negotiateLanguages from 'fluent-langneg';

@@ -0,0 +1 @@

Contributor

Please remove the whole docs directory.

@@ -0,0 +1,267 @@
(function (global, factory) {
Contributor

Please remove this file.

@@ -0,0 +1,282 @@
/*
* @module
Contributor

I learned last night that the @module tag takes the module's name. What we probably want here is:

@module fluent-langneg
@overview

`fluent-intl-polyfill` provides...

return fallback;
}

export function negotiateLanguages(requestedLocales,
Contributor

I'd consider making this export default.

* This means that if `ab` locale is present in the available locales,
* it is treated as matching `ab-*-*-*`.
*/
function compareLocales(loc1, loc2) {
Contributor

Perhaps rename this function to localesEqual to clearly indicate that it returns a bool?

@@ -0,0 +1,144 @@

Contributor

Why are there two blank lines here and in test/setup.js? Do you know if the use strict pragma is still required in node? Can we skip it?

Collaborator Author

I think we can skip it.

Contributor

@stasm stasm left a comment

How can we make the additional JSON data available in browsers? I think a safer approach would be to store the data in a JS file, e.g. data/subtags.js which could be imported if needed.

Or, it looks like there's https://github.com/rollup/rollup-plugin-json but I haven't tried it.

```

The API reference is available at
http://projectfluent.io/fluent.js/fluent-syntax.

```javascript
import negotiateLanguages from 'fluent-langneg';

const supported = negotiateLanguages(requested, available, default);
Contributor

default is a reserved keyword in JS, so I'd suggest changing this to:

const supportedLanguages = negotiateLanguages(
    requestedLanguages, availableLanguages, defaultLanguage
);

Contributor

Also, are these languages or locales?

Collaborator Author

Those are locales. We negotiate on the language part of those locales, so we do negotiate languages, but we pass locales in and the returned IDs are locales as well. I know it's complex :(


export default function negotiateLanguages(requestedLocales,
availableLocales,
options = {}) {
Contributor

Weird indentation here. You can just indent one level like so:

export default function negotiateLanguages(
  requestedLocales,
  availableLocales,
  options = {}
) {

availableLocales,
options = {}) {

const defaultLocale = GetOption(options, 'defaultLocale', 'string');
Contributor

Isn't defaultLocale common enough that it should get its own positional argument?

Collaborator Author

I don't think it is. Since the algorithm works perfectly fine without it, I'd prefer to keep only the necessary arguments as positional - requestedLocales and availableLocales.

Contributor

Perhaps I'm missing something but I don't see it in tests. What happens when there are no common languages between requested and available and no default has been specified?

Collaborator Author

We return an empty list, as per the language matching spec in RFC 4647. Added to tests.
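
Something along these lines (illustrative, not the exact test that was added):

```javascript
import assert from 'assert';
import negotiateLanguages from 'fluent-langneg';

// No overlap between requested and available, and no defaultLocale:
// the result is an empty list.
assert.deepEqual(
  negotiateLanguages(['fr'], ['de', 'it']),
  []);
```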

options, 'likelySubtags', 'object', undefined, {});

const supportedLocales =
filterMatches(requestedLocales, availableLocales, likelySubtags);
Contributor

nit: can you change this to:

const supportedLocales = filterMatches(
  requestedLocales, availableLocales, likelySubtags
);

@@ -0,0 +1,18 @@
{
Contributor

This is so small that I think we should include this in the main package. Can rollup bundle JSON files? If not, I'd just put this in src/subtags.js.

Collaborator Author

I don't know how small it will be yet. I'll need to generate the data for the min variant, and so far it's just a stub. I agree we'll want to bundle it, but I'd like the bundling to be conditional, i.e. when building fluent-langneg you'd be able to choose whether you want the min or the full likelySubtags.

@@ -0,0 +1,266 @@
/*
* @module fluent-langneg
Contributor

Please add the @overview tag here.

@zbraniecki
Collaborator Author

@Pike , @stasm - I think this is ready for another round of reviews.

There are three known things I'd like to add once we agree on the logic:

  1. more tests
  2. ability to build fluent-langneg with either 'min' or full likelySubtags
  3. a script to generate the 'likelySubtagsMin' based on our own threshold.

But first, I'd like to iron out the first version of the algorithm to be stable.

Contributor

@stasm stasm left a comment

This looks great. r+.

I'd suggest using default exports more, especially with well-scoped modules. From https://esdiscuss.org/topic/moduleimport#content-0: ES6 favors the single/default export style.

ability to build fluent-langneg with either 'min' or full likelySubtags

My recommendation right now would be to ship 0.0.1 with 10-20 most likely subtags hardcoded in the JS source and with a dummy algorithmic way of guessing more. It's okay if it makes mistakes at this stage.

I'd also put the 50KB JSON in the data/ folder so that anyone can use it if they wish. They'll need to figure out how to load it themselves, but at least it's there.

I'd also prefer if the JSON file name was all lowercase, as this is the file naming scheme we've been using so far. So perhaps: data/likely-subtags.json. I see that the camelcase name comes directly from the CLDR repo, so keeping it that way is okay, too.

a script to generate the 'likelySubtagsMin' based on our own threshold.

Let's move this to a future milestone, together with finding out how to bundle JSON files using our build system.

"engine": {
"node": ">=6"
},
"eslintConfig": {
Contributor

Feel free to remove this. I followed your suggestion and used a global .eslintrc.json file in the root of the repo (0f1bc1f).


if (strategy === 'lookup') {
  if (supportedLocales.length === 0) {
    supportedLocales.push(defaultLocale);
Contributor

What if the defaultLocale is undefined? Are you okay pushing it to supportedLocales too?

Collaborator Author

If it's undefined it'll throw earlier.

Contributor

Ah, I didn't notice that. Thanks.

* It also allows skipping the script section of the id, so `en-US` is properly
* parsed as `en-*-US-*`.
*/
export function parseLocale(locale, range = false) {
Contributor

export default?

* ['en-AU'] * ['en-US'] = ['en-US']
* ['sr-RU'] * ['sr-Latn-RO'] = ['sr-Latn-RO'] // sr-RU -> sr-Latn-RU
*/
export function filterMatches(
Contributor

export default?

) {
const supportedLocales = new Set();

const availableLocalesCache = {};
Contributor

Use Map? That way you won't have to check with hasOwnProperty in getOrParseLocale.
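
For example (sketch; `getOrParseLocale` is the helper referenced above and `parseLocale` is the module's parser):

```javascript
// Cache parsed locales in a Map instead of a plain object, so lookups
// don't need hasOwnProperty checks.
const availableLocalesCache = new Map();

function getOrParseLocale(locale) {
  if (!availableLocalesCache.has(locale)) {
    availableLocalesCache.set(locale, parseLocale(locale));
  }
  return availableLocalesCache.get(locale);
}
```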

@zbraniecki
Collaborator Author

My recommendation right now would be to ship 0.0.1 with 10-20 most likely subtags hardcoded in the JS source

The challenge here is how we select the 10-20? Is 'it' more important than 'in'?

and with a dummy algorithmic way of guessing more. It's okay if it makes mistakes at this stage.

@Pike would you be ok with this?

Alternatively, I could add a list of locales, curated by us, that are OK to expand:

const localesThatHasDefaultThatMatchRegion = ['it', 'fr', 'ru', 'cs', 'pl', ...];

instead?

@zbraniecki
Collaborator Author

I'd also put the 50KB JSON in the data/ folder so that anyone can use it if they wish. They'll need to figure out how to load it themselves, but at least it's there.

Would you prefer that over external dependency or just a link to cldr-json npm package?

@zbraniecki
Collaborator Author

I'd also prefer if the JSON file name was all lowercase, as this is the file naming scheme we've been using so far.

My hope was to keep a direct link to CLDR and they use camel case.

I'd suggest using default exports more, especially with well-scoped modules. From https://esdiscuss.org/topic/moduleimport#content-0: ES6 favors the single/default export style.

The reason I didn't do that is so that I can expose more internal methods for tests. If each file can only expose one function, I can't expose others for testing.

Let's move this to a future milestone, together with finding out how to bundle JSON files using our build system.

SGTM.

@Pike?

@stasm
Contributor

stasm commented Feb 23, 2017

Would you prefer that over external dependency or just a link to cldr-json npm package?

Ah, if that's an option, then let's go for it: let's link the CLDR repo in the README.

The reason I didn't do that is so that I can expose more internal methods for tests. If each file can only expose one function, I can't expose others for testing.

You sure can :)

import foo, { bar } from './module'

imports the default export as foo as well as the named bar export.
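
Applied to this package, that could look like this (illustrative; it assumes the internals are re-exported from the package entry point):

```javascript
// Public API as the default export, internals still importable in tests.
import negotiateLanguages, { filterMatches, parseLocale } from 'fluent-langneg';
```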

@zbraniecki
Collaborator Author

I want to ❤️️ es6 right now.


## fluent-langneg 0.0.1

- (05d2487c) fluent-langneg 0.0.1
Contributor

This is my bad: I shouldn't have included the SHAs in other changelogs. I've since stopped doing it. Changelogs should describe changes between named versions and they don't have to correspond to commits. Let's remove the SHA of the commit here and squash everything into a single commit.


The API supports three negotiation strategies:

* filtering (defualt)
Contributor

Typo in default.

Contributor

Also, please either use a ### header or indent the text below by 4 spaces to make it part of the bullet point.

user can load into `fluent-langneg` to replace the minimal version.

```javascript
let data = require('./data/likelySubtags.json');
Contributor

We're not shipping this file with the rest of the package (which is good) so maybe link to the CLDR repo and put the following here:

const data = require('cldr-core/supplemental/likelySubtags.json');

@zbraniecki zbraniecki merged commit 053ede7 into projectfluent:master Feb 27, 2017
Contributor

@Pike Pike left a comment

I've got some comments more on details, and some comments on code, and some comments on defaults.

Overall, I wonder how to document this right, too. We have three lengthy blurbs: one in the README, one in index.js, and one in matches.js. I'm not sure why which is where, and it seems that the generated docs are rather confused, too? I ran make html, and the important comments don't seem to make it into the generated docs.


The API supports three negotiation strategies:

### filtering (default)
Contributor

Wouldn't the general use-case within fluent be to use matching instead of filtering? In that case, I'd make that default.

Collaborator Author

That's up to @stasm I guess.

Collaborator Author

I implemented it in C++ as the default; I'll update the JS version once I'm done with C++.

['filtering', 'matching', 'lookup'], 'filtering');

if (strategy === 'lookup' && defaultLocale === undefined) {
throw new Error('defaultLocale cannot be undefined for strategy `lookup`');
Contributor

This condition should be in the docs.
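
An illustrative call for the docs (option names as in the patch; the locale value is just an example):

```javascript
// With strategy 'lookup', defaultLocale must be provided; it's used as the
// last-resort result when nothing matches.
negotiateLanguages(requestedLocales, availableLocales, {
  strategy: 'lookup',
  defaultLocale: 'en-US',
});
```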

function variantRangeFor(locale) {
  locale.variant = '*';
  return locale;
}
Contributor

regionRangeFor and variantRangeFor sound like they'd create new Locale objects, IMHO. Not sure if it's better to actually do a copy, or make this a setter. I could even go as far as to not have this be a function. Or make it a setter on the object?

Collaborator Author

@stasm - what do you think? Is it worth creating a new object over modifying the existing one?

Collaborator Author

I'm working on the C++ port right now for Gecko. Once I'm done with it, I'll update the JS implementation to match it. It does have this change as well.

// Attempt to match against the available range
// This turns `en` into `en-*-*-*` and `en-US` into `en-*-US-*`
// Example: ['en-US'] * ['en'] = ['en']
for (const availableLocale of availableLocales) {
Contributor

It took me quite a while to untangle this code, mostly because the patterns in the code aren't factored in.

Is there a way to make this a second inner loop, between this and the inner

if (matches) {
  add_and_bail_if_not_matching
}

? Might make the code shorter, and easier to digest, I think

Collaborator Author

How would I bail out of the outer loop if I factored it out into a separate function?

Contributor

I was thinking of just loops, not functions.
You could also do functions, and instead of continue :outer, do early returns?
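
A sketch of that early-return shape (illustrative; `findMatchFor` is a made-up helper, `compareLocales` and the `supportedLocales` set come from the patch, and only the exact-match pass is shown):

```javascript
// Instead of `continue outer;`, move the inner scan into a helper that
// returns early; the caller then just records the first match.
function findMatchFor(requestedLocale, availableLocales) {
  for (const availableLocale of availableLocales) {
    if (compareLocales(requestedLocale, availableLocale)) {
      return availableLocale;
    }
  }
  return null;
}

for (const requestedLocale of requestedLocales) {
  const match = findMatchFor(requestedLocale, availableLocales);
  if (match !== null) {
    supportedLocales.add(match);
  }
}
```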

'zh': 'zh-hans-cn',
'zh-gb': 'zh-hant-gb',
'zh-us': 'zh-hant-us',
};
Contributor

It'd be good to add a comment on how you got to this particular sublist.

Collaborator Author

I'm ashamed to admit that I went through likelySubtags and selected the locales that I consider more significant and more likely to have someone specify the minimized version (like ab in requested instead of ab-CD) where there is more than one region/script.

I'm not sure how to document it, and I believe we should eventually do this in a more formal and algorithmic way.

Contributor

I'd go for a comment saying that then. You could even file an issue for the follow-up and link to that from the comment?

Collaborator Author

Will do!

describe('Basic Language Negotiation without likelySubtags', () => {
const nl = negotiateLanguages;

it('exact match', () => {
Contributor

Should these it() statements read as full sentences starting with "it ..."? mocha docs say yes, not sure if there's a special style among the fluent repos.

Contributor

I filed #14.
