Proposal: Per-extension language preferences #641

Open
wants to merge 11 commits into base: main
234 changes: 234 additions & 0 deletions proposals/per-extension-language-preferences.md
@@ -0,0 +1,234 @@
# Proposal: Per-extension language preferences

**Summary**

This proposal allows developers (users) to set a specific language for their extension (which may be different from the browser or system default language) and create a language selection menu for users.

**Document Metadata**

**Author:** hanguokai

**Sponsoring Browser:** Chromium
Member

I forget if we had already agreed this - was there a discussion somewhere? If not I will take it to the team.

Member Author

Member

Cool, let me confirm then. I definitely like the change, I'm just being extra careful any time we add ourselves as the sponsoring browser since we agreed on some strict rules about that meaning we will implement in a timely fashion.

Member

Hey Jackie - I spoke to the team about this. We're supportive of the idea, but it is very unlikely we would be able to implement this ourselves any time soon. With that in mind, we'd only be comfortable sponsoring this if there was an external contributor able to implement it.

@xeenon / @Rob--W With that in mind, would either Apple or Mozilla be more likely to implement this soon and be interested in sponsoring?


**Contributors:**

**Created:** 2024-06-16

**Related Issues:** [#258](https://github.com/w3c/webextensions/issues/258)

## Motivation

### Objective

Prior to this proposal, browsers automatically selected a `browser.i18n` language for extensions, and developers (users) could not set a specific language.

Multilingual users sometimes want to set a specific application (extension) to a language that differs from the system (browser) language. This need is not uncommon.
[Android 13](https://developer.android.com/guide/topics/resources/app-languages), [iOS 13 and macOS Catalina](https://developer.apple.com/news/?id=u2cfuj88) support a feature
called "Per-app language preferences (settings)". This proposal brings the same function to browser extensions.

#### Use Cases

- Extensions create a language selection menu for users.
- [Optional] Browsers could potentially provide a built-in language selection menu for users, just like Android and iOS.
- [Optional] Extensions could integrate third-party i18n implementations (frameworks) with the browser's built-in language selection menu.

### Known Consumers

This is a generic i18n feature, which means that all types of extensions can use it. Some extensions already provide this functionality via workarounds or non-browser.i18n implementations.

## Specification

### Definitions

In this document:
- **locale** or **language code** or **language tag** is a string that represents a language, defined in [BCP 47](https://www.rfc-editor.org/info/bcp47).
For example, `en-US`, `zh-CN`, `fr`. It is used by `browser.i18n`, `Date`, `Intl` and various other APIs.
Member

zh-CN is presumably meant to indicate Simplified Chinese, which is also used in Singapore. It might be better to use zh-Hans as the language tag in the example.

See also https://www.w3.org/International/articles/language-tags/index.en.html#script

Member Author

@hanguokai, Jun 25, 2024

Perhaps for historical reasons, browsers still use 'zh-CN' instead of 'zh-Hans'. For example, navigator.language returns "zh-CN" in Chrome and Firefox on macOS. And Chrome only supports a fixed list of languages in the /_locales/ directory, which contains a limited set of language and region combinations.

I think solving this problem seems to go beyond the core issue that this proposal aims to address. This proposal is intended to expand functionality on the basis of existing capabilities.

Contributor

It would still be good to know which locales a browser supports or can deal with, which is also relevant for this issue. See also: #131

For example, the Naver Whale store rejects any language tag which includes the script subtag (Hans, Latn, Cyrl, and others).

- **the user preferred language** is the language set by `i18n.setCurrentLanguage(code)`. The user may not have set it.
- **the extension displayed language** is the language which the extension is displayed in. Note that the extension displayed language may be different from the browser UI language. For example, if the user preferred language is not set, the browser UI language is English, and the extension only supports French (default) and Japanese, then the extension displayed language is French because `i18n.getMessage()` returns the French message in this situation.

### Schema

```ts
/**
* Get the extension displayed language.
* return the language tag, e.g. en-US.
*/
i18n.getCurrentLanguage() : string

/**
* Set the user preferred language.
* if code is null, revert to the unset state.
* if code is not valid, return a rejected Promise.
* else return a Promise resolved when the operation is complete.
*/
i18n.setCurrentLanguage(code: string | null) : Promise<void>

/**
* Get all languages that are supported by this extension.
* return a Promise resolved with an array of language tags.
*/
i18n.getAllLanguages(): Promise<string[]>
Member

I'm not sure if this one makes sense, given that extensions should already know the languages they support. I think this gets tricky in certain situations (partial strings, parent language tags) so I wonder if we should leave this out. I see that having it provides some convenience but I'm not sure it's essential for the MVP.

Member Author

> extensions should already know the languages they support

Yes. It is just a convenient method. And I think it is easy to implement.

> in certain situations (partial strings, parent language tags)

Languages with regional subtags are not a problem for users. For example, a language selection menu that includes French, French (Canada), French (Belgium) and French (France). This method should return all languages if they are there.

For partial strings, from the platform's standpoint, the platform does not need to consider these, but assumes that all these languages are supported.

Contributor

@oliverdunk That is to say the browser per definition supports all the language tags specified in the extension. I suggest we change the method name to i18n.getAvailableLanguages to let the browser return all the languages the extension can call setCurrentLanguage to.

It is not just a matter of convenience. For example, for the Whale Store I have to exclude language tags with script variations (zh-Hant, sr-Latn), so those are removed from _locales. Having to specify a list of all supported languages for each browser and store, in every area of the extension where it is used, is not as straightforward as it might sound.

It's important to know the difference between a locale and localization here. There can be many locales supported by a single localization. This API seems to want to return the list of available localizations, but users typically need to choose between the list of available locales. The locale can be used to do internationalized formatting/processing (such as in MessageFormat or using Intl), which provides a better-adapted, richer localized experience than merely translating the static messages.

For example, many applications come in just two varieties of English (US vs. UK/International English), but support many locales. The language negotiation/resource fallback (such as used by getMessage) takes care of filling in the localized strings for any requested locale (including using the default language when the requested locale has no available localization), but you still want the locale (in most cases) to provide processing/formatting. Note that many applications also tailor the fallback, so that, for example, the es-419 (Spanish, Latin America) localization serves many seemingly-unrelated (from a tag perspective) Spanish locales (es-CO, es-CR, es-AR, etc.....)

Member Author

> There can be many locales supported by a single localization. This API seems to want to return the list of available localizations

Yes, this proposal and the current browser.i18n.getMessage() mechanism ( MDN doc , Chrome doc ) mainly focus on localization (language translations). In other words, it focuses on languages, not all possible regions.

Regarding expanding locales, I think this may be an area that needs to be improved in the future, or expand by developers themselves.

> Regarding expanding locales, I think this may be an area that needs to be improved in the future, or expand by developers themselves.

I agree about alternate fallback paths (although Intl has a proposal for this and it is already the way that CLDR data--and thus Intl--works for some locale-based "best match" selection vs. BCP47 Lookup), but think that this document isn't putting enough thought into the separation of locales and localizations/translations. Trying to use the same mechanism for both will disappoint users. Trivial examples with only a couple of languages don't match the needs of those developers who need to support a fairly large set of locales.

getMessage tries to use a BCP47 Lookup type of fallback to select the correct message "like a resource bundle" (cf. Java or GNU gettext for examples of other resource bundle systems). Resource selection often had a dual fallback capability (the messages files and the specific keys within each message file).

The resource files want to use the least specific locale as their identifier possible (the better to provide coverage for many locales). The user, however, wants to specify the most specific locale possible (including script, region, and various locale extensions where applicable), the better to tailor the runtime experience.

In this document, there are calls to things such as "selecting which locale" to use (i.e. a picker) as well as "selecting which localization" to use (i.e. which language file or files to download). These lists want to be different (the list of locales is long, while the list of localizations is typically shorter).


Note that there is on-going work at W3C related to manifest (they are adopting slightly different mechanisms for managing localization). Also, I am the editor of Developing Localizable Manifests, which tries to enumerate a lot of this material and which readers on this thread might find useful. Lastly, there is work at Unicode on MessageFormat 2 and proposed for MessageResource (which is similar to the message files in getMessage)

Member

@aphillips, thanks for taking the time to share your expertise on this issue. I am certainly out of my depth so the extra context and pointers on what to consider are really appreciated. If you'd like to attend a public meeting at some point, we could likely carve out some time to make sure we can have an in-depth discussion. Otherwise your time continuing to help with the conversation here is valued regardless.

Based on all of the above, I am still of the opinion that offering a getAllLanguages that is intended for rendering a menu is going to be very hard to get right. Given users want to choose a fairly specific locale, it would likely need to return a much longer list of anything you can setCurrentLanguage to, as @carlosjeurissen suggested. However, that seems like it would be a long list which would be hard to render with the right nesting / structure without additional work. You would also need a mapping from returned codes to strings which are human readable.

I can see the value in a function that returns "which of the folders in _locales were accepted". Perhaps we could lean into that with an API like getParsedLocalesDirectories()? Based on @aphillips' explanation it seems like locales may be the wrong word there but I think we are slightly backed into a corner by the existing usage of _locales as the canonical place for extensions to store messages.json files.

For setCurrentLanguage, I'm convinced after this discussion that you should be able to set it to any valid locale, however specific, and the browser should pick the appropriate language file with fallbacks if needed. That way you can offer a fairly detailed picker for users and rely on the browser to use its built-in fallback behavior.

Member Author

> rendering a menu is going to be very hard to get right ……

Creating a language selection menu has always been a challenge in i18n work, but it is also a necessary work. It can be very simple or very complex. For example, the extension can only list a few languages without variations (like regions and scripts), or list languages with limited or all regions, or let users set languages, regions and extensions (like calendar, date format, etc) separately.

> You would also need a mapping from returned codes to strings which are human readable.

Intl.DisplayNames can help with that. It also depends on how developers want to design the menu. For example, a language code can be shown with two display names (e.g. "zh-CN" can be displayed as "Chinese Simplified (简体中文)"). Anyway, display names are not what this proposal is trying to solve.

Member Author

> For setCurrentLanguage, I'm convinced after this discussion that you should be able to set it to any valid locale, however specific, and the browser should pick the appropriate language file with fallbacks if needed. That way you can offer a fairly detailed picker for users and rely on the browser to use its built-in fallback behavior.

@oliverdunk Do you know why Chrome only allows developers to use a fixed list of languages? Is it limited by the browser or by the Chrome Web Store? If it is a hard limitation, that means the browser doesn't support some locales in setCurrentLanguage() and the /_locales/ directory.

Member

@hanguokai My understanding is that it is based on the locales supported by some of the internal libraries we use to handle i18n.


/**
* After changing to a new language, the browser triggers a language changed event.
* The callback is function(code: string) : void.
* The code parameter in the callback is the new language.
*/
i18n.onLanguageChanged.addListener(callback)
```

### Behavior

#### Behavior of `i18n.getCurrentLanguage()`

This method returns the language that the extension is displayed in.

- If the extension doesn't use `browser.i18n` (there is no "_locales" directory), return `undefined`.
- If the preferred language is not set by `i18n.setCurrentLanguage()`, returns the current language used by `i18n.getMessage()`, assuming that all languages support all possible keys.
Contributor

Suggested change
- If the preferred language is not set by `i18n.setCurrentLanguage()`, returns the current language used by `i18n.getMessage()`, assuming that all languages support all possible keys.
- If the preferred language has not been set by `i18n.setCurrentLanguage()`, returns the first language for which a message file exists following the same fallback mechanism used by `i18n.getMessage()`.

As a sidenote, the fallback mechanism is not the same across browsers. See: #296

- If the preferred language is set by `i18n.setCurrentLanguage()`, and the extension supports this language, then return this language.
- If the preferred language is set by `i18n.setCurrentLanguage()`, but the extension doesn't support this language (no message file for this language), then treat as if the preferred language is not set. This is an edge case, for example, the language was removed when the extension was upgraded.
Member

Should we allow you to get into this state? I would've expected setCurrentLanguage to throw an error and abort updating the language.

Member Author

It depends on the implementation. setCurrentLanguage() does throw an error, but as I said in the "Implementation Notes" section: when an extension is upgraded, the browser should check whether the languages it supports have changed (especially in the case of deletion). If the browser does this check, it will not get into this state; otherwise it may happen.

From a specification perspective, I'm just listing this possibility. From an implementation perspective, browsers just need to make sure that this problem can be avoided.

Member

As mentioned above, following the discussion with @aphillips I'm more in favour of allowing this now. Perhaps we should still throw in extreme cases (if you try to set the language to French, but you only have English message files) but beyond that allowing you to use browser fallbacks seems helpful.


##### Use Case 1: use the extension displayed language with other locale-related APIs.

For example, if the preferred language is not set, the language of the browser UI is English (en-US), and the extension supports French (default) and Japanese, then `i18n.getCurrentLanguage()` returns `"fr"` because `i18n.getMessage()` returns the French message in this situation.

```js
// Use the extension displayed language to format the date.
const locale = browser.i18n.getCurrentLanguage();
const localizedDate = new Date().toLocaleString(locale); // formatted for 'fr'

// Without an explicit locale, the runtime default is used, which may differ.
const defaultDate = new Date().toLocaleString(); // e.g. formatted for 'en-US'
```

##### Use Case 2: show the current language in the extension language selection menu.
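
A minimal sketch of this use case, assuming the API shape proposed in the Schema section. The `i18n` object below is a hypothetical stand-in so the example runs outside a browser, and `buildLanguageMenu` is an illustrative name, not part of the proposal. Labels come from the standard `Intl.DisplayNames` API.

```javascript
// Hypothetical stand-in for the proposed API; in an extension,
// `browser.i18n` would be provided by the browser instead.
const i18n = {
  getCurrentLanguage: () => "fr",
  getAllLanguages: async () => ["en", "fr", "ja"],
};

// Build menu entries: each language is labeled in its own language,
// and the entry matching the displayed language is marked as selected.
async function buildLanguageMenu(api) {
  const current = api.getCurrentLanguage();
  const codes = await api.getAllLanguages();
  return codes.map((code) => ({
    code,
    label: new Intl.DisplayNames([code], { type: "language" }).of(code) ?? code,
    selected: code === current,
  }));
}

const menuPromise = buildLanguageMenu(i18n);
menuPromise.then((menu) => console.log(menu));
```

An extension page could render these entries as a `<select>` element and call `i18n.setCurrentLanguage(code)` when the user picks one.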

#### Behavior of `i18n.setCurrentLanguage(code)`

This method sets the user preferred language for this extension.

- If the extension doesn't use `browser.i18n` (there is no "_locales" directory), return a rejected Promise.
- If the code is an invalid language tag or an unsupported language by this extension, return a rejected Promise.
- If the code is null, revert to the unset state.
- Otherwise, do the following:
  1. Save the language persistently, and make sure that the language is prioritized for use next time.
  1. Return a resolved Promise.
  1. Trigger the `i18n.onLanguageChanged` event.

In addition, when `i18n.setCurrentLanguage(code)` succeeds:
- the browser should update related browser UI and extension UI, because some values in manifest.json may be changed, like `name`, `short_name`, `description` and `action.default_title`.
- Text from `i18n.getMessage()` and CSS files doesn't update automatically. Developers handle these updates themselves.
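
The steps above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `createI18nModel` and `getPreferred` are hypothetical names, persistence is reduced to a variable, and the timing of the event relative to promise resolution is simplified. Tag validation uses the standard `Intl.getCanonicalLocales`.

```javascript
// Hypothetical in-memory model of the steps above; a browser would persist
// the value in its own storage and dispatch a real extension event.
function createI18nModel(supported) {
  let preferred = null; // the "unset" state
  const listeners = new Set();

  return {
    onLanguageChanged: { addListener: (cb) => listeners.add(cb) },
    getPreferred: () => preferred,
    async setCurrentLanguage(code) {
      if (code === null) { // revert to the unset state
        preferred = null;
        return;
      }
      let canonical;
      try {
        // Throws a RangeError for structurally invalid language tags.
        [canonical] = Intl.getCanonicalLocales(code);
      } catch {
        throw new Error(`Invalid language tag: ${code}`);
      }
      if (!supported.includes(canonical)) {
        throw new Error(`Unsupported language: ${code}`);
      }
      const changed = canonical !== preferred;
      preferred = canonical; // save the language (in memory only here)
      if (changed) {
        listeners.forEach((cb) => cb(canonical)); // notify listeners
      }
      // Returning from the async function resolves the promise.
    },
  };
}

const model = createI18nModel(["en", "fr", "ja"]);
const seen = [];
model.onLanguageChanged.addListener((code) => seen.push(code));
const done = model
  .setCurrentLanguage("fr")
  .then(() => model.setCurrentLanguage("xx-invalid-!"))
  .catch((err) => err.message);
```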

##### Persistence of `i18n.setCurrentLanguage(code)`

This setting is persistent when:
- restart or upgrade the browser.
- disable then re-enable the extension.
- upgrade the extension to a new version.

#### Behavior of `i18n.getAllLanguages()`

This method returns an array of language tags that the extension supports, e.g. `["en", "fr", "ja", "zh-CN"]`.
This is a convenience method. Without it, developers need to hardcode the list in their code.

#### Behavior of `i18n.onLanguageChanged` event

When the user preferred language is changed to a new different language by `i18n.setCurrentLanguage(code)`, this event is fired.
Developers use this event to learn that the language has changed, and might do the following:

- Update the content of extension pages in place (without refreshing the page).
- Reload extension pages:
  - If the page is stateless, refresh the page directly.
  - If the page is stateful, save the current state and then refresh the page.
  - Prompt the user that the language setting has changed, and ask whether they want to refresh the page immediately.
- Re-create the extension context menus, since the titles of the context menus should be updated.
- Update the action title and badge text if needed.
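
As an illustration of the last two items, a listener might look like the sketch below. The function name `refreshLocalizedUi` and the message keys `menuTranslate` and `actionTitle` are made up for this example, and the `stub` object only imitates the parts of the `browser` namespace that the handler touches so that the sketch can run outside a browser.

```javascript
// Handler sketch: `api` is expected to look like the `browser` namespace.
async function refreshLocalizedUi(api) {
  // Context menu titles are fixed at creation time, so re-create the menus.
  await api.contextMenus.removeAll();
  api.contextMenus.create({
    id: "translate",
    title: api.i18n.getMessage("menuTranslate"),
    contexts: ["selection"],
  });
  await api.action.setTitle({ title: api.i18n.getMessage("actionTitle") });
}

// In an extension this would be registered with the proposed event:
//   browser.i18n.onLanguageChanged.addListener(() => refreshLocalizedUi(browser));

// Tiny stub that records calls instead of touching real browser UI.
const calls = [];
const stub = {
  i18n: { getMessage: (key) => `[fr] ${key}` },
  contextMenus: {
    removeAll: async () => { calls.push("removeAll"); },
    create: (props) => { calls.push(`create:${props.title}`); },
  },
  action: {
    setTitle: async ({ title }) => { calls.push(`title:${title}`); },
  },
};
const refreshed = refreshLocalizedUi(stub);
```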

#### Behavior of `i18n.getMessage()`, manifest and CSS files

When this proposal is implemented, the behavior of these existing functions will change as follows:

- If the preferred language is not set by `i18n.setCurrentLanguage()`, it is consistent with the existing behavior.
- If the preferred language is set by `i18n.setCurrentLanguage()`, and the extension supports this language, then prioritize using that language.
- If the preferred language is set by `i18n.setCurrentLanguage()`, but the extension doesn't support this language, then treat as if the preferred language is not set. This is an edge case, for example, the language was removed when the extension was upgraded.
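
The three cases above can be sketched as a lookup order. This is a simplified model, not the browsers' actual algorithm: `lookupMessage` and its parameters are hypothetical, and the real fallback chain is browser-defined and differs between browsers.

```javascript
// Hypothetical lookup: tries the preferred language first (if the extension
// ships it), then the browser UI language, then default_locale.
function lookupMessage(key, { catalogs, preferred, uiLanguage, defaultLocale }) {
  const order = [];
  // A preferred language without a message file is treated as unset.
  if (preferred && catalogs[preferred]) order.push(preferred);
  order.push(uiLanguage, uiLanguage.split("-")[0], defaultLocale);
  for (const lang of order) {
    const entry = catalogs[lang]?.[key];
    if (entry) return entry.message;
  }
  return ""; // nothing found in any candidate locale
}

const catalogs = {
  fr: { greeting: { message: "Bonjour" } },
  ja: { greeting: { message: "こんにちは" } },
};
const base = { catalogs, uiLanguage: "en-US", defaultLocale: "fr" };

const unset = lookupMessage("greeting", { ...base, preferred: null });
const japanese = lookupMessage("greeting", { ...base, preferred: "ja" });
const removed = lookupMessage("greeting", { ...base, preferred: "de" });
console.log(unset, japanese, removed);
```

Here `unset` and `removed` both fall back to the French default, while `japanese` uses the preferred language.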

### Browser built-in language selection menu

Based on the capabilities provided by this proposal, browsers could provide a unified built-in language selection menu for extensions, like Android and iOS.
Whether to provide the built-in language selection menu and how to implement it is up to the browser to decide. This is just a suggestion.

The built-in language selection menu has the following benefits:
- Easy for developers: Developers do not need to implement it themselves, they only need to adapt to this proposal.
- Easy for users: A unified UI makes it easier for users to use, otherwise each extension might be set up differently.
- It is convenient for developers to develop and test i18n functions by switching the extension language.

### Integrate other i18n implementations (frameworks) with browser.i18n

This proposal can also be integrated with third-party i18n frameworks, as these frameworks typically allow developers to specify the language to be used.
For example, developers use `i18n.getCurrentLanguage()` and `i18n.setCurrentLanguage()` for the user preferred language, but don't use `i18n.getMessage()`.

```js
otherI18nFramework.setLanguage(browser.i18n.getCurrentLanguage());

let text = otherI18nFramework.getMessage(key);
```

### New Permissions

N/A

### Manifest File Changes

If a browser supports the built-in language selection menu for extensions, the manifest file should add a new key that lets developers opt in to this feature, for example:
```
{
"name": "__MSG_extName__",
"default_locale": "en",
"builtin_languages_menu": true | false(default)
}
```

## Security and Privacy

### Exposed Sensitive Data

N/A

### Abuse Mitigations

N/A

### Additional Security Considerations

N/A

## Alternatives

### Existing Workarounds

##### Workaround-1: fetch message files

This is not ideal. Developers need to solve many problems on their own, such as saving the preferred language, implementing placeholders, and fallback mechanisms.
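
To give a sense of the work involved, here is a rough sketch of this workaround. `loadMessages` and `formatMessage` are hypothetical helpers, the loader is injected so the sketch can run without a browser, and only positional `$1`..`$9` substitution is re-implemented; named placeholders, locale fallback, and saving the preference would all still have to be written by hand.

```javascript
// `loadJson` is injected so the sketch is testable. In an extension page it
// could be: (path) => fetch(browser.runtime.getURL(path)).then((r) => r.json())
// `lang` must match a directory name under _locales (e.g. "fr" or "zh_CN").
async function loadMessages(lang, loadJson) {
  return loadJson(`_locales/${lang}/messages.json`);
}

// Re-implements only positional "$1".."$9" substitution.
function formatMessage(messages, key, substitutions = []) {
  const entry = messages[key];
  if (!entry) return "";
  return entry.message.replace(/\$([1-9])/g, (_, n) => substitutions[n - 1] ?? "");
}

// Fake file system standing in for the packaged extension files.
const fakeFiles = {
  "_locales/fr/messages.json": { hello: { message: "Bonjour, $1 !" } },
};
const greeting = loadMessages("fr", async (path) => fakeFiles[path])
  .then((messages) => formatMessage(messages, "hello", ["Ana"]));
greeting.then((text) => console.log(text));
```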

##### Workaround-2: use other i18n frameworks or implement one yourself

There are some JavaScript i18n frameworks that provide similar features, which allow developers to specify the language they want to use.
In fact, this workaround is equivalent to asking developers to give up using `browser.i18n`.
In addition, developers need to save the preferred language themselves.

##### Workaround-3: mix browser.i18n and other workarounds.

Because some text can only be localized by `browser.i18n`, such as the extension name and description, developers often mix different implementations.

### Open Web API

The Web provides some related tools, such as `Intl`, but there is no unified framework that provides functionality such as `browser.i18n.getMessage()`.
There are some third-party i18n frameworks, but they use custom mechanisms rather than the `browser.i18n` mechanism.

## Implementation Notes

When the extension is upgraded, if the new version removes the language that was set by `i18n.setCurrentLanguage(code)`, the user preferred language should be reverted to the unset state.
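
A minimal sketch of this check, assuming the browser keeps the stored preference and the new version's language list at hand. `reconcilePreferredLanguage` is a hypothetical name used only for illustration.

```javascript
// Hypothetical upgrade hook: returns the preferred language to keep after an
// extension update, or null (the unset state) if that language was removed.
function reconcilePreferredLanguage(storedPreferred, newSupportedLanguages) {
  if (storedPreferred === null) return null;
  return newSupportedLanguages.includes(storedPreferred) ? storedPreferred : null;
}

console.log(reconcilePreferredLanguage("ja", ["en", "fr"])); // null: "ja" was removed
console.log(reconcilePreferredLanguage("fr", ["en", "fr"])); // "fr": still shipped
```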

## Future Work

N/A