Add keywords to core blocks for the sake of the translation #6633

Open
maximebj opened this issue May 8, 2018 · 34 comments
Labels
[Feature] Blocks Overall functionality of blocks Internationalization (i18n) Issues or PRs related to internationalization efforts [Type] Enhancement A suggestion for improvement.

@maximebj
Contributor

maximebj commented May 8, 2018

Hi, for now core blocks don't seem to use keywords.

Many languages are not as easy as English. For example, in French there are several possible translations for the subhead block:

It can be:

  • sous-titre (but it's more like subtitle than subhead)
  • chapô (used by journalists)
  • extrait (excerpt)

Since none of these translations is perfect, keywords could help people find the block without knowing its official name.

To go further with this issue:

In fact each language has specific needs (some won't need any keywords, others will need 2 or 3 according to the context).
And from one language to another the keywords are not necessarily the same.

I really don't see how it can be done technically (maybe just set some keywords in English and let other languages adapt them instead of translating literally).

cc @audrasjb from the French translation team, with whom I have discussed this issue

@Soean Soean added the Internationalization (i18n) Issues or PRs related to internationalization efforts label May 8, 2018
@danielbachhuber danielbachhuber added [Type] Enhancement A suggestion for improvement. [Feature] Blocks Overall functionality of blocks labels May 10, 2018
@danielbachhuber danielbachhuber added this to the Bonus Features milestone May 10, 2018
@mcsf
Contributor

mcsf commented Aug 21, 2018

Thanks for bringing this up, @maximebj. I would agree with the issue, but it also seems to me, in this particular case, that the real issue is the ambiguous semantics of the Subhead block, especially when looking at extrait but also to some extent at chapô. Perhaps the problem lies right here:

description: __( 'What’s a subhead? Smaller than a headline, bigger than basic text.' ),

Subhead is loosely defined (and its definition based on size), which incidentally may also encourage using it for cosmetic purposes more than functional. What do you think? Could we start acting there instead?

I'm not saying that we shouldn't work on a way to interpolate alternate translations into a block type's keywords, but I'd rather only do that if and when we have compelling examples.

@chrisvanpatten
Contributor

@mcsf I think I have a compelling example! See #8365.

@mcsf
Contributor

mcsf commented Aug 21, 2018

@chrisvanpatten, thanks for the cross-ref! In that issue:

So I expect here the issue is actually allowing translators to provide additional, language-specific options for a block’s keywords. This would be useful as in the case in the issue report […] or [for] any languages that might have multiple words to refer to a concept that only has one word in English.

Thinking about this a bit, and intending to keep things simple, I think string context (with _n) would be enough. Thusly:

 title: __( 'Separator' ),
- keywords: [ __( 'horizontal-line' ), 'hr', __( 'divider' ) ],
+ keywords: [ __( 'horizontal-line' ), 'hr', __( 'divider' ), _n( '', 'synonyms for Separator block' ) ],

Observations from this approach:

  • Keywords are limited to three per type, so this example raises a warning.
  • By default, _n( '', 'synonyms for foo' ) will return '' (we could also choose null or undefined), which is automatically ignorable — and could even be explicitly cleaned up with lodash#compact.
  • If more than one synonym is provided for a locale, no special token needs to be used to separate synonyms: a simple space will do. This is because matching already takes word boundaries into account. So a translator could provide a string like so:
    String  | Context                    | Translation
    (empty) | synonyms for Twitter block | sns, tweet, ツイッター

@swissspidy
Member

_n is for plural forms, not context. For context _x() has to be used.

If more than one synonym is provided for a locale, no special token needs to be used to separate synonyms: a simple space will do. This is because matching already takes word boundaries into account. So a translator could provide a string like so:

A translation could consist of multiple words though, so a space is not enough. A comma is safer, as you have used in your example as well.

@swissspidy
Member

swissspidy commented Nov 13, 2018

I just want to point out that right now it's difficult for translators to translate dozens of keywords for all the blocks. It would be way easier if for each block there was just a single translatable string like _x( 'horizontal line, divider, separator', 'block keywords' ) instead of an array of words with no context or anything.

This way it's also possible for some locales to keep the English strings if necessary, which makes using the block inserter autocomplete much easier. This is similar to the list of words in wptexturize or the list of stopwords in WP_Query.

@mcsf
Contributor

mcsf commented Nov 13, 2018

_n is for plural forms, not context. For context _x() has to be used.

A translation could consist of multiple words though, so a space is not enough. A comma is safer, as you have used in your example as well.

Must’ve been half asleep when I wrote this. :/ Thanks for making sense of it.

easier if for each block there was just a single translatable string

This sounds apt to me as well. It could mean that the block API also accepts keywords as a string that it then splits on comma. This would, as a bonus, prevent translations from bypassing Gutenberg’s imposed limit of three keywords per block type. /cc @mtias
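As a rough illustration of that idea, here is a minimal sketch of a helper that accepts either the existing array form or a single comma-separated string. The name normalizeKeywords is hypothetical and not part of the block API; it only shows how splitting on commas keeps multi-word keywords intact.

```javascript
// Hypothetical helper, not part of the Gutenberg block API.
// Accepts the existing array form or a single comma-separated string.
function normalizeKeywords( keywords ) {
	const list = Array.isArray( keywords ) ? keywords : [ keywords ];
	return list
		// A missing or non-string entry contributes nothing.
		.filter( ( entry ) => typeof entry === 'string' )
		// Split on commas only, so multi-word keywords survive.
		.flatMap( ( entry ) => entry.split( ',' ) )
		.map( ( keyword ) => keyword.trim() )
		// Drop empty tokens, e.g. from an empty string or a trailing comma.
		.filter( ( keyword ) => keyword.length > 0 );
}

console.log( normalizeKeywords( 'horizontal line, divider, separator' ) );
// [ 'horizontal line', 'divider', 'separator' ]
console.log( normalizeKeywords( [ 'blockquote', 'quote' ] ) );
// [ 'blockquote', 'quote' ]
```

Handling both shapes in one place is one way existing blocks that declare arrays could keep working unchanged.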

@swissspidy
Member

Not sure if the 3-keywords-limit is still appropriate for translations though.

@mcsf
Contributor

mcsf commented Nov 13, 2018

Not sure if the 3-keywords-limit is still appropriate for translations though.

As a translator, I agree; as someone who knows how these things can easily be exploited, I don't so much. :)

@gziolo
Member

gziolo commented Feb 13, 2019

#13848 has landed which removes the limit of 3 keywords altogether. Does it solve this issue?

@maximebj
Contributor Author

I'm not sure it could specifically fix this issue, but it's a start.

If a block has only 2 keywords, every translator has to provide exactly 2 translations, which doesn't fit every case.

In my opinion, keywords should be a simple string as @swissspidy suggested before

keywords: __( 'word, another, thing, stuff' ),

That way translators could use only as many keywords as each language needs.
In JS the keyword would still be searchable.

I think we still need to discuss this issue before closing it

@mcsf
Contributor

mcsf commented Feb 13, 2019

In my opinion, keywords should be a simple string as swissspidy suggested before

I see no compelling argument against adopting this.

@maximebj
Contributor Author

This would just require some backward compatibility to existing blocks using arrays, but not a big deal.

@swissspidy
Member

Let's explore this in a PR then 🙂

@mcsf
Contributor

mcsf commented Feb 15, 2019

To be clear: are you owning that, @swissspidy? Or would you like someone to help?

@swissspidy
Member

It would be great if someone else could help with this as I don't have that much time to devote to this at the moment.

@mcsf
Contributor

mcsf commented Feb 15, 2019

Alright then. Any takers, @maximebj, @bisko?

@bisko

bisko commented Feb 15, 2019

I can take on that sometime next week! It seems like a good problem to use to dive into the Gutenberg world!

@bisko

bisko commented Feb 26, 2019

I've been playing a bit with this last week and I think I won't be able to fully solve the issue short of what @gziolo suggested here.

The main issue is that we get an already translated block name/title when we build up the autocomplete cache. Relevant portion is in the loadOptions method - # - the data that gets passed contains all the data already translated and I'm not sure what's a good way to get the "source" data, without the translation.

If we look at the Quote block configuration for example - # - we can see the issue is how we define the data:

export const settings = {
	title: __( 'Quote' ),
	description: __( 'Maybe someone else said it better -- add some quoted text.' ),
	icon: <SVG ...(shorted)... /SVG>,
	category: 'common',
	keywords: [ __( 'blockquote' ) ],

If we didn't wrap the title in a translation call directly, we could use it to build up a better set of keywords. Unfortunately, if we only put the untranslated version there, we have no good way of providing the translation markup that Babel uses to detect the strings.

I'm thinking of several possible solutions, none of which is ideal, but people with more experience can provide more feedback and thoughts here.

  1. Provide the untranslated block title in the set of keywords. This seems the best option if we want to keep the current settings syntax. It's now possible since #11949 (Error: The block "xxx" can have a maximum of 3 keywords.) got merged and lifted the 3-keyword limit.
  2. Dynamically add the block id to the list of keywords after some cleanup as suggested by @gziolo here
  3. Add a title_native or title_untranslated property to the block settings and add that to the list of keywords
  4. This is just a theory, as I'm not that well versed in the JS build systems - During build time, capture all the translatable strings from title and keywords, make the list unique and then add this list to the list of keywords.
    • For example in the Quote block above - during build, the process is going to grab 'Quote' out of title and put it into the keywords list, so in the end we'll have a list like keywords: [ __( 'blockquote' ), 'blockquote', __( 'Quote' ), 'Quote' ]

Options 1 and 3 will require manual work and maintenance on all current and future blocks to keep the keywords lists up to date.

Option 2 is a bit of a hack that solves the issue in the immediate future but as mentioned it's going to cause problems with generated names.

Option 4 seems (to me) the most future-proof if the syntax stays as it is now. Blocks that didn't follow the same build process will "revert" to the current behavior and not appear in the list during search, while blocks that had the "new" build process will appear properly translated.

I'm not sure what option is the best, as I mentioned above, so let's discuss that! :)
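To make the goal shared by options 1 and 4 more concrete, here is a minimal sketch of combining translated and untranslated strings into one de-duplicated list. Everything in it is an assumption for illustration: the translations map and its German entries are made-up sample data, translate stands in for a real translation lookup, and buildSearchTerms is a hypothetical name rather than anything in Gutenberg.

```javascript
// Made-up sample data standing in for one locale's translations.
const translations = new Map( [
	[ 'Quote', 'Zitat' ],
	[ 'blockquote', 'Blockzitat' ],
] );

// Stand-in for a translation lookup: falls back to the source string.
const translate = ( text ) =>
	translations.has( text ) ? translations.get( text ) : text;

// Hypothetical helper: pair every source string with its translation,
// then remove duplicates while keeping first-seen order.
function buildSearchTerms( { title, keywords = [] } ) {
	const sources = [ title, ...keywords ];
	const terms = sources.flatMap( ( source ) => [ translate( source ), source ] );
	return [ ...new Set( terms ) ];
}

console.log( buildSearchTerms( { title: 'Quote', keywords: [ 'blockquote' ] } ) );
// [ 'Zitat', 'Quote', 'Blockzitat', 'blockquote' ]
```

When a string has no translation, the translated and untranslated values are identical and the Set collapses them into one entry.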

@swissspidy
Member

Can we perhaps leave it to translators to add the original untranslated string if they want to? Because always including the untranslated keywords doesn't make sense in every case because it adds unnecessary noise and leads to unexpected results for non-English speaking people.

This is easily possible if the keywords are just a comma separated string.

Take the Quote block as an example:

Instead of keywords: [ __( 'blockquote' ) ], we could use comma-separated strings, e.g. keywords: __( 'blockquote,quote' ),.

For the German translation, polyglots can then translate this to quote,zitat.

@bisko

bisko commented Feb 26, 2019

I want to add a bit of a background on why I got involved in this discussion out of nowhere, to hopefully clarify what the issue I'm trying to solve is.

I'm constantly switching between English and Bulgarian layouts throughout the day and also using WordPress(.com but that doesn't matter in this case) translated to Bulgarian (so I can get a sense of problematic translations and update them) to write on our internal P2s using Gutenberg.

With that constant switching, I often start typing in the wrong language when I go back to an app.

Slack and PHPStorm handle this very well as I can go in and start typing the "English" version of what I want to type and it gives me what I wanted:

screenshot 2019-02-26 at 15 11 28
screenshot 2019-02-26 at 15 09 21

(note: I understand that's more of transliteration between Cyrillic and Latin scripts, but it has to do with the usability and user expectations)

Gutenberg on the other hand doesn't give me anything that's not an exact string match:

screenshot 2019-02-26 at 15 13 37

In this case I'm trying to add a Title block, which in Bulgarian is translated to Заглавие. Since I'm writing in English for my colleagues, I have to switch to Bulgarian, type Загл, insert the block, switch back to English and continue typing.

An ideal case (for me) would be that I would be able to insert this title block with all the following options: Title, Титле, Заглавие, Zaglavie or a substring of that word.
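A minimal sketch of the matching behaviour described here, assuming the search list already holds both the translated and the untranslated title. The function matchesBlock is hypothetical and is not the inserter's actual search code; it does a case-insensitive substring match and deliberately leaves transliteration (Zaglavie, Титле) out of scope.

```javascript
// Hypothetical matcher for illustration; not the inserter's real search code.
function matchesBlock( query, terms ) {
	const needle = query.trim().toLocaleLowerCase();
	if ( needle === '' ) {
		return false;
	}
	// A block matches when any of its terms contains the typed text.
	return terms.some( ( term ) => term.toLocaleLowerCase().includes( needle ) );
}

// Translated title plus the untranslated original.
const titleTerms = [ 'Заглавие', 'Title' ];

console.log( matchesBlock( 'Загл', titleTerms ) ); // true
console.log( matchesBlock( 'tit', titleTerms ) ); // true
console.log( matchesBlock( 'Zagl', titleTerms ) ); // false, no transliteration here
```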

Can we perhaps leave it to translators to add the original untranslated string if they want to?

With the above said, I don't fully agree with making this optional, as it will create an annoying disparity between blocks that support that and blocks that do not, especially in multi-lingual setups.

Instead of keywords: [ __( 'blockquote' ) ], we could use comma-separated strings, e.g. keywords: __( 'blockquote,quote' ),.

For the German translation, polyglots can then translate this to quote,zitat.

I'm a bit worried about the manual approach here for translators. What happens if the plugin author updates the keywords? Wouldn't they need to be re-translated? It seems a bit more stable to have both translated and untranslated versions in the keywords list so if a new keyword is added only that one will be translated (if not already translated).

I'm sorry if I'm derailing the discussion off the main topic of the issue. I can open another issue to discuss the above if needed.

@swissspidy
Member

It seems a bit more stable to have both translated and untranslated versions in the keywords list so if a new keyword is added only that one will be translated (if not already translated).

True, it's more stable as in reliable, but as I mentioned sometimes also not necessarily wanted. Hence my suggestion to leave it to the locale managers.

What happens if the plugin author updates the keywords? Wouldn't they need to be re-translated?

Yes, but that is also the case when it's an array and the plugin author changes the keywords… So I don't see your point here.

Please see #6633 (comment) for my original reason for suggesting comma-separated strings.

If it's an array of 3 keywords, often times in German we have 4 or 5 keywords to describe the block. There's currently no way to support that.

That's why a comma-separated string is preferable.

I don't feel strongly about whether that ends up in comma-separated originals plus comma-separated translations being used for the autocomplete. That would solve both problems, no?

@bisko

bisko commented Feb 26, 2019

Yes, but that is also the case when it's an array and the plugin author changes the keywords… So I don't see your point here.

I think I gave a bad example here.

A single string would be as you mentioned above:

keywords: __( 'blockquote,quote' ) which the translator has to translate to blockquote,quote,zitat for German.

Then if the author wants to make the plugin accessible via the lyrics keyword (for song lyrics for example) and add lyrics to the mix, the string will become 'blockquote,quote,lyrics', which will require re-translation of the whole string in all languages as it has changed and no longer matches.

If it's kept as an array - keywords: [ __( 'blockquote' ), 'blockquote', __( 'quote' ), 'quote' ], the addition of lyrics will become just 2 more entries at the end of the array: [ __( 'blockquote' ), 'blockquote', __( 'quote' ), 'quote', __( 'lyrics' ), 'lyrics' ], meaning that the translators will have to translate only the word lyrics (which can also be already translated).

The autocomplete search already uses an array loop to find a match ( # ), so this change would be only required for the block configurations, not the search code.

If it's an array of 3 keywords, often times in German we have 4 or 5 keywords to describe the block. There's currently no way to support that.

That's a whole other set of problems for i18n :( Is there a supported way to add aliases for translations in the engine that's used by WordPress? That would be a great candidate.

Another way I'm thinking of right now would be to add a "dynamic" entry to the translation file, since it's already machine-generated (gutenberg.pot) that's something along the lines of <block-id>-aliases and translators can add all the aliases to that block for the language they're translating for?

This way we don't add too much syntax to the configuration and have the option of multiple aliases for the same block.

@mcsf
Contributor

mcsf commented Feb 26, 2019

I had a whole tirade with wild ideas about managing breaking changes to strings, etc., :) but the reality is that this is why there is a cycle to WP and Gutenberg development. In WordPress core, there is an actual schedule that encompasses string freezes. There is no such thing in the Gutenberg plugin, though; the closest equivalent would be the narrow window between a plugin release candidate and its release (typically occurring 48 hours later). So there's room for improvement if we want to provide some localisation stability for users of the plugin who have chosen something other than English.

Going back to the issue at hand, I think that, overall, comma-separated strings would solve most issues. As Pascal points out, it lets locale managers deal with each locale's idiosyncrasies. There are even more technical precedents, such as delegating decisions on font families, so it seems more than fair that they should now decide not only what synonyms to provide for each keyword, but also whether to include fallback strings.

@mcsf
Contributor

mcsf commented Feb 26, 2019

Then if the author wants to make the plugin accessible via the lyrics keyword (for song lyrics for example) and add lyrics to the mix, the string will become 'blockquote,quote,lyrics', which will require re-translation of the whole string in all languages as it has changed and no longer matches.

This is arguably a more tangible concern for third-party blocks than core ones, that's true. Honestly, though, given the flexibility of the comma-separated method, I think it would be fine if a block author particularly concerned with breaking translations and discoverability were to do the following:

  1. Starts with keywords: [ __('blockquote,quote') ], which has been translated to target locales.
  2. Decides to add lyrics: keywords: [ __('blockquote,quote,lyrics') ].
  3. Wants to allow time for translators to catch up, so temporarily declares keywords: [ __('blockquote,quote,lyrics'), __('blockquote,quote') ].

It's up to the block author to manage these deprecations and clean up keywords afterwards, but it would solve the problem. Naturally, keywords would be compiled by flattening the arrays and uniq'ing.
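The flatten-and-uniq step could look roughly like the sketch below. It is an illustration under assumptions: the translated map is made-up data modelling a German locale where only the older string has been translated, __ is a local stub rather than the real function from @wordpress/i18n, and compileKeywords is a hypothetical name.

```javascript
// Made-up data: only the older string has a German translation so far.
const translated = new Map( [
	[ 'blockquote,quote', 'blockquote,quote,zitat' ],
] );

// Local stub for illustration; untranslated strings come back unchanged.
const __ = ( text ) => ( translated.has( text ) ? translated.get( text ) : text );

// Hypothetical helper: split every entry on commas, flatten, de-duplicate.
function compileKeywords( keywordStrings ) {
	const tokens = keywordStrings
		.flatMap( ( entry ) => entry.split( ',' ) )
		.map( ( token ) => token.trim() )
		.filter( Boolean );
	return [ ...new Set( tokens ) ];
}

// Step 3 above: the new string plus the already-translated old one.
const keywords = compileKeywords( [
	__( 'blockquote,quote,lyrics' ),
	__( 'blockquote,quote' ),
] );

console.log( keywords );
// [ 'blockquote', 'quote', 'lyrics', 'zitat' ]
```

In this sample the new keyword stays searchable in English while the existing German synonym is preserved until translators catch up.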

@swissspidy
Member

That's a whole other set of problems for i18n :(

It's exactly what this issue here tries to address though. Quote:

In fact each language has specific needs (some won't need any keywords, others will need 2 or 3 according to the context).

@youknowriad youknowriad modified the milestones: WordPress 5.x, Future Mar 25, 2019
miminari added a commit to gutenbergfreaks/gutenberg that referenced this issue Jun 5, 2020
@miminari
Member

miminari commented Jun 5, 2020

I read this issue, and I think the proposed approach would be better than what we have now.

Japanese users can't actually use these shortcuts.

So I tested adding some keywords to the button block, as in 3c7ec60. I could find the button block with the keywords 'link' and 'button' in Japanese, not only 'ボタン' or 'リンク'. I also tested the Heading block by adding the keywords __( 'title,subtitle,heading' ), and I could find the heading block with 'heading'.

It's a big change for me (and maybe for other non-English-language users), so if this approach looks good to you, I would like to add keywords to the other core blocks ASAP.

@paaljoachim
Contributor

What is still needed for this issue?

@mcsf
Contributor

mcsf commented Jul 13, 2021

@gziolo: now that we have block.json with i18n support underway (wp-cli, register_block_type), do the circumstances of this PR change?

@swissspidy
Member

It's still an issue IMO, because block.json does not address the concern that other languages might need more (or fewer) keywords for a block.

@gziolo
Member

gziolo commented Jul 13, 2021

It's now automated so we always wrap the keyword with _x( keyword, 'block keyword', 'textdomain' ). Should we also inject some other keywords while registering a block?
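For illustration only, the automated wrapping might be pictured like the sketch below. The _x here is a local stub that mimics the ( text, context, domain ) call shape, the catalog entry is made-up sample data, and translateKeywords is a hypothetical name; none of this is the actual registration code.

```javascript
// Made-up catalog keyed by "context|source string".
const catalog = new Map( [
	[ 'block keyword|blockquote', 'zitat' ],
] );

// Local stub mimicking the call shape of _x; the domain is ignored here.
function _x( text, context, domain ) {
	const key = `${ context }|${ text }`;
	return catalog.has( key ) ? catalog.get( key ) : text;
}

// Hypothetical registration step: one lookup per declared keyword.
function translateKeywords( keywords, textdomain ) {
	return keywords.map( ( keyword ) => _x( keyword, 'block keyword', textdomain ) );
}

console.log( translateKeywords( [ 'blockquote', 'cite' ], 'default' ) );
// [ 'zitat', 'cite' ]
```

Because the mapping is one-to-one, a locale in this sketch always ends up with exactly as many keywords as the source declares.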

@swissspidy
Member

It's not necessarily about adding more keywords. In some languages fewer keywords would be sufficient.

I think the agreed-upon approach in the discussion on this ticket so far was to have a comma-separated list of keywords. That would solve most issues.

So in block.json:

{
  "keywords": "foo,bar,baz"
}

@mcsf
Contributor

mcsf commented Jul 13, 2021

I think the agreed-upon approach in the discussion on this ticket so far was to have a comma-separated list of keywords. That would solve most issues.

So in block.json:

{
  "keywords": "foo,bar,baz"
}

Commas seem safe enough, but: is there any chance that there's some locale out there that uses commas (&comma;, not some Unicode variant) as something other than a word-or-clause separator? The only potential obstacle I found on Wikipedia was:

[…] The comma therefore functions as a silent letter in a handful of Greek words, principally distinguishing ό,τι (ó,ti, "whatever") from ότι (óti, "that").

@mcsf
Contributor

mcsf commented Jul 13, 2021

Presumably, there are ways around it:

  • We make it so that a locale can override the default token separator by providing its own string instead of ,
  • We define the token as , (comma + space) or anything with white space on either side of the comma.
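Both workarounds can be sketched in a few lines. The helper below is hypothetical (splitKeywords is not an existing API): by default it splits only on a comma followed by white space, and it takes an optional separator that a locale could override.

```javascript
// Hypothetical helper: the default separator is a comma followed by white space,
// so a comma inside a word (as in the Greek example) is left alone.
function splitKeywords( value, separator = /,\s+/ ) {
	return value
		.split( separator )
		.map( ( token ) => token.trim() )
		.filter( Boolean );
}

console.log( splitKeywords( 'ό,τι, ότι' ) );
// [ 'ό,τι', 'ότι' ]

// A locale-provided separator replaces the default.
console.log( splitKeywords( 'foo;bar;baz', ';' ) );
// [ 'foo', 'bar', 'baz' ]
```

One trade-off of the comma-plus-space rule: a string written without spaces, such as "foo,bar,baz", would stay a single keyword under this default.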

@swissspidy
Member

cc @ocean90
