Add keywords to core blocks for the sake of the translation #6633

maximebj · 2018-05-08T08:44:59Z

Hi, for now core blocks don't seem to use keywords.

Many languages are not as straightforward as English. For example, in French there are several possible translations for the Subhead block:

It can be:

sous-titre (but it's more like subtitle than subhead)
chapô (used by journalists)
extrait (excerpt)

So, as none of these translations is perfect, keywords could help people find the block without knowing its official name.

To go further with this issue:

In fact each language has specific needs (some won't need any keywords, others will need 2 or 3 according to the context).
And from one language to another the keywords are not necessarily the same.

I really don't see how it can be done technically (maybe just set some keywords in English, and other languages would adapt them instead of translating literally).

cc @audrasjb from the french translation team with whom I have discussed this issue

mcsf · 2018-08-21T16:49:54Z

Thanks for bringing this up, @maximebj. I would agree with the issue, but it also seems to me, in this particular case, that the real issue is the ambiguous semantics of the Subhead block, especially when looking at extrait but also to some extent at chapô. Perhaps the problem lies right here:

gutenberg/packages/block-library/src/subhead/index.js

Line 18 in 1d92884

    
           description: __( 'What’s a subhead? Smaller than a headline, bigger than basic text.' ),

Subhead is loosely defined (and its definition based on size), which incidentally may also encourage using it for cosmetic purposes more than functional. What do you think? Could we start acting there instead?

I'm not saying that we shouldn't work on a way to interpolate alternate translations into a block type's keywords, but I'd rather only do that if and when we have compelling examples.

chrisvanpatten · 2018-08-21T16:52:17Z

@mcsf I think I have a compelling example! See #8365.

mcsf · 2018-08-21T17:15:57Z

@chrisvanpatten, thanks for the cross-ref! In that issue:

So I expect here the issue is actually allowing translators to provide additional, language-specific options for a block’s keywords. This would be useful as in the case in the issue report […] or [for] any languages that might have multiple words to refer to a concept that only has one word in English.

Thinking about this a bit, and intending to keep things simple, I think string context (with _n) would be enough. Thusly:

 title: __( 'Separator' ),
- keywords: [ __( 'horizontal-line' ), 'hr', __( 'divider' ) ],
+ keywords: [ __( 'horizontal-line' ), 'hr', __( 'divider' ), _n( '', 'synonyms for Separator block' ) ],

Observations from this approach:

Keywords are limited to three per type, so this example raises a warning.
By default, _n( '', 'synonyms for foo' ) will return '' (we could also choose null or undefined), which is automatically ignorable — and could even be explicitly cleaned up with lodash#compact.
If more than one synonym is provided for a locale, no special token needs to be used to separate synonyms: a simple space will do. This is because matching already takes word boundaries into account. So a translator could provide a string like so:

String	Context	Translation
(empty)	synonyms for Twitter block	sns, tweet, ツイッター

swissspidy · 2018-11-11T13:49:49Z

_n is for plural forms, not context. For context _x() has to be used.

If more than one synonym is provided for a locale, no special token needs to be used to separate synonyms: a simple space will do. This is because matching already takes word boundaries into account. So a translator could provide a string like so:

A translation could consist of multiple words though, so a space is not enough. A comma is safer, as you have used in your example as well.

swissspidy · 2018-11-13T10:37:48Z

I just want to point out that right now it's difficult for translators to translate dozens of keywords for all the blocks. It would be way easier if for each block there was just a single translatable string like _x( 'horizontal line, divider, separator', 'block keywords' ) instead of an array of words with no context or anything.

This way it's also possible for some locales to keep the English strings if necessary, which makes using the block inserter autocomplete much easier. This is similar to the list of words in wptexturize or the list of stopwords in WP_Query.

mcsf · 2018-11-13T11:28:56Z

_n is for plural forms, not context. For context _x() has to be used.

A translation could consist of multiple words though, so a space is not enough. A comma is safer, as you have used in your example as well.

Must’ve been half asleep when I wrote this. :/ Thanks for making sense of it.

easier if for each block there was just a single translatable string

This sounds apt to me as well. It could mean that the block API also accepts keywords as a string that it then splits on comma. This would, as a bonus, prevent translations from bypassing Gutenberg’s imposed limit of three keywords per block type. /cc @mtias
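
As a rough illustration of that idea, here is a minimal sketch of a block API that accepts keywords either as the existing array or as one comma-separated string. The helper name `normalizeKeywords` is hypothetical and is not part of Gutenberg; the sample strings are taken from this thread.

```javascript
// Hypothetical helper; not part of the Gutenberg block API.
// Accepts either the existing array form or a single comma-separated string.
function normalizeKeywords( keywords ) {
	const list = Array.isArray( keywords ) ? keywords : [ keywords ];
	return list
		// Ignore anything that is not a string (e.g. undefined keywords).
		.filter( ( entry ) => typeof entry === 'string' )
		// Split every entry on commas, then tidy up surrounding white space.
		.flatMap( ( entry ) => entry.split( ',' ) )
		.map( ( keyword ) => keyword.trim() )
		// Drop empty tokens such as those left by a trailing comma.
		.filter( ( keyword ) => keyword.length > 0 );
}

// A translated string may hold more or fewer keywords than the original.
console.log( normalizeKeywords( 'horizontal line, divider, separator' ) );
console.log( normalizeKeywords( [ 'horizontal-line', 'hr', 'divider' ] ) );
```

Splitting on the comma rather than on white space keeps multi-word keywords such as "horizontal line" intact.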

swissspidy · 2018-11-13T11:30:50Z

Not sure if the 3-keywords-limit is still appropriate for translations though.

mcsf · 2018-11-13T11:38:33Z

Not sure if the 3-keywords-limit is still appropriate for translations though.

As a translator, I agree; as someone who knows how these things can easily be exploited, I don't so much. :)

gziolo · 2019-02-13T08:42:33Z

#13848 has landed which removes the limit of 3 keywords altogether. Does it solve this issue?

maximebj · 2019-02-13T12:09:41Z

I'm not sure it could specifically fix this issue, but it's a start.

If a block only has 2 keywords, translators in every language are limited to 2 translations, which doesn't fit all cases.

In my opinion, keywords should be a simple string as @swissspidy suggested before

keywords: __( 'word, another, thing, stuff' ),

That way, translators could use only as many keywords as each language needs.
In JS, the keywords would still be searchable.

I think we still need to discuss this issue before closing it

mcsf · 2019-02-13T13:13:56Z

In my opinion, keywords should be a simple string as swissspidy suggested before

I see no compelling argument against adopting this.

maximebj · 2019-02-13T13:22:23Z

This would just require some backward compatibility to existing blocks using arrays, but not a big deal.

swissspidy · 2019-02-13T14:27:36Z

Let's explore this in a PR then 🙂

mcsf · 2019-02-15T11:44:35Z

To be clear: are you owning that, @swissspidy? Or would you like someone to help?

swissspidy · 2019-02-15T14:54:01Z

It would be great if someone else could help with this as I don't have that much time to devote to this at the moment.

mcsf · 2019-02-15T16:45:40Z

Alright then. Any takers, @maximebj, @bisko?

bisko · 2019-02-15T17:46:24Z

I can take on that sometime next week! It seems like a good problem to use to dive into the Gutenberg world!

bisko · 2019-02-26T12:25:59Z

I've been playing a bit with this last week and I think I won't be able to fully solve the issue short of what @gziolo suggested here.

The main issue is that we get an already translated block name/title when we build up the autocomplete cache. Relevant portion is in the loadOptions method - # - the data that gets passed contains all the data already translated and I'm not sure what's a good way to get the "source" data, without the translation.

If we look at the Quote block configuration for example - # - we can see the issue is how we define the data:

export const settings = {
	title: __( 'Quote' ),
	description: __( 'Maybe someone else said it better -- add some quoted text.' ),
	icon: <SVG ...(shorted)... /SVG>,
	category: 'common',
	keywords: [ __( 'blockquote' ) ],

If we didn't specify the title with translation directly, then we can use it to build up better set of keywords. Unfortunately if we only put the non-translated version there we have no good way of providing the translation markup that Babel uses to detect the strings.

I'm thinking of several possible solutions, none of which is ideal, but people with more experience can provide more feedback and thoughts here.

1. Provide the untranslated block title in the set of keywords. This seems the best option if we want to keep the current settings syntax. It's now possible since "Error: The block "xxx" can have a maximum of 3 keywords." (#11949) got merged and lifted the 3 keywords limit.
2. Dynamically add the block id to the list of keywords after some cleanup, as suggested by @gziolo here.
3. Add a title_native or title_untranslated property to the block settings and add that to the list of keywords.
4. This is just a theory, as I'm not that well versed in the JS build systems: during build time, capture all the translatable strings from title and keywords, make the list unique, and then add this list to the list of keywords.
   - For example, in the Quote block above, the build process would grab 'Quote' out of title and put it into the keywords list, so in the end we'd have a list like keywords: [ __( 'blockquote' ), 'blockquote', __( 'Quote' ), 'Quote' ]

Options 1 and 3 will require manual work and maintenance on all current and future blocks to keep the keywords lists up to date.

Option 2 is a bit of a hack that solves the issue in the immediate future but as mentioned it's going to cause problems with generated names.

Option 4 seems (to me) the most future-proof if the syntax stays as it is now. Blocks that don't follow the same build process will "revert" to the current behavior and not appear in the list during search, while blocks that use the "new" build process will appear properly translated.
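
To make the end result of option 4 more concrete, here is a small runtime sketch of the combined list, leaving aside how a build step would actually capture the strings. The `translate` function is a hypothetical stand-in for `__()`, the one-entry translation table is purely illustrative, and `withSourceStrings` is an invented name.

```javascript
// Hypothetical stand-in for a translation lookup such as __().
// The table is illustrative only, not taken from a real translation file.
const translations = { Quote: 'Zitat' };
function translate( text ) {
	// Untranslated strings fall back to the source text.
	return translations[ text ] || text;
}

// Hypothetical helper: pair every source string with its translation,
// then drop duplicates while keeping the original order.
function withSourceStrings( title, keywords ) {
	const sources = [ ...keywords, title ];
	const combined = sources.flatMap( ( source ) => [ translate( source ), source ] );
	return [ ...new Set( combined ) ];
}

console.log( withSourceStrings( 'Quote', [ 'blockquote' ] ) );
```

Because 'blockquote' has no translation in this table, the translated and source forms are identical and the de-duplication step keeps only one copy.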

I'm not sure what option is the best, as I mentioned above, so let's discuss that! :)

swissspidy · 2019-02-26T13:02:42Z

Can we perhaps leave it to translators to add the original untranslated string if they want to? Because always including the untranslated keywords doesn't make sense in every case because it adds unnecessary noise and leads to unexpected results for non-English speaking people.

This is easily possible if the keywords are just a comma separated string.

Take the Quote block as an example:

Instead of keywords: [ __( 'blockquote' ) ], we could use comma-separated strings, e.g. keywords: __( 'blockquote,quote' ),.

For the German translation, polyglots can then translate this to quote,zitat.

bisko · 2019-02-26T13:30:16Z

I want to add a bit of a background on why I got involved in this discussion out of nowhere, to hopefully clarify what the issue I'm trying to solve is.

I'm constantly switching between English and Bulgarian layouts throughout the day and also using WordPress(.com but that doesn't matter in this case) translated to Bulgarian (so I can get a sense of problematic translations and update them) to write on our internal P2s using Gutenberg.

With that constant switching, I often start typing in the wrong language when I go back to an app.

Slack and PHPStorm handle this very well as I can go in and start typing the "English" version of what I want to type and it gives me what I wanted:

(note: I understand that's more of transliteration between Cyrillic and Latin scripts, but it has to do with the usability and user expectations)

Gutenberg on the other hand doesn't give me anything that's not an exact string match:

In this case I'm trying to add a Title block, which in Bulgarian is translated to Заглавие. Since I'm writing in English for my colleagues, I have to switch to Bulgarian, type Загл, insert the block, switch back to English and continue typing.

An ideal case (for me) would be that I would be able to insert this title block with all the following options: Title, Титле, Заглавие, Zaglavie or a substring of that word.

Can we perhaps leave it to translators to add the original untranslated string if they want to?

With the above said, I don't fully agree with making this optional, as it will create an annoying disparity between blocks that support that and blocks that do not, especially in multi-lingual setups.

Instead of keywords: [ __( 'blockquote' ) ], we could use comma-separated strings, e.g. keywords: __( 'blockquote,quote' ),.

For the German translation, polyglots can then translate this to quote,zitat.

I'm a bit worried about the manual approach here for translators. What happens if the plugin author updates the keywords? Wouldn't they need to be re-translated? It seems a bit more stable to have both translated and untranslated versions in the keywords list so if a new keyword is added only that one will be translated (if not already translated).

I'm sorry if I'm derailing the discussion off the main topic of the issue. I can open another issue to discuss the above if needed.

swissspidy · 2019-02-26T13:50:40Z

It seems a bit more stable to have both translated and untranslated versions in the keywords list so if a new keyword is added only that one will be translated (if not already translated).

True, it's more stable as in reliable, but as I mentioned sometimes also not necessarily wanted. Hence my suggestion to leave it to the locale managers.

What happens if the plugin author updates the keywords? Wouldn't they need to be re-translated?

Yes, but that is also the case when it's an array and the plugin author changes the keywords… So I don't see your point here.

Please see #6633 (comment) for my original reason for suggesting comma-separated strings.

If it's an array of 3 keywords, often times in German we have 4 or 5 keywords to describe the block. There's currently no way to support that.

That's why a comma-separated string is preferable.

I don't feel strongly about whether that ends up in comma-separated originals plus comma-separated translations being used for the autocomplete. That would solve both problems, no?

bisko · 2019-02-26T14:12:15Z

Yes, but that is also the case when it's an array and the plugin author changes the keywords… So I don't see your point here.

I think I gave a bad example here.

A single string would be as you mentioned above:

keywords: __( 'blockquote,quote' ) which the translator has to translate to blockquote,quote,zitat for German.

Then if the author wants to make the plugin accessible via the lyrics keyword (for song lyrics for example) and add lyrics to the mix, the string will become 'blockquote,quote,lyrics', which will require re-translation of the whole string in all languages as it has changed and no longer matches.

If it's kept as an array - keywords: [ __( 'blockquote' ), 'blockquote', __( 'quote' ), 'quote' ], the addition of lyrics will become just 2 more entries at the end of the array: [ __( 'blockquote' ), 'blockquote', __( 'quote' ), 'quote', __( 'lyrics' ), 'lyrics' ], meaning that the translators will have to translate only the word lyrics (which can also be already translated).

The autocomplete search already uses an array loop to find a match ( # ), so this change would be only required for the block configurations, not the search code.

If it's an array of 3 keywords, often times in German we have 4 or 5 keywords to describe the block. There's currently no way to support that.

That's a whole other set of problems for i18n :( Is there a supported way to add aliases for translations in the engine that's used by WordPress? That would be a great candidate.

Another way I'm thinking of right now would be to add a "dynamic" entry to the translation file, since it's already machine-generated (gutenberg.pot) that's something along the lines of <block-id>-aliases and translators can add all the aliases to that block for the language they're translating for?

This way we don't add too much syntax to the configuration and have the option of multiple aliases for the same block.

mcsf · 2019-02-26T14:13:32Z

I had a whole tirade with wild ideas about managing breaking changes to strings, etc., :) but the reality is that this is why there is a cycle to WP and Gutenberg development. In WordPress core, there is an actual schedule that encompasses string freezes. There is no such thing in the Gutenberg plugin, though; the closest equivalent would be the narrow window between a plugin release candidate and its release (typically occurring 48 hours later). So there's room for improvement if we want to provide some localisation stability for users of the plugin who have chosen something other than English.

Going back to the issue at hand, I think that, overall, comma-separated strings would solve most issues. As Pascal points out, it lets locale managers deal with each locale's idiosyncrasies. There are even more technical precedents, such as delegating decisions on font families, so it seems more than fair that they should now decide not only what synonyms to provide for each keyword, but also whether to include fallback strings.

mcsf · 2019-02-26T14:19:07Z

Then if the author wants to make the plugin accessible via the lyrics keyword (for song lyrics for example) and add lyrics to the mix, the string will become 'blockquote,quote,lyrics', which will require re-translation of the whole string in all languages as it has changed and no longer matches.

This is arguably a more tangible concern for third-party blocks than core ones, that's true. Honestly, though, given the flexibility of the comma-separated method, I think it would be fine if a block author particularly concerned with breaking translations and discoverability were to do the following:

Starts with keywords: [ __('blockquote,quote') ], which has been translated to target locales.
Decides to add lyrics: keywords: [ __('blockquote,quote,lyrics') ].
Wants to allow time for translators to catch up, so temporarily declares keywords: [ __('blockquote,quote,lyrics'), __('blockquote,quote') ].

It's up to the block author to manage these deprecations and clean up keywords afterwards, but it would solve the problem. Naturally, keywords would be compiled by flattening the arrays and uniq'ing.
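
The "flattening and uniq'ing" step could look roughly like the sketch below. The helper name `compileKeywords` is hypothetical, and the input assumes a German locale that has so far translated only the older string (to "quote,zitat", as suggested earlier in this thread), so the newer string comes back untranslated.

```javascript
// Hypothetical helper illustrating "flattening the arrays and uniq'ing".
// Each entry is one already-translated, comma-separated keyword string.
function compileKeywords( translatedEntries ) {
	const flattened = translatedEntries.flatMap( ( entry ) =>
		entry.split( ',' ).map( ( keyword ) => keyword.trim() )
	);
	// filter( Boolean ) drops empty tokens; the Set removes duplicates
	// while preserving first-seen order.
	return [ ...new Set( flattened.filter( Boolean ) ) ];
}

// First entry: the new string, not yet translated for this locale.
// Second entry: the old string, already translated.
const german = compileKeywords( [ 'blockquote,quote,lyrics', 'quote,zitat' ] );
console.log( german );
```

During the transition the locale still gets its translated keyword from the older string, and the new keyword is available in its untranslated form.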

swissspidy · 2019-02-26T14:27:06Z

That's a whole other set of problems for i18n :(

It's exactly what this issue here tries to address though. Quote:

In fact each language has specific needs (some won't need any keywords, others will need 2 or 3 according to the context).

miminari · 2020-06-05T14:58:02Z

I read this issue, and I think the proposal would be better than what we have now.
Japanese users can't actually use these shortcuts.

So, I tested adding some keywords to the button block, as in 3c7ec60. I could find the button block with the keywords 'link' and 'button' in Japanese, not only 'ボタン' or 'リンク'. I also tested the Heading block by adding the keywords __( 'title,subtitle,heading' ), and I could find the heading block with 'heading'.

It's a big change for me (and maybe for other non-English users), so if this approach looks good to you, I would like to add keywords to the other core blocks ASAP.

paaljoachim · 2021-02-16T01:10:09Z

What is still needed for this issue?

mcsf · 2021-07-13T15:44:11Z

@gziolo: now that we have block.json with i18n support underway (wp-cli, register_block_type), do the circumstances of this PR change?

swissspidy · 2021-07-13T15:48:07Z

It's still an issue IMO, because block.json does not address the concern that other languages might need more (or less) keywords for a block.

gziolo · 2021-07-13T15:50:07Z

It's now automated so we always wrap the keyword with _x( keyword, 'block keyword', 'textdomain' ). Should we also inject some other keywords while registering a block?

swissspidy · 2021-07-13T16:04:11Z

It's not necessarily about adding more keywords. In some languages fewer keywords would be sufficient.

I think the agreed-upon approach in the discussion on this ticket so far was to have a comma-separated list of keywords. That would solve most issues.

So in block.json:

{
  "keywords": "foo,bar,baz"
}

mcsf · 2021-07-13T16:14:44Z

I think the agreed-upon approach in the discussion on this ticket so far was to have a comma-separated list of keywords. That would solve most issues.

So in block.json:
{
  "keywords": "foo,bar,baz"
}

Commas seem safe enough, but: is there any chance that there's some locale out there that uses commas (,, not some Unicode variant) as something other than a word-or-clause separator? The only potential obstacle I found on Wikipedia was:

[…] The comma therefore functions as a silent letter in a handful of Greek words, principally distinguishing ό,τι (ó,ti, "whatever") from ότι (óti, "that").

mcsf · 2021-07-13T16:25:13Z

Presumably, there are ways around it:

We make it so that a locale can override the default token separator by providing its own string instead of ,
We define the token as , (comma + space) or anything with white space on either side of the comma.
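
The second workaround can be sketched as follows, assuming a hypothetical `splitKeywords` helper that treats a comma as a separator only when white space sits on at least one side of it. The Greek words are the ones quoted from Wikipedia above.

```javascript
// Hypothetical splitter: a comma counts as a separator only when it has
// white space before or after it, so a bare comma inside a word survives.
function splitKeywords( value ) {
	return value
		.split( /\s+,\s*|,\s+/ )
		.map( ( keyword ) => keyword.trim() )
		.filter( ( keyword ) => keyword.length > 0 );
}

// The comma inside ό,τι is kept; the ", " after it separates two keywords.
console.log( splitKeywords( 'ό,τι, ότι' ) );
```

One consequence of this rule is that a string written without white space, such as "foo,bar,baz", would be read as a single keyword, so source strings would need to be written as "foo, bar, baz".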

swissspidy · 2021-07-13T16:30:36Z

cc @ocean90

Soean added the Internationalization (i18n) Issues or PRs related to internationalization efforts label May 8, 2018

danielbachhuber added [Type] Enhancement A suggestion for improvement. [Feature] Blocks Overall functionality of blocks labels May 10, 2018

danielbachhuber added this to the Bonus Features milestone May 10, 2018

mcsf mentioned this issue Aug 21, 2018

Slash inserter: enable searching blocks with more than one string match #8365

Closed

mtias added Future and removed Future labels Oct 7, 2018

mtias modified the milestones: Bonus Features, Future: 5.1 Onwards Oct 7, 2018

This was referenced Nov 21, 2018

Error: The block "xxx" can have a maximum of 3 keywords. #11949

Closed

Using "Keywords" to keep the native block name #11370

Closed

youknowriad modified the milestones: WordPress 5.x, Future Mar 25, 2019

youknowriad removed the Future label Mar 25, 2019

simison mentioned this issue Mar 20, 2020

Block picker; retain english terms when translating keywords #21044

Open

miminari added a commit to gutenbergfreaks/gutenberg that referenced this issue Jun 5, 2020

Add to keyword WordPress#6633

3c7ec60

cuemarie mentioned this issue Sep 7, 2022

Block Inserter not showing results for foreign character searches Automattic/wp-calypso#67352

Open
