Proposal for i18n support #355

Closed
yuhattor opened this issue Sep 15, 2022 · 10 comments · Fixed by #412
Labels: documentation, enhancement

Comments

@yuhattor
Member

yuhattor commented Sep 15, 2022

Proposal

The Japanese community is gradually growing, and the innersourcecommons.org page should be its landing page. Therefore, I would like to propose the internationalization of the site.

Scope

Content, menus, and footers.
However, details and reports of past events and the Learning Path will be excluded from this translation.

Where to start

We would like to translate the Japanese page first. If this goes well, the site can then be translated into other languages easily as well. We will try Japanese first as an experiment, and then onboard other languages if there are people willing to volunteer.

Some people in InnerSource Commons Japan will participate in the work, so I want to put the repository in an easy-to-understand place. If possible, it would be useful for me to have write permission for the repo. But I don't want to break the environment, since I don't yet know everything about the innersourcecommons.org page. (Well, I won't be committing to the main branch in the near future, so I think I'll be fine.)
But I know there is governance around managing innersourcecommons/innersourcecommons.org, so if there is a proper way to do it, please let me know.

I'm currently working on a repository at yuhattor/innersourcecommons.org.
We believe that a Japanese page is essential for the Japanese community. The amount of translation is not so large, so we hope to release it in November.

Experiment

I set up an experimental site here and am working on the jp-translation branch of my GitHub repository.

I've only translated the top page so far, so the other pages will return a 404.

What I learned from the experiment

As I worked on the translation, I noticed several things.

  • There is an i18n settings folder, but it is not used
  • config.yaml can be split into config/_default/config.yaml, config/_default/languages.yaml, etc. to make it easier to internationalize things like menu labels.
  • Since Hugo is used, the md files are already in a format that is easy to translate, and we can translate them in the same style as we have done with the Learning Path. However, files like layouts/shortcodes/about-text.html have some English embedded in them. It would be easier to manage these parts of the HTML by moving them out to i18n/en.yaml and so on.
  • Right now, we have to go to the Learning Path page to get the language selection screen in the upper right corner. For supported languages, I would like to display the language selection on all pages
  • *Optional: There are several relatively large png, jpg and pdf files in the repository. This is also causing the build time to be very long. It might be a good idea to compress image files into webp, etc. and manage pdfs in Object Storage, etc. Especially since pdfs are not updated often.

I would like to hear your opinions.
Especially about the structure of hugo and how to operate it.
I also wonder why the Learning Path lives in a different repository. Is there some background that prevents integration?

+ sorry, the bug label is not applicable for this. Please change it...

@yuhattor yuhattor added the bug Something isn't working label Sep 15, 2022
@lenucksi lenucksi added enhancement New feature or request help wanted Extra attention is needed and removed bug Something isn't working labels Sep 17, 2022
@lenucksi
Member

Maybe @voborgus knows more on the magic required here as he's been really instrumental with the transition to the new Hugo powered site.

@yuhattor
Member Author

yuhattor commented Sep 17, 2022

@lenucksi Thank you!
I have started contacting him:)

I've also been experimenting with the Hugo site and have an update.
As a matter of fact, the foundation for i18n support is almost complete. The experiment went rather well.

I've created a sample site with multilingual support. Please take a look at the page. This URL is a .jp domain that I registered for testing purposes when I decided to create an InnerSource Commons Japan. It is not intended to be used in production or indexed by search engines. I plan to delete it when I have finished checking it.
https://www.innersourcecommons.jp/ja/

And also, please see my jp-translation branch of the repository for code changes.

Here are the main points

  • i18n configuration files for each language are added under i18n/.
  • Split config.yaml and defined menus for each language.
  • Translated the Japanese main page as a sample. The full Japanese translation will follow soon.
  • All content was under content/ because multilingual support was not considered. I have divided the directory structure into content/en, content/ja, and so on.
  • All English content has been ported under en/.
  • learning-path content has also been placed in the appropriate location for each language.
  • Learning Path content is written in AsciiDoc, and I confirmed that it is automatically committed to the innersourcecommons.org repository via GitHub Actions. From now on, we need to make sure that it can support the new structure. We need a GitHub Actions pipeline for that.
  • Hugo Multilingual Hack
    • To make the Hugo site multilingual, we need to provide every file for each language, not just some of them.
    • I have set up a simple way to manage this using GitHub Actions.
    • I added a detailed procedure for local testing to README.md. For more information on how to manage Hugo, please refer to the README on jp-translation.
      https://github.com/yuhattor/innersourcecommons.org/tree/jp-translation
  • The Japanese translation is covered. The Japanese community will do it. Let's recruit people to translate de, es, fr, it, ru, zh!
  • We need a way to open an Issue for each language when the English content changes, so that the translated languages can be updated without omissions. This could be done via GitHub Actions.
  • Many sites display English content when a page is missing in a specific language. However, it is inconvenient that every time you visit a missing page, you leave your language and the locale of the page changes. So I want untranslated pages to be served properly even under /ja, /de, etc. On the other hand, I don't think it's a good idea for Google to index untranslated pages in a specific locale. I guess Google's crawlers are smart enough these days to eliminate duplicates, but if we are concerned about it, we may want to make sure that pages copied from en are listed in robots.txt so that they are not indexed by search engines.
  • How about putting a message "This page is untranslated. Please contribute!" on the untranslated pages in each language? Doing so may attract new contributors.
    fyi: @voborgus

@voborgus
Collaborator

voborgus commented Oct 1, 2022

I'm back @yuhattor !
Wow, you've done truly impressive work here. Thanks!
The main issue I see is the subsequent maintenance of the language versions. Since we're small, we don't have the resources to translate all dynamic content into all supported languages: events, announcements, etc. (and even static content: there are a lot of untranslated learning paths). And if an issue is created for every change in the English pages, we could easily be flooded with open issues :)

Text in {{ shortcodes }} can easily be internationalised with syntax like {{ T "about_text" }}

> How about putting a message

Awesome idea. A quick solution that comes to mind is to play around with {{ if .IsTranslated }} and to match the list of languages against the /[CURRENT_LANGUAGE]/ pattern in the .Permalink using in. But maybe more elegant solutions exist :)

> Many sites display English content when a page is missing in a specific language. However, it is inconvenient that every time you visit a missing page, you leave your language and the locale of the page changes.

I think it can also lead to content inconsistency, and this can be a real problem. If we want to keep the language setting even when visiting untranslated pages, another solution to consider is storing the language state in a cookie: if you're on an English page but your setting is Japanese and a translation is available, you are redirected to the localised version. This hack is still a little bit ugly, but maybe raises fewer concerns. What do you think?

> Let's recruit people to translate de, es, fr, it, ru, zh!

Happy to do that for Russian!

Thanks a lot for this contribution! Just a few steps and we'll have a multilingual site. Amazing! :)

@yuhattor
Member Author

@voborgus
Thank you:)

> The main issue I see is the subsequent maintenance of the language versions. Since we're small, we don't have the resources to translate all dynamic content into all supported languages: events, announcements, etc. (and even static content: there are a lot of untranslated learning paths). And if an issue is created for every change in the English pages, we could easily be flooded with open issues :)

I can't agree more. I think events and announcements should be the last things to be done.
I believe GitHub Actions can help solve this problem. For example, a workflow could be added to add a comment to an Issue each time content is edited in the original English version of the translated file.
This is worth experimenting with. It would also be great if this could be done, as it could be applied to existing InnerSource Patterns repositories that have internationalization support.
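
As a rough sketch of the first half of such a workflow, the helper below takes the list of changed English files and prints the translated counterparts that already exist and may now be stale. The function name `stale_translations` and its arguments are hypothetical, not something from the repository. It assumes translations mirror the English paths under content/<lang>/, and that the changed paths come from something like `git diff --name-only <base> <head> -- content/en`. Posting the comment itself is left out.

```sh
# Hypothetical helper, not part of the repository.
# Reads changed paths (one per line) on stdin and prints each existing
# translated counterpart as "<lang> <path>".
# Usage: stale_translations <content_dir> <lang>...
stale_translations() {
  content_dir=$1
  shift
  while IFS= read -r changed; do
    case $changed in
      # Only files below <content_dir>/en/ have translated counterparts.
      "$content_dir"/en/*) rel=${changed#"$content_dir/en/"} ;;
      *) continue ;;
    esac
    for lang in "$@"; do
      # A file that exists in the locale directory is assumed to be a translation.
      if [ -f "$content_dir/$lang/$rel" ]; then
        printf '%s %s\n' "$lang" "$content_dir/$lang/$rel"
      fi
    done
  done
}
```

For example, `git diff --name-only "$BASE" "$HEAD" -- content/en | stale_translations content ja ru` would list the Japanese and Russian files to mention in a comment, where `BASE` and `HEAD` are placeholders for the two commits being compared. This relies on untranslated copies existing only in the published artifact, so that a file present in the repository really is a translation.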

> > How about putting a message

> Awesome idea. A quick solution that comes to mind is to play around with {{ if .IsTranslated }} and to match the list of languages against the /[CURRENT_LANGUAGE]/ pattern in the .Permalink using in. But maybe more elegant solutions exist :)

yea:) let's do that:)

> > Many sites display English content when a page is missing in a specific language. However, it is inconvenient that every time you visit a missing page, you leave your language and the locale of the page changes.

> I think it can also lead to content inconsistency, and this can be a real problem. If we want to keep the language setting even when visiting untranslated pages, another solution to consider is storing the language state in a cookie: if you're on an English page but your setting is Japanese and a translation is available, you are redirected to the localised version. This hack is still a little bit ugly, but maybe raises fewer concerns. What do you think?

Actually, I've already solved the issue:)
This is probably the easiest way to hack it. In the GitHub Actions workflow, I added the step below. It copies the files under /content/en that are missing for each locale and adds them to the artifacts.
By doing this in the pipeline and supplying the missing parts only to the gh-pages branch, we can keep the original edited files clean. This also lets us distinguish between translated and untranslated files in the repository. Fresh news, for example, can be displayed in English.

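      - name: Copy the missing files from /content/en for publishing each language site
        run: |
          for i in de es fr it ja ru zh; do
            # Copy only the files missing from this locale; rsync writes its log to content/.gitignore.
            rsync -rv --ignore-existing content/en/ "content/$i/" --log-file=content/.gitignore
            # Replace rsync's log prefix (assumed to be 37 characters plus a space) with the locale directory.
            sed -i "s/^.\{37\} /$i\//g" content/.gitignore
            # Drop rsync's summary lines so that only file paths remain.
            sed -i '/total size\|file list/d' content/.gitignore
          done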

The sed part is also a hack. It is not actually needed in the pipeline and will be removed from it, but it is necessary locally. For more information and the procedure, please take a look at the README.md.

When you start the Hugo server locally, it will not work as expected if pages are missing. So we need to supplement the files for each locale with rsync locally as well. I add the copied files to content/.gitignore so that they are ignored automatically and can be cleaned up after the edits are done.

This is a bit tricky, but it sure works. These things don't affect people working in an English environment at all, but for people who want to check their translations locally, it is a bit confusing at first and should be made more elegant by wrapping it into commands.
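
One way the local procedure might be wrapped into two commands is sketched below. This is not the workflow's actual code. It swaps the rsync log parsing for an explicit find/cp loop so that each copied path is recorded directly. The function names and the manifest argument are hypothetical; in this thread the manifest would be content/.gitignore. Paths containing newlines are not handled.

```sh
# Hypothetical helpers, not part of the repository.

# Copy every file under <content_dir>/en that is missing in <content_dir>/<lang>
# and append each copied path (relative to <content_dir>) to the manifest.
# Usage: fill_missing_translations <content_dir> <manifest> <lang>...
fill_missing_translations() {
  content_dir=$1
  manifest=$2
  shift 2
  for lang in "$@"; do
    find "$content_dir/en" -type f | while IFS= read -r src; do
      rel=${src#"$content_dir/en/"}
      dest=$content_dir/$lang/$rel
      # Existing files are translations and must never be overwritten.
      if [ ! -e "$dest" ]; then
        mkdir -p "$(dirname "$dest")"
        cp "$src" "$dest"
        printf '%s\n' "$lang/$rel" >> "$manifest"
      fi
    done
  done
}

# Remove every file listed in the manifest, then the manifest itself.
# Directories that become empty are left in place.
# Usage: clean_filled_translations <content_dir> <manifest>
clean_filled_translations() {
  content_dir=$1
  manifest=$2
  [ -f "$manifest" ] || return 0
  while IFS= read -r rel; do
    rm -f "$content_dir/$rel"
  done < "$manifest"
  rm -f "$manifest"
}
```

Before starting the Hugo server one would run `fill_missing_translations content content/.gitignore ja ru`, and after checking the translation `clean_filled_translations content content/.gitignore`. Because the manifest lists exactly the copied files, it can double as the ignore list, as in the rsync version.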

> > Let's recruit people to translate de, es, fr, it, ru, zh!
> Happy to do that for Russian!

If you don't mind, can you give me write access?
Initially this will not touch the master branch. I'd like to start by creating a jp-translation branch in innersourcecommons/innersourcecommons.org and working there. That way, the Japanese translation members will feel that they are contributing to InnerSource Commons itself, not only to the "InnerSource Commons Japan" project.
This would be better than committing to one specific person's (my) forked repository...

I would also like to do a PoC on i18n support and managing updates to translated pages to bring consistency to the content:)

@voborgus
Collaborator

> If you don't mind, can you give me write access?

Sure, you should have access to push to all branches except master, since it is protected.

@yuhattor
Member Author

Thank you!!!

@yuhattor
Member Author

yuhattor commented Nov 1, 2022

Structure for i18n Support

If we want to support i18n, the directory structure of the site must be changed.
The structure change is as simple as adding one more layer of directories like en/, ja/ and ru/ under content/. However, from a git perspective, it involves a major change to the codebase. If we work on the structure change and the translation at the same time, the diff against master will be large and merging will be very difficult.

For this reason, I propose a two-step process to achieve i18n compatibility.

  1. Change only the structure. Hide internationalization support. At this point, there are zero changes in appearance
  2. Release of each language

The first step in particular should be easy. There is already a sample at innersourcecommons.jp, and you can see that all the pages are working fine.

Dependency

However, there is one challenge: the dependency on the Learning Path. Every time the Learning Path is updated, the github-bot commits to the innersourcecommons.org repository.
So we need to figure out how to adapt the Learning Path content to the new directory structure.

There are four ways to resolve this dependency:

  1. Let innersourcecommons.org pull Learning Path content
     Detail: Currently, the Learning Path GitHub Actions push to innersourcecommons.org. This would be eliminated so that innersourcecommons.org pulls the Learning Path content instead. The GitHub Actions pipeline moves the pulled files to the appropriate location in the new i18n-compliant directory structure and commits the changes.
     Pros: On the Learning Path side, all we need to do is stop the Actions. This is a simple method since most of the work is done in innersourcecommons.org.
     Cons: It is not possible to trigger GitHub Actions on innersourcecommons.org when the Learning Path repository changes, so it is difficult to reflect Learning Path changes in innersourcecommons.org in real time. To solve this, either run an Action on innersourcecommons.org that checks for Learning Path changes every 15 minutes or so, or place a manually triggered Action on the innersourcecommons.org side and kick it off from the Learning Path side.

  2. Update Learning Path GitHub Actions
     Detail: The GitHub Actions for the Learning Path copy changes to the innersourcecommons.org repository. To make innersourcecommons.org i18n compliant, it is only necessary to change this copy step so that it copies to a different directory for each language. This can be done using command-line techniques.
     Pros: Command-line techniques alone are enough to solve this; only one file in GitHub Actions needs to change. Nothing else is affected.
     Cons: Changes need to be made to two repositories: innersourcecommons.org and Learning Path.

  3. Use GitHub Actions on the innersourcecommons.org side
     Difficulty: ★★
     Detail: Since no changes are made to the Learning Path, its GitHub Actions continue to commit to the content/ directory of innersourcecommons.org. After internationalization, however, we want the correct Learning Path module to be placed under content/en in the innersourcecommons.org repository. Therefore, a new GitHub Actions pipeline will be created that moves the Learning Path content to the new directory structure when GitHub Pages is published.
     Pros: The migration can be completed without any changes on the Learning Path side.
     Cons: When debugging locally on innersourcecommons.org, the Learning Path files are not in the correct location in the i18n-supported directory structure, so a script that moves the files has to be run locally each time, which is cumbersome.

  4. Move the Learning Path repository itself into the innersourcecommons.org repository
     Difficulty: ★★★★★
     Detail: I'm reluctant to do this, because I think it's a long way to go and because of the background of having separate repositories. However, if there is a plan to merge the repositories into innersourcecommons.org, this might be a good opportunity to do so. I know nothing about the InnerSource Learning Path and would like others to consider this possibility. I don't think it's a matter for me to decide.
     Pros: The cleanest way to achieve this.
     Cons: It requires a very large amount of effort, including transition of TC, rule making, and governance.
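
To illustrate option 2, the change on the Learning Path side could amount to making the copy destination depend on the language. The sketch below is only an illustration under assumptions: the function name, the idea that the files for one language are gathered in a single source directory, and the content/<lang>/learning-path destination are all hypothetical and not taken from the actual Learning Path workflow.

```sh
# Hypothetical helper, not the actual Learning Path workflow.
# Copies the files for one language into the per-language content tree.
# Usage: copy_learning_path <src_dir> <site_dir> <lang>
copy_learning_path() {
  src_dir=$1    # directory holding the files for one language (assumed layout)
  site_dir=$2   # checkout of the site repository
  lang=$3       # language code such as en, ja, or de
  dest=$site_dir/content/$lang/learning-path   # assumed destination path
  mkdir -p "$dest"
  # "src/." copies the directory contents rather than the directory itself.
  cp -R "$src_dir"/. "$dest"/
}
```

The existing step would then call this once per language, for example `copy_learning_path build/ja site ja`, where `build/ja` and `site` are placeholder paths.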

@yuhattor
Member Author

yuhattor commented Nov 1, 2022

@voborgus @rrrutledge
Any opinion...? I think 2 is the best one.

@rrrutledge
Contributor

Yes #2. And there will be additional learning path content, and additional translations of that content. Great job figuring this out.

@yuhattor yuhattor linked a pull request Nov 28, 2022 that will close this issue
@yuhattor yuhattor reopened this Dec 6, 2022
@yuhattor yuhattor removed the help wanted Extra attention is needed label Dec 6, 2022
@yuhattor
Member Author

yuhattor commented Dec 6, 2022
