Gurmukhi transliteration: addressed overapplication of virama, normalized nukta characters #698

Open
wants to merge 2 commits into
base: master

Conversation


@bgo-eiu bgo-eiu commented Aug 28, 2022

Addresses https://phabricator.wikimedia.org/T91159

  1. Halant/virama is no longer applied by default; a tilde (~) can be typed to insert it instead. Conjunct characters are too rare in Gurmukhi to justify inserting a virama everywhere by default.
  2. The issue points out that there are only three common conjuncts, but the tilde allows input of both those three and the uncommon ones.
  3. Numerals are left alone, per requests from other users.
  4. A full stop can now be typed with 'Z', which is what Hindi transliteration input already does.
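
The tilde behaviour in items 1 and 2 can be sketched roughly as follows. This is only an illustration: the letter keys in `KEYMAP` are hypothetical and are not the PR's actual rule table; the only mapping taken from the description above is `~` producing the virama (U+0A4D).

```python
VIRAMA = "\u0A4D"  # GURMUKHI SIGN VIRAMA

# Hypothetical key map for illustration only; the real rules in the PR may differ.
KEYMAP = {
    "p": "\u0A2A",  # GURMUKHI LETTER PA
    "r": "\u0A30",  # GURMUKHI LETTER RA
    "s": "\u0A38",  # GURMUKHI LETTER SA
    "v": "\u0A35",  # GURMUKHI LETTER VA
    "~": VIRAMA,    # explicit virama, as described in item 1
}


def transliterate(keys: str) -> str:
    """Map each typed key to its output; unknown keys pass through unchanged."""
    return "".join(KEYMAP.get(key, key) for key in keys)


# Without the tilde, consecutive consonants stay separate letters.
plain = transliterate("pr")
# With the tilde, a virama sits between them, which requests a conjunct form.
conjunct = transliterate("p~r")
```

Under these assumptions, `plain` is PA followed by RA, while `conjunct` is PA, virama, RA, so a conjunct is only produced when the user asks for one.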

Additional: normalized the nukta characters by using the standalone (precomposed) Unicode characters for them wherever possible, rather than a base letter followed by the combining nukta. Mapped 'q' to kakka pair bindi: although it is not very common, it is still more common than udaat, which has moved to 'Q'. Added ways to type all the common nukta/bindi combinations.
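
To make the distinction between standalone and combining forms concrete, here is a small sketch using Python's standard `unicodedata` module. It is not part of the PR; the helper names `describe` and `canonically_equivalent` are made up for illustration.

```python
import unicodedata

FA_PRECOMPOSED = "\u0A5E"      # GURMUKHI LETTER FA, a single code point
FA_COMBINING = "\u0A2B\u0A3C"  # GURMUKHI LETTER PHA + GURMUKHI SIGN NUKTA
KA_NUKTA = "\u0A15\u0A3C"      # KA + nukta, the combination typed with 'q'


def describe(text: str) -> list[str]:
    """List the code points in a string with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The precomposed letter carries a canonical decomposition to base + nukta.
fa_decomposition = unicodedata.decomposition(FA_PRECOMPOSED)
# KA has no decomposition entry, so KA + nukta only exists as a sequence.
ka_decomposition = unicodedata.decomposition("\u0A15")
```

Both spellings of FA are canonically equivalent, but the precomposed one is a single code point, while kakka pair bindi has to be entered as two code points because Unicode has no standalone character for it.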

@kartikm
Member

kartikm commented Aug 29, 2022

Can you please check for failing tests and update the pull request?

@bgo-eiu
Author

bgo-eiu commented Sep 2, 2022

Yes, thank you for pointing that out. I will update the tests when I get a chance.

@bgo-eiu
Author

bgo-eiu commented Sep 4, 2022

There is actually a blocker here: Wikipedia applies NFC normalization to Gurmukhi, which changes the nukta characters to their legacy decomposed forms. This breaks a number of URLs in Punjabi external links that contain characters like ਫ਼. It also forces users to press backspace more than once to delete a single letter, and can result in some typographic inconsistency. Wikimedia needs to support these characters without decomposing them: ਫ਼ ਲ਼ ਸ਼ ਗ਼ ਖ਼
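
The effect on URLs and on string length can be reproduced with Python's standard library. This is a sketch of the general Unicode behaviour only; it does not use or reflect MediaWiki's own normalization code, and the helper name `percent_encoded` is hypothetical.

```python
import unicodedata
from urllib.parse import quote

# The six precomposed Gurmukhi nukta letters defined by Unicode.
PRECOMPOSED = [
    "\u0A33",  # GURMUKHI LETTER LLA
    "\u0A36",  # GURMUKHI LETTER SHA
    "\u0A59",  # GURMUKHI LETTER KHHA
    "\u0A5A",  # GURMUKHI LETTER GHHA
    "\u0A5B",  # GURMUKHI LETTER ZA
    "\u0A5E",  # GURMUKHI LETTER FA
]


def percent_encoded(text: str) -> str:
    """Percent-encode the UTF-8 bytes of a string, as in a URL path."""
    return quote(text, safe="")


# These letters are excluded from composition, so NFC leaves each one
# as a base letter followed by U+0A3C (nukta): two code points, not one.
nfc_lengths = {ch: len(unicodedata.normalize("NFC", ch)) for ch in PRECOMPOSED}

fa = "\u0A5E"
fa_nfc = unicodedata.normalize("NFC", fa)
url_before = percent_encoded(fa)      # one code point, three UTF-8 bytes
url_after = percent_encoded(fa_nfc)   # two code points, six UTF-8 bytes
```

Because `url_before` and `url_after` differ byte for byte, a server that compares paths without normalizing them would treat them as different resources, and an editor that deletes one code point per backspace would need two key presses for the normalized letter.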

@ramSeraph

ramSeraph commented Oct 29, 2024

> There is a block here actually which becomes a problem - Wikipedia applies NFC normalization to Gurmukhi which changes the nukta characters to their legacy decomposed forms. This breaks a number of URLs to Punjabi external links which have characters like ਫ਼ in them. It also forces users to press backspace more than once to delete single letters, and can result in some typographic inconsistency. Wikimedia needs to support these characters: ਫ਼ ਲ਼ ਸ਼ ਗ਼ ਖ਼ without decomposing them

@bgo-eiu Can you explain the 'typographic inconsistency' part? I am mostly curious. I thought NFC was not supposed to affect shaping, given that HarfBuzz probably also does NFC internally. Is this related to a different shaping engine?


3 participants