Allow Unicode characters in GDScript identifiers #916
Comments
I think @vnen was working on adding this a few days ago. |
BTW, I did this very naively in this example, accepting anything beyond the basic ASCII range. This would allow symbols, and characters that look like spaces, to be part of identifiers. Doing this properly would require following Unicode Standard Annex #31 (https://unicode.org/reports/tr31). Or maybe we can expect users to use this responsibly and close any report of those characters being allowed as not-a-bug. |
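To make the difference concrete, here is a small Python sketch (Python only because its `str.isidentifier()` already checks the UAX #31 XID_Start/XID_Continue properties; none of this is Godot code). `naive_is_identifier` is a hypothetical stand-in for the "anything beyond ASCII" rule described above.

```python
def naive_is_identifier(name: str) -> bool:
    # Naive rule: the old ASCII rule (letters, digits, '_', no leading digit)
    # extended with "any code point beyond the ASCII range".
    if not name or name[0] in "0123456789":
        return False
    return all(ord(c) > 127 or c == "_" or (c.isascii() and c.isalnum()) for c in name)


def uax31_is_identifier(name: str) -> bool:
    # Python's str.isidentifier() checks XID_Start / XID_Continue (UAX #31),
    # with '_' also accepted as a start character; used here as a reference.
    return name.isidentifier()


samples = ["ρ", "变量", "café", "M\u2609", "\U0001F4A9", "a\u00a0b", "a\u201cb"]
for sample in samples:
    print(repr(sample), naive_is_identifier(sample), uax31_is_identifier(sample))
```

With these samples the naive rule accepts all seven strings, while the UAX #31 check rejects the Sun symbol, the emoji, the no-break space and the curly quote.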
@vnen We could probably disallow irregular whitespace characters anywhere other than in strings and comments.
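One way to read that suggestion, sketched in Python: flag any Unicode space separator, line/paragraph separator or invisible format character found in source text. The helper name `find_irregular_whitespace` is hypothetical, and the sketch assumes string literals and comments have already been removed by the tokenizer, since those characters would stay allowed there.

```python
import unicodedata
from typing import List, Tuple

# Categories treated as "irregular whitespace": space separators (Zs),
# line/paragraph separators (Zl, Zp) and invisible format characters (Cf),
# e.g. U+00A0 NO-BREAK SPACE or U+200B ZERO WIDTH SPACE.
IRREGULAR_CATEGORIES = {"Zs", "Zl", "Zp", "Cf"}


def find_irregular_whitespace(code: str) -> List[Tuple[int, str]]:
    """Return (offset, character name) for each irregular whitespace character.

    Assumes strings and comments were already stripped from `code`."""
    hits = []
    for offset, ch in enumerate(code):
        if ch == " ":
            continue  # the ordinary ASCII space is fine (tab/newline are Cc, not flagged)
        if unicodedata.category(ch) in IRREGULAR_CATEGORIES:
            hits.append((offset, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


print(find_irregular_whitespace("var\u00a0speed = 1"))
```

Note that `Cf` also covers ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER, which UAX #31 permits inside identifiers in limited contexts for some scripts, so a real implementation would probably need exceptions for those.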
I would like to support the use of Unicode for identifiers. |
Popular languages like C# and Python both support Unicode identifiers, and if Godot wants more users in the non-English-speaking world, it must support Unicode. |
This is an open discussion; there's no conclusion yet. There are a few challenges to overcome:
That's already the case, I think. But forbidding those inside identifiers would require a blacklist of sorts. |
Yes, I very much want it. But it might cause trouble to implement something like that in declarations. For strings and comments, however, I am fully on board. |
It already works in strings and comments. The problem, as I mentioned, is that the code editor font doesn't have full Unicode coverage, and the editor doesn't allow fallback fonts. So if you want emoji or similar characters, you have to switch to a font that supports them (like I did in the example image), and I couldn't find a monospace font that worked. |
So what we would need to do is find a monospace font that would look good in Godot's code editor, has full unicode support and the proper licensing... then we can make it a proposal so that emoji support could be implemented, correct? |
That's probably it, I don't know. |
@agameraaron I don't know of any open source monospace font that includes good emoji support. Hack and its parent DejaVu Sans Mono have a very extensive character set, but they don't support colored emoji. (Monochrome emoji can be tough to understand, so I wouldn't recommend settling for them.) Also, why wouldn't the code editor font allow fallbacks? It uses a DynamicFont just like everything else in the Godot editor. |
The problem is that the settings only ask for a font path, not a DynamicFont. It doesn't have any fallback option. That's probably easy to solve but right now it's an issue. |
@agameraaron
No, it is already supported (in comments and strings, that is). It's just that the editor's default font doesn't have emoji, so any emoji in there won't be shown, which can be confusing (but nothing is really stopping anyone from using them). If you use an external editor, you can see those characters. What we need is a fallback font setting so that all characters are shown by default. It doesn't matter much if emoji are monospaced IMO, but the regular characters should be. The proposal here is for identifiers, which currently don't allow anything other than basic ASCII letters, numbers and underscores. But that requires following the standard, at least in my view, which is not trivial. |
@vnen Right, that makes sense. We should probably find a way to load the system emoji font as a fallback, as emoji fonts are notoriously large in terms of file size (bundling them in the binary would likely enlarge it significantly). However, the exact paths for these fonts are OS-specific and often require guesswork. |
Even if this were supported, isn't it usually considered best practice to write code in English? Also, I have cross-language portability concerns with this proposal, since EDIT: But yes, let's use ID_Start and ID_Continue for identifiers. |
It is already possible to name a GDScript function (And GDNative can already assign arbitrary character arrays for function names, likely including empty strings. Restricting those so that all languages are "happy" is not going to be elegant.) |
It's not valid in Python either. Hence my concern with following the proper UAX#31 standard, which would correctly disallow some weird stuff in identifiers in general, while still allowing multi-language support. |
What I'm interested in is being able to use Chinese or French characters in variable names. I'm not interested in using emoticons for variable names. |
Unless 'weird stuff' covers scientific symbols like https://en.wikipedia.org/wiki/Astronomical_symbols (mostly interested in Sun and Earth) or ρ (the lower case Greek letter rho) which is used for density, or other such stuff, I don't care. Emoji aren't the reason I posted the proposal, after all. |
@Zireael07 BTW, those don't seem to be allowed in Python. Not sure about other languages. For me, "weird" means things that get confusing, like other types of space, or a quote character that makes it look like a string (my main concern is editors/keyboards that might insert them, or copy-pasting with formatting). But again, maybe we just shouldn't care at all and let users use whatever they want in identifiers. |
I think we shouldn't focus discussion too much on esoteric "joke" uses like using emojis as identifiers, but indeed as @txj-mssl expresses it, being able to write identifiers in say Chinese is an important part of making the engine accessible. Some of us might have a strong bias as native speakers of languages using Latin script, and comfortable with English, that identifiers/code should be in English as much as possible, but this is not true of everyone and some users might have valid reasons for wanting to write their code with e.g. Chinese characters. So if it's easy to enable and doesn't impact performance, I think we should do it. There might be additional complexity if some users want to use e.g. RTL scripts in their code (like Arabic identifiers), but I believe @bruvzg's work on Complex Text Layouts might already be capable of handling it (or close to it). |
If we're concerned about users adding non-printable Unicode characters by mistake, we could add a warning for that (disabled by default, to allow Unicode out of the box). Users who suspect something can then enable the warning and be notified of non-ASCII characters used in identifiers. |
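A minimal sketch of such an opt-in warning, assuming the parser can hand over the list of declared identifiers. `non_ascii_identifier_warnings` and its `enabled` flag are hypothetical, not an existing Godot setting.

```python
import unicodedata


def non_ascii_identifier_warnings(identifiers, enabled=False):
    """Hypothetical opt-in lint: report every non-ASCII character in identifiers.

    Disabled by default, so Unicode identifiers work out of the box."""
    if not enabled:
        return []
    warnings = []
    for ident in identifiers:
        for ch in ident:
            if not ch.isascii():
                # Code point plus Unicode name, so invisible characters can be identified.
                name = unicodedata.name(ch, "<unnamed>")
                warnings.append(f"'{ident}': non-ASCII character U+{ord(ch):04X} {name}")
    return warnings


print(non_ascii_identifier_warnings(["speed", "ρ"], enabled=True))
```

Printing the code point and its Unicode name makes an invisible character identifiable even when the editor font can't render it.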
I agree 100% with this. The question is mostly how (or rather, whether we should filter the list of allowed characters or not). It's quite easy to just allow everything beyond the ASCII range, as those characters are not used in any token, just like I did in my small sample (#916 (comment)). It would also not be hard to add a warning if any such characters are used, but if the user is already using identifiers like this, they won't be able to tell whether there's some invisible character somewhere (including characters not supported by the editor font), as they would be flooded by this warning if it were turned on. I believe we should follow the UAX#31 standard, which is meant for this purpose (it defines what a valid identifier is). This requires much more work (including reading the standard itself), but makes everything safer. Although I don't think it allows some special symbols that some people want to use (e.g. godotengine/godot#24785 (comment)). We could potentially break the standard to allow some extra symbols that people want, but I would rather avoid it if possible. This would still cover the case of people writing identifiers in their preferred language, which is the important part IMO. |
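If a few extra symbols were wanted on top of the standard, it could look like this Python sketch: the UAX #31 check via `str.isidentifier()` plus a small hypothetical allowlist containing ☉ (U+2609 SUN) and ♁ (U+2641 EARTH). UAX #31 appears to allow such additions when they are documented as a profile, but that is worth verifying against the text of the standard.

```python
# Hypothetical extra characters accepted on top of UAX #31:
# U+2609 SUN and U+2641 EARTH from the astronomical symbols.
EXTRA_ALLOWED = frozenset({"\u2609", "\u2641"})


def is_identifier_with_profile(name: str, extra: frozenset = EXTRA_ALLOWED) -> bool:
    if not name:
        return False
    for i, ch in enumerate(name):
        if ch in extra:
            continue  # allowed in any position by the (hypothetical) profile
        if i == 0:
            ok = ch.isidentifier()          # XID_Start or '_'
        else:
            ok = ("a" + ch).isidentifier()  # XID_Continue, tested after a letter
        if not ok:
            return False
    return True


for sample in ["M\u2609", "r\u2641", "\u03c1", "a\u00a0b", "1x"]:
    print(repr(sample), is_identifier_with_profile(sample))
```

Keeping the additions in one explicit set means the deviation from the standard stays small and easy to review.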
Also, before I had some concern with Unicode support across platforms, but now with the changes by @bruvzg this is not an issue anymore. |
It might be a good idea (or workaround) to have a warning triggered when
It wouldn't be risky anymore to use Unicode, and it wouldn't require bundling a big font with Godot (and we all love Godot being a small binary, so that's <3). |
I'm a little confused about why we don't just let people use whatever characters they want. It's their choice whether they use this emoji 💩 or letters from their native language, no matter how silly anyone thinks it is. I understand that this might add more complexity to the language and editor, but I think it's worth it in the end, because accessibility is worth it. Also, if we need a font that supports this, we could just use JetBrains Mono. It's relatively small at 2.8 MB, it's free and open source, it supports 145 languages, and it looks quite pretty, as it was designed with code in mind. |
See godotengine/godot#36198. Note that the character set of Hack (the current script editor font) is actually more extensive than JetBrains Mono's. |
rustc also warns you about potentially confusing identifier names. |
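A very rough Python approximation of that kind of mixed-script warning. Python's `unicodedata` has no Script property, so the hypothetical `rough_script` helper guesses the script from the first word of the character name (LATIN, CYRILLIC, GREEK, CJK, ...); real confusable detection, as specified in Unicode Technical Standard #39 (Unicode Security Mechanisms), is considerably more involved.

```python
import unicodedata
from typing import Optional


def rough_script(ch: str) -> Optional[str]:
    # Not the real Unicode Script property: the first word of the character
    # name is used as a guess (LATIN, CYRILLIC, GREEK, CJK, HIRAGANA, ...).
    if not ch.isalpha():
        return None  # digits, '_' and symbols don't count toward any script
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_warning(ident: str) -> Optional[str]:
    # Warn when letters from more than one (guessed) script appear together,
    # e.g. a Cyrillic 'а' hidden inside an otherwise Latin name.
    scripts = {s for s in map(rough_script, ident) if s is not None}
    if len(scripts) > 1:
        return f"identifier {ident!r} mixes scripts: {', '.join(sorted(scripts))}"
    return None


print(mixed_script_warning("p\u0430ypal"))
print(mixed_script_warning("paypal"))
```

This sketch would wrongly flag Japanese identifiers that mix kanji and kana; UTS #39 handles that by treating Han, Hiragana and Katakana as one augmented script set.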
Add an option "Allow any Unicode" which is turned off by default in the next version? |
This should not be implemented as an option, as GDScript behavior should not depend on project settings to be more predictable across projects. We either go all in, or don't do it 🙂 |
Did the introduction of ICU (via complex text layout) make following UAX#31 easier? |
The complex text layout server may be disabled at build-time, so GDScript behavior shouldn't rely on it either. Godot still has a lightweight fallback text server to use when you want to optimize the binary size. You can select this Fallback text server by building with |
IMO we should follow UAX #31 as suggested by some here. If there's a standard for this exact use case, we should aim to follow it. |
I mean, if we were to make GDScript support Unicode Identifiers, does it make sense to make GDScript rely on ICU? |
We could potentially make this available only if ICU is present. I assume anyone using Unicode in identifiers is also using it for other things and thus needs ICU anyway. I took a look, and the ICU library does make this much easier to implement. |
This might be an issue for headless usage, since ICU would become a dependency for your project on the server side if any of your scripts (or add-ons!) happen to use Unicode identifiers. In other words, I don't think this behavior should be made an option: #916 (comment) |
CC @bruvzg |
ICU does not have a full UAX#31 implementation, but it provides enough character properties to implement it. Here's a draft implementing UAX#31: godotengine/godot#53956 |
UAX#31 itself does not include any features to check for confusing elements. ICU has it implemented in the … If similar functionality is desired, we'll need to include ICU. |
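Roughly what "implementing it from character properties" can mean, sketched in Python using only the General_Category values that UAX #31 builds ID_Start and ID_Continue from. This is an approximation: the real derived properties also add the small Other_ID_Start/Other_ID_Continue sets and remove Pattern_Syntax/Pattern_White_Space characters, which the sketch ignores. (ICU also exposes the derived properties directly, e.g. `u_hasBinaryProperty(c, UCHAR_XID_START)`.)

```python
import unicodedata

# Approximation of UAX #31 ID_Start / ID_Continue from General_Category only.
# Other_ID_Start / Other_ID_Continue and the Pattern_* exclusions are ignored.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch: str) -> bool:
    return unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch: str) -> bool:
    return unicodedata.category(ch) in ID_CONTINUE_CATEGORIES


def is_identifier(name: str) -> bool:
    # Like GDScript's current ASCII rule, also accept '_' as a start character.
    if not name:
        return False
    first, rest = name[0], name[1:]
    return (first == "_" or is_id_start(first)) and all(is_id_continue(c) for c in rest)


for sample in ["ρ", "变量", "e\u0301", "_x1", "1x", "M\u2609"]:
    print(repr(sample), is_identifier(sample))
```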
Many thanks to Rémi Verschelde and everybody! I see that 3.4rc1 supports Chinese in the editor, and my heart is jumping. I will try to promote the use of Godot in China. |
could this be allowed by parsing an annotation like |
While better than project settings, I'm still not fond of keywords/annotations that allow changing language behavior. It'll break code if you copy-paste it between scripts without also copying the keyword/annotation. |
Maybe it's late to reply :o, but I would like to point out that the "best practice to write code in English" applies to common domains. The words involved there are usually everyday words and technical terms, which are both easy to translate and easy to understand for the people who use them. But in game development, many things can't be translated, or are hard to translate, into meaningful English words; there are many unique terms. Let's take an example. In Chinese mythology, there are things called "Jing Luo" (经络), which refer to the paths through which certain mystical substances (generally translated as Qi, though that isn't accurate) flow within the human body. The word does have a translation, "meridian", but that usually means "one of the lines drawn from the North Pole to the South Pole on a map of the world". The two are unrelated, which makes things even worse. And what about "小周天" (Xiao Zhou Tian, a certain way of Qi circulating through the Jing Luo)? How could that be translated into a single word? So we commonly use pinyin (a method of writing down the pronunciation of Chinese characters, similar to romaji for Japanese) to name these identifiers. Here is a horrible code snippet:
# Xian Ren -> 仙人, similar to "God" in Roman mythology but not
class_name XianRen extends CharacterBody2D
## Zhen Qi Capacity -> 真气容量,Similar concept to "Qi" or ether but not,
## this means the capacity of Zhen Qi for this character
@export var zhen_qi_capacity: int
# Zhen Qi Yun Zhuan Fang Shi -> 真气运转方式,means how Zhen Qi running in body.
enum ZhenQiYunZhuanFangShi {
	# Xiao Zhou Tian -> 小周天, a certain way of Qi running in the body
	XiaoZhouTian,
	# Da Zhou Tian -> 大周天, another way of Qi running in the body ...
	DaZhouTian,
	# ...
}
# ...
# more terrible functions and variables below ...
Even Chinese programmers struggle to understand the code above, and it's completely unreadable for people living in English-speaking countries. The idea behind the "best practice" is to make communication easier, but in this case it helps no one. And I think this is a common problem in non-English-speaking countries. |
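For contrast, a Python sketch (Python already accepts Unicode identifiers under UAX #31) of the same hypothetical types from the snippet above, named with the original terms instead of pinyin. It is not GDScript and only mirrors the names; whether GDScript ends up with the same identifier rules is exactly what this proposal is about.

```python
from enum import Enum, auto


class 真气运转方式(Enum):
    """How Zhen Qi (真气) circulates through the body."""
    小周天 = auto()  # Xiao Zhou Tian
    大周天 = auto()  # Da Zhou Tian


class 仙人:
    """Xian Ren (仙人)."""

    def __init__(self, 真气容量: int, 运转方式: 真气运转方式 = 真气运转方式.小周天):
        self.真气容量 = 真气容量  # Zhen Qi capacity
        self.运转方式 = 运转方式


角色 = 仙人(真气容量=100)
print(角色.运转方式.name)  # 小周天
```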
woot woot! It's taken so long I've forgotten I was the person who originally requested this :P M☉, here I come! |
Is there any chance that this could be backported to 3.x? |
3.x doesn't have major changes to GDScript planned (and 3.6 is in feature freeze). Also, 3.x doesn't have TextServer which is a prerequisite for this to be implemented. Therefore, there are no plans to backport this to 3.x. Not to mention this change could break expectations in third-party tools that read GDScript code written for Godot 3.x (while this change has been part of Godot 4.x from the start). |
Describe the project you are working on:
2d space game
Describe the problem or limitation you are having in your project:
Can't use scientific symbols or accented letters (and my native language has some, often creating minimal pairs with unaccented ones) in variable names (scientific symbols would massively shorten some variable names I use).
Another example use case: godotengine/godot#24785 (comment)
Describe the feature / enhancement and how it helps to overcome the problem or limitation:
Allow Unicode characters in GDScript identifiers
Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:
If this enhancement will not be used often, can it be worked around with a few lines of script?:
Nope, requires core changes (parser)
Is there a reason why this should be core and not an add-on in the asset library?:
Not possible to do via add-on due to parser changes.
Original issue: godotengine/godot#24785
IIRC this is not covered in @vnen's GDScript rework.