Document non-regex constants, closes #10
janlelis committed Oct 20, 2024
1 parent 37d08f2 commit 7aa5987
Showing 6 changed files with 62 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -84,7 +84,7 @@ Regex | Description | Example Matches | Example Non-Matc
11) Basic Default Text Presentation Characters or Basic Emoji with Text Presentation Selector
12) Non-Emoji (unqualified) keycap sequence

Regex | 1 RGI/FQE | 2 RGI/MQE | 3 RGI/UQE | 4 Non-RGI | 5 Valid Region | 6 Any Region | 7 RGI Tag | 8 Valid Tag | 9 Any Tag | 10 Basic Emoji | 11 Basic Text | 12 Text Keycap
Regex | 1 RGI/FQE | 2 RGI/MQE | 3 RGI/UQE | 4 Non-RGI | 5 Valid Re­gion | 6 Any Re­gion | 7 RGI Tag | 8 Valid Tag | 9 Any Tag | 10 Basic Emoji | 11 Basic Text | 12 Text Key­cap
-|-|-|-|-|-|-|-|-|-|-|-|-
REGEX | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌
REGEX INCLUDE TEXT | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅
2 changes: 1 addition & 1 deletion data/generate_constants.rb
@@ -223,7 +223,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
)
)

# Matches any emoji-related codepoint - Use with caution (returns partial matches)
# Same as \p{Emoji} - to be removed or renamed
regexes[:REGEX_ANY] = Regexp.compile(emoji_character)

# Combined REGEXes which also match for TEXTUAL emoji
5 changes: 5 additions & 0 deletions lib/unicode/emoji.rb
@@ -34,6 +34,10 @@ module Emoji
autoload const_name, File.join(generated_constants_dirpath, const_name.downcase)
end

# Return Emoji properties of character as an Array or nil
# See PROPERTY_NAMES constant for possible properties
#
# Source: see https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt
def self.properties(char)
ord = get_codepoint_value(char)
props = INDEX[:PROPERTIES][ord]
@@ -45,6 +49,7 @@ def self.properties(char)
end
end

# Returns ordered list of Emoji, categorized in a three-level deep Hash structure
def self.list(key = nil, sub_key = nil)
return LIST unless key || sub_key
if LIST_REMOVED_KEYS.include?(key)
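
The documented behaviour of `properties` (an Array of property names, or nil) can be illustrated with a minimal sketch. It does not load the gem: `SAMPLE_PROPERTIES`, `SAMPLE_PROPERTY_NAMES`, and `sample_properties` are hypothetical stand-ins, and the mapping step is an assumption, since the diff elides the body of the real method.

```ruby
# Hypothetical stand-in for the gem's PROPERTY_NAMES (key => Unicode property name).
SAMPLE_PROPERTY_NAMES = {
  E: "Emoji",
  B: "Emoji_Modifier_Base",
  M: "Emoji_Modifier",
  C: "Emoji_Component",
  P: "Emoji_Presentation",
  X: "Extended_Pictographic",
}.freeze

# Hypothetical stand-in for INDEX[:PROPERTIES] (codepoint => property keys).
# The real index is loaded from the gem's data file; these two entries are sample data.
SAMPLE_PROPERTIES = {
  0x1F600 => [:E, :P, :X],     # grinning face
  0x1F3FB => [:E, :M, :C, :P], # light skin tone modifier
}.freeze

# Returns the property names for a character (or codepoint), or nil when the
# codepoint is not in the sample index. Assumed shape, not the gem's actual body.
def sample_properties(char)
  ord = char.is_a?(Integer) ? char : char.ord
  props = SAMPLE_PROPERTIES[ord]
  props && props.map { |key| SAMPLE_PROPERTY_NAMES.fetch(key) }
end

p sample_properties("😀")
p sample_properties("a")
```

Returning nil rather than an empty Array for unknown codepoints matches the wording of the new comment ("as an Array or nil").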
16 changes: 16 additions & 0 deletions lib/unicode/emoji/constants.rb
@@ -8,6 +8,7 @@ module Emoji
DATA_DIRECTORY = File.expand_path('../../../data', __dir__).freeze
INDEX_FILENAME = (DATA_DIRECTORY + "/emoji.marshal.gz").freeze

# Unicode properties, see https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt
PROPERTY_NAMES = {
E: "Emoji",
B: "Emoji_Modifier_Base",
@@ -17,13 +18,28 @@ module Emoji
X: "Extended_Pictographic",
}.freeze

# Variation Selector 16 (VS16), enables emoji presentation mode for preceding codepoint
EMOJI_VARIATION_SELECTOR = 0xFE0F

# Variation Selector 15 (VS15), enables text presentation mode for preceding codepoint
TEXT_VARIATION_SELECTOR = 0xFE0E

# First codepoint of tag-based subdivision flags
EMOJI_TAG_BASE_FLAG = 0x1F3F4

# Last codepoint of tag-based subdivision flags
CANCEL_TAG = 0xE007F

# Tag characters allowed in tag-based subdivision flags
SPEC_TAGS = [*0xE0030..0xE0039, *0xE0061..0xE007A].freeze

# Combining Enclosing Keycap character
EMOJI_KEYCAP_SUFFIX = 0x20E3

# Zero-width-joiner to enable combination of multiple Emoji in a sequence
ZWJ = 0x200D

# Two regional indicators make up a region
REGIONAL_INDICATORS = [*0x1F1E6..0x1F1FF].freeze
end
end
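
To make the newly documented codepoint constants concrete, here is a small sketch of how they combine into emoji sequences. The constant values are copied from the diff above; the helper methods (`from_codepoints`, `region_flag`, `subdivision_flag`) are hypothetical and not part of the gem.

```ruby
# Values copied from lib/unicode/emoji/constants.rb as shown in the diff.
EMOJI_VARIATION_SELECTOR = 0xFE0F
EMOJI_TAG_BASE_FLAG      = 0x1F3F4
CANCEL_TAG               = 0xE007F
EMOJI_KEYCAP_SUFFIX      = 0x20E3
ZWJ                      = 0x200D
REGIONAL_INDICATORS      = [*0x1F1E6..0x1F1FF].freeze

# Hypothetical helper: builds a UTF-8 string from a list of codepoints.
def from_codepoints(*codepoints)
  codepoints.pack("U*")
end

# Hypothetical helper: maps the letters A-Z onto the 26 regional indicators.
def region_flag(code)
  code.upcase.each_char.map { |c| REGIONAL_INDICATORS[c.ord - "A".ord] }.pack("U*")
end

# Hypothetical helper: base flag + one tag character per ASCII character + cancel tag.
# Tag characters sit at 0xE0000 plus the ASCII value, which is why SPEC_TAGS
# covers 0xE0030..0xE0039 (digits) and 0xE0061..0xE007A (lowercase letters).
def subdivision_flag(code)
  tags = code.each_char.map { |c| 0xE0000 + c.ord }
  from_codepoints(EMOJI_TAG_BASE_FLAG, *tags, CANCEL_TAG)
end

# Keycap sequence: base character, VS16, then the combining enclosing keycap.
keycap = from_codepoints("1".ord, EMOJI_VARIATION_SELECTOR, EMOJI_KEYCAP_SUFFIX)

# Region flag: two regional indicators.
germany = region_flag("DE")

# Tag-based subdivision flag for Scotland ("gbsct").
scotland = subdivision_flag("gbsct")

# ZWJ sequence: man + ZWJ + woman + ZWJ + girl.
family = from_codepoints(0x1F468, ZWJ, 0x1F469, ZWJ, 0x1F467)

puts keycap, germany, scotland, family
```

Whether a terminal or font renders these as single glyphs depends on the platform; the codepoint sequences are the same either way.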
36 changes: 36 additions & 0 deletions lib/unicode/emoji/lazy_constants.rb
@@ -2,19 +2,55 @@

module Unicode
module Emoji
# The current list of codepoints with the "Emoji" property
# Same characters as \p{Emoji}
# (Emoji version of this gem might be more recent than Ruby's Emoji version)
EMOJI_CHAR = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:E) }.keys.freeze

# The current list of codepoints with the "Emoji_Presentation" property
# Same characters as \p{Emoji Presentation} or \p{EPres}
# (Emoji version of this gem might be more recent than Ruby's Emoji version)
EMOJI_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:P) }.keys.freeze

# The current list of codepoints with the "Emoji" property that lack the "Emoji Presentation" property
TEXT_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:E) && !props.include?(:P) }.keys.freeze

# The current list of codepoints with the "Emoji_Component" property
# Same characters as \p{Emoji Component} or \p{EComp}
# (Emoji version of this gem might be more recent than Ruby's Emoji version)
EMOJI_COMPONENT = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:C) }.keys.freeze

# The current list of codepoints with the "Emoji_Modifier_Base" property
# Same characters as \p{Emoji Modifier Base} or \p{EBase}
# (Emoji version of this gem might be more recent than Ruby's Emoji version)
EMOJI_MODIFIER_BASES = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:B) }.keys.freeze

# The current list of codepoints with the "Emoji_Modifier" property
# Same characters as \p{Emoji Modifier} or \p{EMod}
# (Emoji version of this gem might be more recent than Ruby's Emoji version)
EMOJI_MODIFIERS = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:M) }.keys.freeze

# The current list of codepoints with the "Extended_Pictographic" property
# Same characters as \p{Extended Pictographic} or \p{ExtPict}
# (Emoji version of this gem might be more recent than Ruby's Emoji version)
EXTENDED_PICTOGRAPHIC = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:X) }.keys.freeze

# The current list of codepoints with the "Extended_Pictographic" property that don't have the "Emoji" property
EXTENDED_PICTOGRAPHIC_NO_EMOJI = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:X) && !props.include?(:E) }.keys.freeze

# The list of characters that can be used as base for keycap sequences
EMOJI_KEYCAPS = INDEX[:KEYCAPS].freeze

# The list of valid regions
VALID_REGION_FLAGS = INDEX[:FLAGS].freeze

# The list of valid subdivisions in regex character class syntax
VALID_SUBDIVISIONS = INDEX[:SD].map{_1.sub(/(.)~(.)/, '[\1-\2]') }

# The list of RGI tag sequence flags
RECOMMENDED_SUBDIVISION_FLAGS = INDEX[:TAGS].freeze

# The list of fully-qualified RGI Emoji ZWJ sequences
RECOMMENDED_ZWJ_SEQUENCES = INDEX[:ZWJ].freeze
end
end
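
The lazy constants above all follow one pattern: filter the property index by key, then keep the codepoints. The sketch below reproduces that pattern on a tiny hypothetical index (`SAMPLE_INDEX` is not the gem's data), and also shows the `sub` call used for `VALID_SUBDIVISIONS`. The `"usa~z"` entry is an assumed example of the compressed input format, which the diff does not show.

```ruby
# Hypothetical miniature of the gem's INDEX; the entries are sample data only.
SAMPLE_INDEX = {
  PROPERTIES: {
    0x0023  => [:E, :C],         # number sign
    0x00A9  => [:E, :X],         # copyright sign
    0x1F600 => [:E, :P, :X],     # grinning face
    0x1F3FB => [:E, :M, :C, :P], # light skin tone modifier
  },
  SD: ["usa~z"], # assumed compressed form: "a~z" stands for the range a..z
}.freeze

properties = SAMPLE_INDEX[:PROPERTIES]

# Same select/keys/freeze pattern as in lazy_constants.rb.
emoji_char         = properties.select { |_ord, props| props.include?(:E) }.keys.freeze
emoji_presentation = properties.select { |_ord, props| props.include?(:P) }.keys.freeze
text_presentation  = properties.select { |_ord, props| props.include?(:E) && !props.include?(:P) }.keys.freeze

# Turn "x~y" into the regex character class "[x-y]", as VALID_SUBDIVISIONS does.
valid_subdivisions = SAMPLE_INDEX[:SD].map { _1.sub(/(.)~(.)/, '[\1-\2]') }

# The resulting strings can be interpolated into a Regexp.
subdivision_regex = /\A(?:#{valid_subdivisions.join("|")})\z/

p emoji_char, emoji_presentation, text_presentation
p valid_subdivisions
p subdivision_regex.match?("usq")
```

Because each constant rebuilds a filtered copy of the index, keeping them in a lazily loaded file avoids paying that cost unless a constant is actually referenced.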
3 changes: 3 additions & 0 deletions lib/unicode/emoji/list.rb
@@ -2,7 +2,10 @@

module Unicode
module Emoji
# Contains an ordered and grouped list of all currently recommended Emoji (RGI/FQE)
LIST = INDEX[:LIST].freeze.each_value(&:freeze)

# Categories sometimes change; we issue a warning in these cases
LIST_REMOVED_KEYS = [
"Smileys & People",
].freeze
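
As a rough illustration of the nested list and the removed-keys warning described in these comments, the following sketch uses a hypothetical `SAMPLE_LIST` with made-up contents. The lookup via `dig` and the warning text are assumptions; the diff shows only the first lines of the gem's `list` method.

```ruby
# Hypothetical sample data: category => subcategory => Array of emoji.
# freeze + each_value(&:freeze) freezes the outer Hash and each inner Hash;
# the innermost Arrays are left as they are.
SAMPLE_LIST = {
  "Smileys & Emotion" => {
    "face-smiling" => ["😀", "😃"],
  },
  "People & Body" => {
    "hand-fingers-open" => ["👋"],
  },
}.freeze.each_value(&:freeze)

# Categories that no longer exist under this name (same value as in the diff).
SAMPLE_REMOVED_KEYS = [
  "Smileys & People",
].freeze

# Assumed lookup: return everything without arguments, warn on removed
# categories, otherwise walk down the nested Hash.
def sample_list(key = nil, sub_key = nil)
  return SAMPLE_LIST unless key || sub_key
  if SAMPLE_REMOVED_KEYS.include?(key)
    warn "category #{key.inspect} is no longer part of the list"
  end
  SAMPLE_LIST.dig(*[key, sub_key].compact)
end

p sample_list("Smileys & Emotion", "face-smiling")
p sample_list("People & Body").keys
p sample_list("Smileys & People")
```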
