mideind / Tokenizer Public

Releases: mideind/Tokenizer

Version 1.3.0

21 May 11:25

vthorsteinsson

1.3.0

03d3512

Added TOK.DOMAIN and TOK.HASHTAG token types
Improved handling of capitalized month name Ágúst, which is now recognized as such when it follows an ordinal number
Improved recognition of telephone numbers
Added abbreviations
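
To make the two new token types concrete, here is a minimal, self-contained sketch of how a word might be sorted into a hashtag, a domain, or an ordinary word. The regular expressions, the classify_word() helper and the string labels are assumptions made for illustration; they are not the library's own rules or API.

```python
import re

# Hypothetical patterns, for illustration only; the library's actual
# rules for TOK.HASHTAG and TOK.DOMAIN may be stricter or looser.
HASHTAG_RE = re.compile(r"#\w+")
DOMAIN_RE = re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE)


def classify_word(word: str) -> str:
    """Return an illustrative label for a whitespace-delimited word."""
    if HASHTAG_RE.fullmatch(word):
        return "HASHTAG"
    if DOMAIN_RE.fullmatch(word):
        return "DOMAIN"
    return "WORD"


if __name__ == "__main__":
    for w in ["#metoo", "greynir.is", "hestur"]:
        print(w, classify_word(w))
```

Checking the hashtag pattern first keeps the two categories disjoint in this sketch, since a leading # can never match the domain pattern.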

Version 1.2.3

03 May 11:41

vthorsteinsson

1.2.3

62dbc26

Added abbreviations; updated GitHub URLs to point to mideind instead of vthorsteinsson

Version 1.2.2

26 Apr 13:17

vthorsteinsson

1.2.2

3e2cca2

Added support for composites with more than two parts, e.g. „dómsmála-, ferðamála-, iðnaðar- og nýsköpunarráðherra“; added support for the ± sign; added several abbreviations

Version 1.2.1

18 Feb 19:19

vthorsteinsson

1.2.1

f700774

Fixed a bug where the personal name 'Ágúst' was incorrectly recognized as a month name. Unicode nonbreaking and invisible space characters are now removed before tokenization.
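
The space cleanup can be pictured as a small pre-processing pass over the input text. The sketch below is an assumption-laden illustration: the two character sets, the clean_spaces() helper, and the choice to turn nonbreaking spaces into ordinary spaces (rather than drop them) are not taken from the library.

```python
# Assumed character sets, chosen for illustration; the library's
# actual list of characters may differ.
NONBREAKING = {"\u00a0", "\u2007", "\u202f"}  # NBSP, figure space, narrow NBSP
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}  # zero-width chars


def clean_spaces(text: str) -> str:
    """Drop invisible characters and normalize nonbreaking spaces."""
    out = []
    for ch in text:
        if ch in INVISIBLE:
            continue  # invisible characters are dropped entirely
        # Assumption: a nonbreaking space becomes an ordinary space,
        # so that the words on either side stay separate tokens.
        out.append(" " if ch in NONBREAKING else ch)
    return "".join(out)


if __name__ == "__main__":
    print(repr(clean_spaces("10\u00a0km")))
    print(repr(clean_spaces("or\u200b\u00f0")))
```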

Version 1.2.0

07 Feb 16:34

vthorsteinsson

1.2.0

c24eef8

Added support for Unicode fraction characters; enhanced handling of degrees (°, °C, °F); fixed a bug in the cubic meter measurement unit; added more abbreviations
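
As an illustration of what supporting Unicode fraction characters involves, the sketch below uses Python's standard unicodedata module to turn characters such as ½ into numbers. The helper names fraction_value() and mixed_number() are hypothetical, and this is not how the library itself is known to do it.

```python
import unicodedata
from typing import Optional


def fraction_value(ch: str) -> Optional[float]:
    """Return the value of a single vulgar-fraction character, else None."""
    if len(ch) != 1:
        return None
    # Restrict to characters named "VULGAR FRACTION ..." so that other
    # numeric characters (superscript digits, for instance) are excluded.
    if not unicodedata.name(ch, "").startswith("VULGAR FRACTION"):
        return None
    return unicodedata.numeric(ch)


def mixed_number(text: str) -> Optional[float]:
    """Parse strings such as '2½' (digits followed by one fraction character)."""
    if not text:
        return None
    frac = fraction_value(text[-1])
    whole = text[:-1]
    if frac is None or (whole and not whole.isdigit()):
        return None
    return (int(whole) if whole else 0) + frac


if __name__ == "__main__":
    print(fraction_value("½"), mixed_number("2½"))
```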

Version 1.1.2

10 Jan 11:37

vthorsteinsson

1.1.2

720aacb

Fixed a bug in the liter measurement unit (l and ltr): the value was 1000 times too large
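
For reference, the arithmetic behind a liter conversion is sketched below, assuming volumes are normalized to cubic meters (1 l = 0.001 m³). The table and the to_cubic_meters() helper are hypothetical names for illustration, not the library's internal data.

```python
# Assumed normalization target: cubic meters. One liter is one
# thousandth of a cubic meter, so the factor for "l" and "ltr" is 1e-3.
TO_CUBIC_METERS = {
    "m³": 1.0,
    "l": 1e-3,
    "ltr": 1e-3,
    "ml": 1e-6,
}


def to_cubic_meters(amount: float, unit: str) -> float:
    """Convert a volume in the given unit to cubic meters."""
    return amount * TO_CUBIC_METERS[unit]


if __name__ == "__main__":
    print(to_cubic_meters(2, "l"))
    print(to_cubic_meters(500, "ml"))
```

A factor that is off by three orders of magnitude for one unit produces exactly the kind of error described in this release note.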

Version 1.1.1

04 Jan 18:23

vthorsteinsson

1.1.1

6e5b75b

Added the mark_paragraphs() function
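
The release note does not describe what mark_paragraphs() does, so the following is only a sketch of one plausible behavior: splitting plain text on newlines and wrapping each non-empty paragraph in markers. The function name mark_paragraphs_sketch(), the [[ and ]] markers, and the newline-based splitting are all assumptions, not the library's documented contract.

```python
def mark_paragraphs_sketch(text: str) -> str:
    """Wrap each non-empty line of text in assumed [[ ... ]] markers.

    Hypothetical stand-in for illustration; it does not call or
    reproduce the library's mark_paragraphs() function.
    """
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    return " ".join(f"[[ {p} ]]" for p in paragraphs)


if __name__ == "__main__":
    print(mark_paragraphs_sketch("Fyrsta málsgrein.\n\nÖnnur málsgrein."))
```

Marking paragraph boundaries in the text itself lets a downstream tokenizer emit paragraph-begin and paragraph-end tokens without a separate data structure.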

Version 1.1.0

02 Jan 14:38

vthorsteinsson

1.1.0

c81c3e9

All abbreviations in Abbrev.conf are now returned with their meaning in a tuple in token.val; handling of 'mbl.is' fixed

Version 1.0.9

29 Dec 13:08

vthorsteinsson

1.0.9

f2e8f6f

Added MAST abbreviation; harmonized copyright headers

Version 1.0.7

25 Sep 12:22

vthorsteinsson

1.0.7

9316ed2

Added NUMWLETTER token type, for numbers with a single-letter suffix (12a, 80D). This will mainly be useful for parsing addresses. Note that if a conflict occurs between NUMWLETTER and MEASUREMENT (such as 16A, meaning 16 ampere), the latter takes precedence.
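
The precedence rule described above can be sketched as follows. This is an illustrative stand-in, not the library's code: the unit table is a small assumed subset, and classify_number_token() and its string labels are hypothetical.

```python
import re

# Assumed subset of single-letter unit symbols, for illustration only.
UNIT_LETTERS = {"A": "ampere", "V": "volt", "W": "watt", "m": "meter", "g": "gram"}

# One or more digits followed by exactly one letter.
NUM_LETTER_RE = re.compile(r"(\d+)([^\W\d_])")


def classify_number_token(token: str):
    """Classify a token such as '12a' or '16A'; measurements win conflicts."""
    m = NUM_LETTER_RE.fullmatch(token)
    if m is None:
        return ("OTHER", token)
    number, letter = int(m.group(1)), m.group(2)
    # Check for a measurement first, so that '16A' reads as 16 ampere.
    if letter in UNIT_LETTERS:
        return ("MEASUREMENT", (number, UNIT_LETTERS[letter]))
    return ("NUMWLETTER", (number, letter))


if __name__ == "__main__":
    for t in ["12a", "80D", "16A"]:
        print(t, classify_number_token(t))
```

Because the unit lookup is case-sensitive in this sketch, 16A is treated as a measurement while 16a would be treated as a number with a letter suffix.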
