Releases: pemistahl/lingua

Lingua 1.2.2

02 Aug 13:26

Bug Fixes

Due to a bug in the Moshi JSON serialization library, language detection was not possible in certain cases. (#144, #147)
Lingua could not be used properly when a security manager was enabled in the JVM. (#141)

Lingua 1.2.1

09 Jun 10:15

Bug Fixes

In low accuracy mode, which operates only on trigrams and longer strings, an exception was thrown when detecting the language of unigrams and bigrams. This has been fixed.
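
The failure mode can be illustrated with a minimal, self-contained sketch. It is not Lingua's implementation: the class `LowAccuracySketch` and both methods are hypothetical names chosen for this example. It shows only that a trigram-based approach has nothing to score when the input is shorter than three characters, so that case needs an explicit guard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Lingua API.
final class LowAccuracySketch {

    // Returns every substring of length 3 (counted in UTF-16 chars for brevity).
    // Input shorter than three characters yields an empty list.
    static List<String> trigrams(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            result.add(text.substring(i, i + 3));
        }
        return result;
    }

    // Guard for a trigram-only detector: unigrams and bigrams produce no
    // trigrams, so a caller should report "unknown" instead of reading
    // from an empty list.
    static boolean hasTrigrams(String text) {
        return !trigrams(text).isEmpty();
    }
}
```

With this guard, `hasTrigrams("ab")` is false and the caller can return an unknown result early, while `trigrams("abcd")` yields `abc` and `bcd`.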

Lingua 1.2.0

07 Jun 18:52

Features

The library can now be used as a Java 9 module. Thanks to @Marcono1234 for helping with the implementation. (#120, #138)
The new method LanguageDetectorBuilder.withLowAccuracyMode() has been introduced. By activating it, detection accuracy for short text is reduced in favor of a smaller memory footprint and faster detection performance. (#136)

Improvements

The memory footprint has been reduced significantly by applying several internal optimizations. Thanks to @Marcono1234, @fvasco and @sigpwned for their help. (#101, #127)
Several obsolete language model files have been deleted without decreasing detection accuracy. This results in a smaller memory footprint and a 36% smaller jar file.

Bug Fixes

A bug in the rule engine that caused incorrect language detection for certain texts has been fixed. Thanks to @bdecarne for finding it.

Other changes

Due to a refactoring of how the internal thread pool works, the method LanguageDetector.destroy() has been deprecated in favor of the newly introduced method LanguageDetector.unloadLanguageModels().

Lingua 1.1.1

12 Dec 11:57

Improvements

The new method LanguageDetector.destroy() has been introduced. It frees internal resources to prevent memory leaks within application server deployments. (#110, #116)
Language model loading performance has been improved by creating a manually optimized internal thread pool. This replaces the coroutines used in the previous release. (#116)

Bug Fixes

The character â was erroneously not treated as a possible indicator for the French language. (#115)
Language detection was non-deterministic when multiple alphabets had the same occurrence count. (#105)
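
The release notes do not describe how the tie was resolved. As a hedged illustration of the general technique, the sketch below counts letters per Unicode script and breaks ties by a fixed order instead of by map iteration order. `ScriptTieBreakSketch` is a hypothetical class, and `Character.UnicodeScript` stands in for Lingua's own alphabet type.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration; not part of the Lingua API.
final class ScriptTieBreakSketch {

    // Returns the script with the most letters, or empty if there are no letters.
    static Optional<UnicodeScript> dominantScript(String text) {
        // EnumMap iterates in enum declaration order, independent of the
        // order in which characters were encountered.
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(UnicodeScript.of(cp), 1, Integer::sum));

        UnicodeScript best = null;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> entry : counts.entrySet()) {
            // Strictly greater: on a tie, the script declared first in the enum wins.
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Because the winner of a tie depends only on the enum's declaration order, an input with two Latin and two Cyrillic letters gives the same answer regardless of which letters come first.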

Lingua 1.1.0

02 May 16:12

Languages

There is now support for the Maori language, which was contributed to the Rust implementation of Lingua. (#93)

Features

Language models are now loaded asynchronously and in parallel using Kotlin coroutines, making this step more performant. (#84)
Language models can now be loaded either lazily (default) or eagerly. (#79)
Instead of loading multiple copies of the language models into memory for each separate instance of LanguageDetector, multiple instances now share the same language models and access them asynchronously. (#91)

Improvements

Language detection for sentences with more than 120 characters is now faster because only trigrams are evaluated, which is enough to achieve high detection accuracy.
Textual input that includes logograms from Chinese, Japanese or Korean is now split at each logogram and not only at whitespace. This makes language detection more reliable for sentences that include multi-language content. (#85)
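
The splitting behavior described in the last item can be sketched as follows. This is an assumption-laden illustration rather than Lingua's code: `LogogramSplitSketch` is a hypothetical class, and the JDK method `Character.isIdeographic` is used as a stand-in for however the library recognizes logograms. Kana and Hangul syllables are not ideographic under this test, so they stay inside whitespace-delimited words here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Lingua API.
final class LogogramSplitSketch {

    // Splits at whitespace and additionally emits every ideographic
    // code point as a token of its own.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isWhitespace(cp)) {
                flush(word, tokens);
            } else if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else {
                word.appendCodePoint(cp);
            }
        });
        flush(word, tokens);
        return tokens;
    }

    // Moves the pending word, if any, into the token list.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }
}
```

For an input such as `hello 上海 world`, the sketch produces four tokens: `hello`, `上`, `海` and `world`, so each logogram can be classified separately from the surrounding Latin words.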

Bug Fixes

For an odd number of words as input, the method LanguageDetector.computeLanguageConfidenceValues computed wrong values under certain circumstances. (#87)
When Lingua was used in projects with an explicitly set Kotlin version that differed from Lingua's implicitly set version in the Gradle script, several errors occurred at runtime. By explicitly setting Lingua's Kotlin version, these errors are now hopefully gone. (#88, #89)
Errors in the rule engine for the Latvian language have been resolved. (#92)

Lingua 1.0.3

15 Oct 17:39

Bug Fixes

When two languages had exactly the same confidence values, one of them was erroneously removed from the result map. Thanks to @mmedek for reporting this bug. (#72)
There was still a problem with the classification of texts consisting of certain alphabets. Thanks to @nicolabertoldi for reporting this bug. (#76)
The language detection for Spanish did not take the rarely used accented characters á, é, í, ó, ú and ü into account. Thanks to @joeporter for reporting this bug. (#73)
A bug in the rule engine led to weak detection accuracy for Macedonian and Serbian. This has been fixed.
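
The notes do not state the root cause of the missing map entry. One common way such a bug arises on the JVM is a sorted map whose comparator looks only at the value, so two keys with equal values are treated as the same key. The sketch below demonstrates that pitfall and a tie-breaking fix; `ConfidenceSortSketch` and its methods are hypothetical and not taken from Lingua.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration; not part of the Lingua API.
final class ConfidenceSortSketch {

    // Pitfall: the comparator compares values only, so two languages with
    // equal confidence count as the same key and the second one is dropped.
    static SortedMap<String, Double> sortByValueOnly(Map<String, Double> confidences) {
        Comparator<String> byValueDesc =
            (a, b) -> Double.compare(confidences.get(b), confidences.get(a));
        SortedMap<String, Double> sorted = new TreeMap<>(byValueDesc);
        sorted.putAll(confidences);
        return sorted;
    }

    // Fix: fall back to the language name when the values are equal, so
    // distinct languages never compare as equal.
    static SortedMap<String, Double> sortWithTieBreak(Map<String, Double> confidences) {
        Comparator<String> byValueThenName = (a, b) -> {
            int byValue = Double.compare(confidences.get(b), confidences.get(a));
            return byValue != 0 ? byValue : a.compareTo(b);
        };
        SortedMap<String, Double> sorted = new TreeMap<>(byValueThenName);
        sorted.putAll(confidences);
        return sorted;
    }
}
```

Given three languages of which two share the same confidence, `sortByValueOnly` returns a map with two entries, whereas `sortWithTieBreak` keeps all three.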

Other Changes

The Kotlin compiler and runtime have been updated to version 1.4. This includes the current stable release 1.0.0 of the kotlinx-serialization framework.
The accuracy report files have been moved to their own Gradle source set. This allows for separate compilation of unit tests and accuracy report tests, leading to more flexible and slightly faster compilation.

Lingua 1.0.2

09 Aug 12:46

pemistahl

Bug Fixes

The language mapping for the character ë was incorrect. This has been fixed. Thanks to @sandernugterenedia for reporting this bug. (#66)
The implementation of LanguageDetector made use of functionality that was introduced in Java 8, which made the library unusable with Java 6 and 7. Thanks to @levant916 for reporting this bug. (#69)
The Gradle shadow plugin has been added so that ./gradlew jarWithDependencies produces a jar file whose dependencies no longer conflict with different versions of the same dependencies in the same project. (#67)

Lingua 1.0.1

04 Jul 13:46

pemistahl

Bug Fixes

If no ngram probabilities were found for a given input text, a NullPointerException was thrown. Thanks to @fsonntag for finding and fixing this bug. (#63)
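
The notes do not say where the exception originated. A plausible pattern, shown here as an assumption, is a probability lookup that returns null for an unseen ngram and is then unboxed or dereferenced. The sketch below uses a deliberately simplistic scoring scheme and hypothetical names (`MissingProbabilitySketch`, `detect`, the `UNKNOWN` constant) to show a null-safe variant that falls back to an unknown result.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the Lingua API.
final class MissingProbabilitySketch {

    static final String UNKNOWN = "UNKNOWN";

    // models maps a language name to its ngram probabilities.
    // Returns UNKNOWN when no model contains any of the given ngrams.
    static String detect(List<String> ngrams, Map<String, Map<String, Double>> models) {
        String best = UNKNOWN;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> model : models.entrySet()) {
            double score = 0.0;
            boolean found = false;
            for (String ngram : ngrams) {
                Double probability = model.getValue().get(ngram);
                // A missing entry is null; unboxing it would throw a
                // NullPointerException, so it is skipped instead.
                if (probability != null) {
                    score += Math.log(probability);
                    found = true;
                }
            }
            if (found && score > bestScore) {
                best = model.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

With a single model that only knows the ngram `the`, calling `detect` with an unseen ngram returns `UNKNOWN` rather than failing.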

Lingua 1.0.0

24 Jun 16:53

pemistahl

Languages

Nine new languages have been added, this time with a focus on Africa: Ganda, Shona, Sotho, Swahili, Tsonga, Tswana, Xhosa, Yoruba and Zulu.
The language Norwegian has been removed in favor of Bokmal and Nynorsk. (#59)

Features

LanguageDetector can now provide confidence scores for each evaluated language. (#11)
The public API for creating language model files (LanguageModelFilesWriter) and test data files (TestDataFilesWriter) has been stabilized. (#37)
New convenience methods have been added to LanguageDetectorBuilder in order to build LanguageDetector from languages written in a certain script. (#61)

Improvements

The rule-based detection algorithm has been made less sensitive so that single words in a different language cannot mislead the algorithm so easily.
The fastutil library has been added again to reduce memory consumption. (#58)
The language model-based algorithm has been optimized so that language detection performs approximately 25% faster now. (#58)
Support for the Kotlin linter ktlint has been added to help with a consistent coding style. (#47)
Third-party dependencies have been updated to their latest versions. (#36)

Bug Fixes

Incorrect regex character classes caused the library to not work properly on Android. (#32)

Test Coverage

Test coverage has been extended from 59% to 72%.

Documentation

The README contains a new section describing how users can add their own languages to Lingua.

Other changes

There is a breaking change in this release:

Methods with the prefix fromAllBuiltIn... have been renamed to fromAll... to make them more succinct and clear. (#61)

Lingua 0.6.1

06 Feb 21:47

pemistahl

Bug Fixes

The rule-based engine did not take language subset filtering from the public API into account. (#23)
It was possible to pass Language.UNKNOWN through the public API. (#24)
A bug in the rule-based engine's alphabet detection algorithm has been fixed; the algorithm could be misled by single characters. (#25)
