🐍 python-vibrato 🎤

Vibrato is a fast implementation of tokenization (or morphological analysis) based on the Viterbi algorithm. This is a Python wrapper for Vibrato.

Installation

Install pre-built package from PyPI

Run the following command:

$ pip install vibrato

Build from source

Install the Rust compiler beforehand by following its documentation. vibrato uses pyproject.toml, so pip must also be version 19 or later; upgrade it if necessary:

$ pip install --upgrade pip

After setting up the environment, you can install vibrato as follows:

$ pip install git+https://github.com/daac-tools/python-vibrato

Example Usage

python-vibrato does not include model files. Before tokenizing, follow the Vibrato documentation to download a distributed model or train your own.

Check the version number as shown below so that you use a compatible model:

>>> import vibrato
>>> vibrato.VIBRATO_VERSION
'0.5.1'

Examples:

>>> import vibrato

>>> with open('tests/data/system.dic', 'rb') as fp:
...     tokenizer = vibrato.Vibrato(fp.read())

>>> tokens = tokenizer.tokenize('社長は火星猫だ')

>>> len(tokens)
5

>>> tokens[0]
Token { surface: "社長", feature: "名詞,普通名詞,一般,*" }

>>> tokens[0].surface()
'社長'

>>> tokens[0].feature()
'名詞,普通名詞,一般,*'

>>> tokens[0].start()
0

>>> tokens[0].end()
2
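The accessors above can be combined to turn a tokenization result into plain Python data. The following is a minimal sketch, not part of python-vibrato: `tokens_to_rows` is a hypothetical helper name, and it relies only on `len()`, indexing, and the `surface()`, `start()`, `end()`, and `feature()` calls shown in the examples. Splitting the feature string on commas is an assumption that fits the sample dictionary; a dictionary whose features contain quoted commas would need a CSV parser instead.

```python
def tokens_to_rows(tokens):
    """Collect (surface, start, end, feature_fields) for each token.

    `tokens` is the object returned by tokenizer.tokenize(); only len(),
    indexing, and the accessor methods shown in the README are used.
    """
    rows = []
    for i in range(len(tokens)):
        token = tokens[i]
        # feature() returns one comma-separated string; a naive split is
        # an assumption that holds for features without quoted commas.
        fields = token.feature().split(',')
        rows.append((token.surface(), token.start(), token.end(), fields))
    return rows
```

With the tokenizer from the example above, `tokens_to_rows(tokenizer.tokenize('社長は火星猫だ'))` would return one tuple per token, the first being `('社長', 0, 2, ['名詞', '普通名詞', '一般', '*'])`.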

Note for distributed models

The distributed models are compressed in zstd format. The API does not decompress them, so you must decompress the data yourself before passing it to vibrato.Vibrato:

>>> import vibrato
>>> import zstandard  # zstandard package in PyPI

>>> dctx = zstandard.ZstdDecompressor()
>>> with open('tests/data/system.dic.zst', 'rb') as fp:
...     with dctx.stream_reader(fp) as dict_reader:
...         tokenizer = vibrato.Vibrato(dict_reader.read())
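If the same code has to accept both compressed and uncompressed dictionaries, one option is to check for the zstd frame magic number before choosing how to read the file. This is a sketch under stated assumptions: `is_zstd_compressed` and `read_dictionary_bytes` are hypothetical helper names, not part of python-vibrato, and the check only recognizes files that begin with a standard zstd frame.

```python
ZSTD_MAGIC = b'\x28\xb5\x2f\xfd'  # first four bytes of a standard zstd frame


def is_zstd_compressed(data):
    """Return True if `data` starts with the zstd frame magic number."""
    return data[:4] == ZSTD_MAGIC


def read_dictionary_bytes(path):
    """Read a dictionary file, decompressing it first if it is zstd data."""
    with open(path, 'rb') as fp:
        compressed = is_zstd_compressed(fp.read(4))
        fp.seek(0)
        if not compressed:
            return fp.read()
        # Imported lazily so that uncompressed files do not require
        # the zstandard package from PyPI.
        import zstandard
        with zstandard.ZstdDecompressor().stream_reader(fp) as dict_reader:
            return dict_reader.read()
```

The returned bytes can then be passed to the constructor, for example `vibrato.Vibrato(read_dictionary_bytes('tests/data/system.dic.zst'))`.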

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.
