Service of wrapped DeepPavlov NER ML models for a quick entities extraction from cells of long tabular data, powered by AREkit pipelines

nicolay-r/bulk-ner

bulk-ner 0.25.0

A no-strings inference framework for Named Entity Recognition (NER): a service of wrapped AI models, powered by AREkit and the related text-processing pipelines.

The key benefits of this tiny framework are as follows:

  1. ☑️ Native support for batching;
  2. ☑️ Native handling of long input contexts.

Installation

pip install bulk-ner==0.25.0

Usage

API

Please take a look at the related Wiki page

Shell

NOTE: You have to install the source-iter package.

This example uses DeepPavlov==1.3.0 as the adapter for NER models; the adapter is passed via the --adapter parameter:

python -m bulk_ner.annotate \
    --src "test/data/test.tsv" \
    --prompt "{text}" \
    --batch-size 10 \
    --adapter "dynamic:models/dp_130.py:DeepPavlovNER" \
    --output "test-annotated.jsonl" \
    %% \
    --model "ner_ontonotes_bert_mult"

You can choose other models via the --model parameter.

The list of supported models is available here: https://docs.deeppavlov.ai/en/master/features/models/NER.html
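
The command above reads its input from a TSV file. As a minimal sketch, assuming that the `{text}` placeholder in `--prompt` refers to a column named `text` (an assumption, not confirmed here), a small input file could be prepared like this. The helper name `write_tsv` and the file name `example.tsv` are hypothetical and used only for illustration:

```python
import csv


def write_tsv(path, texts):
    """Write one text per row under a single `text` column (assumed column name)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["text"])  # header row; assumed to match "{text}" in --prompt
        for text in texts:
            writer.writerow([text])


# Hypothetical sample rows for illustration only.
write_tsv("example.tsv", [
    "Elon Musk visited Berlin in 2022.",
    "The United Nations is headquartered in New York.",
])
```

The resulting file could then be passed via `--src "example.tsv"` in place of the test file shown above.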

Deploy your model

Quick example: Check out the default DeepPavlov wrapper implementation

All you have to do is to implement the BaseNER class that has the following protected method:

  • _forward(sequences) -- expected to return two lists of the same length:
    • terms -- related to the list of atomic elements of the text (usually words)
    • labels -- B-I-O labels for each term.
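
The contract above can be sketched with a toy model. This is only an illustration under stated assumptions: the `BaseNER` class below is a hypothetical stand-in defined locally so the snippet runs on its own (in practice you would import the real one from the package), and it is assumed that `terms` and `labels` each hold one inner list per input sequence. The `CapitalizedWordNER` class and the `ENT` label are invented for the example:

```python
class BaseNER:
    """Hypothetical stand-in for the package's BaseNER; the real class may differ."""

    def _forward(self, sequences):
        raise NotImplementedError()


class CapitalizedWordNER(BaseNER):
    """Toy model: tags each run of capitalized words as one ENT entity."""

    def _forward(self, sequences):
        terms, labels = [], []
        for text in sequences:
            words = text.split()  # atomic elements: whitespace-separated words
            tags = []
            prev_entity = False
            for word in words:
                is_entity = word[:1].isupper()
                if not is_entity:
                    tags.append("O")
                elif prev_entity:
                    tags.append("I-ENT")  # continues the current entity
                else:
                    tags.append("B-ENT")  # starts a new entity
                prev_entity = is_entity
            terms.append(words)
            labels.append(tags)
        # Two lists of the same length: one entry per input sequence.
        return terms, labels


terms, labels = CapitalizedWordNER()._forward(["we visited New York today"])
print(list(zip(terms[0], labels[0])))
```

A real wrapper would replace the capitalization heuristic with a call to the underlying model, while keeping the terms and labels aligned one to one.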

Powered by

The pipeline construction components were taken from AREkit [github]