MITIE - named-entity recognition, binary relation detection, and text categorization - for Ruby
- Finds people, organizations, and locations in text
- Detects relationships between entities, like PERSON was born in LOCATION
Add this line to your application’s Gemfile:
gem "mitie"
And download the pre-trained models for your language:
Load an NER model
model = Mitie::NER.new("ner_model.dat")
Create a document
doc = model.doc("Nat works at GitHub in San Francisco")
Get entities
doc.entities
This returns
[
{text: "Nat", tag: "PERSON", score: 0.3112371212688382, offset: 0},
{text: "GitHub", tag: "ORGANIZATION", score: 0.5660115198329334, offset: 13},
{text: "San Francisco", tag: "LOCATION", score: 1.3890524313885309, offset: 23}
]
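The entity hashes are plain Ruby data, so the usual Enumerable methods apply. The following is a minimal sketch that uses the sample output above in place of a live model. The helper names `entities_by_tag` and `entity_spans` and the 0.5 threshold are illustrative assumptions, not part of the gem, and the offset check assumes ASCII text.

```ruby
# Hypothetical helper (not part of the gem): keep entities at or above
# min_score and group their text by tag.
def entities_by_tag(entities, min_score: 0.0)
  entities
    .select { |e| e[:score] >= min_score }
    .group_by { |e| e[:tag] }
    .transform_values { |group| group.map { |e| e[:text] } }
end

# Hypothetical helper: slice the original text at each entity's offset.
# Assumes ASCII input, so character and byte positions coincide.
def entity_spans(text, entities)
  entities.map { |e| text[e[:offset], e[:text].length] }
end

# Sample data copied from the output above; in a real program this
# would come from doc.entities.
text = "Nat works at GitHub in San Francisco"
entities = [
  {text: "Nat", tag: "PERSON", score: 0.3112371212688382, offset: 0},
  {text: "GitHub", tag: "ORGANIZATION", score: 0.5660115198329334, offset: 13},
  {text: "San Francisco", tag: "LOCATION", score: 1.3890524313885309, offset: 23}
]

# 0.5 is an arbitrary threshold chosen for illustration only.
p entities_by_tag(entities, min_score: 0.5)
p entity_spans(text, entities)
```

With this threshold, "Nat" is dropped, and each slice taken at an offset matches the entity text.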
Get tokens
doc.tokens
Get tokens and their offset
doc.tokens_with_offset
Get all tags for a model
model.tags
Load a word feature extractor into a trainer
trainer = Mitie::NERTrainer.new("total_word_feature_extractor.dat")
Create training instances
tokens = ["You", "can", "do", "machine", "learning", "in", "Ruby", "!"]
instance = Mitie::NERTrainingInstance.new(tokens)
instance.add_entity(3..4, "topic") # machine learning
instance.add_entity(6..6, "language") # Ruby
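The ranges passed to `add_entity` are zero-based, inclusive token indexes, as the comments above suggest. Before adding an entity, it can help to check which tokens a range actually covers. A minimal sketch in plain Ruby; `phrase_for` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): return the phrase that a
# token range covers, or raise if the range is empty or out of bounds.
def phrase_for(tokens, range)
  first = range.min
  last = range.max
  if first.nil? || first < 0 || last >= tokens.length
    raise ArgumentError, "range #{range} does not fit tokens 0..#{tokens.length - 1}"
  end
  tokens[range].join(" ")
end

tokens = ["You", "can", "do", "machine", "learning", "in", "Ruby", "!"]
puts phrase_for(tokens, 3..4) # machine learning
puts phrase_for(tokens, 6..6) # Ruby
```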
Add the training instances to the trainer
trainer.add(instance)
Train the model
model = trainer.train
Save the model
model.save_to_disk("ner_model.dat")
Detect relationships between two entities, like:
- PERSON was born in LOCATION
- ORGANIZATION was founded in LOCATION
- FILM was directed by PERSON
There are 21 detectors for English. You can find them in the binary_relations
directory in the model download.
Load a detector
detector = Mitie::BinaryRelationDetector.new("rel_classifier_organization.organization.place_founded.svm")
And create a document
doc = model.doc("Shopify was founded in Ottawa")
Get relations
detector.relations(doc)
This returns
[{first: "Shopify", second: "Ottawa", score: 0.17649169745814464}]
Load an NER model into a trainer
trainer = Mitie::BinaryRelationTrainer.new(model)
Add positive and negative examples to the trainer
tokens = ["Shopify", "was", "founded", "in", "Ottawa"]
trainer.add_positive_binary_relation(tokens, 0..0, 4..4)
trainer.add_negative_binary_relation(tokens, 4..4, 0..0)
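Argument order matters here: the negative example above uses the same two ranges as the positive one, only swapped. A minimal sketch that spells out which phrase lands in which slot; `relation_phrases` is a hypothetical helper, not part of the gem, and it mirrors the `first`/`second` keys of the detector output shown earlier.

```ruby
# Hypothetical helper (not part of the gem): show which phrase each
# range selects as the first and second entity of a relation.
def relation_phrases(tokens, first_range, second_range)
  {first: tokens[first_range].join(" "), second: tokens[second_range].join(" ")}
end

tokens = ["Shopify", "was", "founded", "in", "Ottawa"]

# Same order as the positive example: Shopify first, Ottawa second.
p relation_phrases(tokens, 0..0, 4..4)

# Same order as the negative example: Ottawa first, Shopify second.
p relation_phrases(tokens, 4..4, 0..0)
```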
Train the detector
detector = trainer.train
Save the detector
detector.save_to_disk("binary_relation_detector.svm")
Load a word feature extractor into a trainer
trainer = Mitie::TextCategorizerTrainer.new("total_word_feature_extractor.dat")
Add labeled text to the trainer
trainer.add("This is super cool", "positive")
Train the model
model = trainer.train
Save the model
model.save_to_disk("text_categorization_model.dat")
Load a saved model
model = Mitie::TextCategorizer.new("text_categorization_model.dat")
Categorize text
model.categorize("What a super nice day")
Check out Trove for deploying models.
trove push ner_model.dat
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/mitie-ruby.git
cd mitie-ruby
bundle install
bundle exec rake vendor:all
export MITIE_MODELS_PATH=path/to/MITIE-models/english
bundle exec rake test