MITIE Ruby

MITIE - named-entity recognition, binary relation detection, and text categorization - for Ruby

  • Finds people, organizations, and locations in text
  • Detects relationships between entities, like PERSON was born in LOCATION

Installation

Add this line to your application’s Gemfile:

gem "mitie"

Then download the pre-trained models for your language.

Getting Started

Named Entity Recognition

Load an NER model

model = Mitie::NER.new("ner_model.dat")

Create a document

doc = model.doc("Nat works at GitHub in San Francisco")

Get entities

doc.entities

This returns

[
  {text: "Nat",           tag: "PERSON",       score: 0.3112371212688382, offset: 0},
  {text: "GitHub",        tag: "ORGANIZATION", score: 0.5660115198329334, offset: 13},
  {text: "San Francisco", tag: "LOCATION",     score: 1.3890524313885309, offset: 23}
]
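
Each entity is a plain hash, so the result can be post-processed with ordinary Ruby. The sketch below hard-codes the sample output shown above so it runs without a model file; in an application the array would come from doc.entities. The 0.5 cutoff is an arbitrary value chosen for illustration, not a recommendation from the library, and slicing by offset is shown only for this ASCII sample.

```ruby
# Sample data copied from the output above; normally this is doc.entities
text = "Nat works at GitHub in San Francisco"
entities = [
  {text: "Nat",           tag: "PERSON",       score: 0.3112371212688382, offset: 0},
  {text: "GitHub",        tag: "ORGANIZATION", score: 0.5660115198329334, offset: 13},
  {text: "San Francisco", tag: "LOCATION",     score: 1.3890524313885309, offset: 23}
]

# Group the entity strings by their tag
by_tag = entities.group_by { |e| e[:tag] }
                 .transform_values { |group| group.map { |e| e[:text] } }

# Keep only entities at or above an illustrative score cutoff (assumption)
min_score = 0.5
confident = entities.select { |e| e[:score] >= min_score }

# Use the offset to slice each entity back out of the original string
spans = entities.map { |e| text[e[:offset], e[:text].length] }
```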

Get tokens

doc.tokens

Get tokens and their offset

doc.tokens_with_offset

Get all tags for a model

model.tags

Training

Load a word feature extractor into a trainer

trainer = Mitie::NERTrainer.new("total_word_feature_extractor.dat")

Create training instances

tokens = ["You", "can", "do", "machine", "learning", "in", "Ruby", "!"]
instance = Mitie::NERTrainingInstance.new(tokens)
instance.add_entity(3..4, "topic")    # machine learning
instance.add_entity(6..6, "language") # Ruby
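
Entity ranges are inclusive token indices, which are easy to miscount by hand. As a rough sketch, a small helper can look the range up from the phrase itself. Note that find_token_range is a hypothetical helper written for this example and is not part of the mitie gem; it returns the first exact match only.

```ruby
# Hypothetical helper (not part of the mitie gem): returns the inclusive range
# of token indices where `phrase` first occurs in `tokens`, or nil if absent.
def find_token_range(tokens, phrase)
  return nil if phrase.empty?

  start = (0..tokens.length - phrase.length).find do |i|
    tokens[i, phrase.length] == phrase
  end
  start && (start..start + phrase.length - 1)
end

tokens = ["You", "can", "do", "machine", "learning", "in", "Ruby", "!"]
topic_range = find_token_range(tokens, ["machine", "learning"]) # 3..4
language_range = find_token_range(tokens, ["Ruby"])             # 6..6

# With the gem loaded, the ranges could then be passed along, for example:
# instance.add_entity(topic_range, "topic")
# instance.add_entity(language_range, "language")
```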

Add the training instances to the trainer

trainer.add(instance)

Train the model

model = trainer.train

Save the model

model.save_to_disk("ner_model.dat")

Binary Relation Detection

Detect relationships between two entities, like:

  • PERSON was born in LOCATION
  • ORGANIZATION was founded in LOCATION
  • FILM was directed by PERSON

There are 21 detectors for English. You can find them in the binary_relations directory in the model download.

Load a detector

detector = Mitie::BinaryRelationDetector.new("rel_classifier_organization.organization.place_founded.svm")

And create a document

doc = model.doc("Shopify was founded in Ottawa")

Get relations

detector.relations(doc)

This returns

[{first: "Shopify", second: "Ottawa", score: 0.17649169745814464}]

Training

Load an NER model into a trainer

trainer = Mitie::BinaryRelationTrainer.new(model)

Add positive and negative examples to the trainer

tokens = ["Shopify", "was", "founded", "in", "Ottawa"]
trainer.add_positive_binary_relation(tokens, 0..0, 4..4)
trainer.add_negative_binary_relation(tokens, 4..4, 0..0)
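
With more than a handful of sentences, it can help to keep the labeled data separate from the trainer calls. The sketch below mirrors the example above, where the negative example is the positive one with its two ranges swapped. The Example struct is a made-up container for this illustration, and whether a reversed pair is a sensible negative depends on the relation being trained.

```ruby
# Hypothetical container for one labeled sentence (not part of the mitie gem)
Example = Struct.new(:tokens, :first, :second)

examples = [
  Example.new(["Shopify", "was", "founded", "in", "Ottawa"], 0..0, 4..4)
]

# Positive examples keep the labeled argument order
positives = examples.map { |ex| [ex.tokens, ex.first, ex.second] }

# Negative examples reverse it, as in the snippet above
negatives = examples.map { |ex| [ex.tokens, ex.second, ex.first] }

# With the gem loaded and a trainer created as above:
# positives.each { |args| trainer.add_positive_binary_relation(*args) }
# negatives.each { |args| trainer.add_negative_binary_relation(*args) }
```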

Train the detector

detector = trainer.train

Save the detector

detector.save_to_disk("binary_relation_detector.svm")

Text Categorization

Load a model into a trainer

trainer = Mitie::TextCategorizerTrainer.new("total_word_feature_extractor.dat")

Add labeled text to the trainer

trainer.add("This is super cool", "positive")
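
A categorizer needs examples for every label, so it may be worth checking how many there are per label before training. The sentences below are made up for illustration; only the first comes from the example above.

```ruby
# Illustrative labeled examples (made up for this sketch)
labeled = [
  ["This is super cool", "positive"],
  ["What a great day", "positive"],
  ["This is really bad", "negative"]
]

# Count examples per label to spot class imbalance before training
label_counts = labeled.each_with_object(Hash.new(0)) do |(_text, label), counts|
  counts[label] += 1
end

# With the gem loaded and a trainer created as above:
# labeled.each { |text, label| trainer.add(text, label) }
```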

Train the model

model = trainer.train

Save the model

model.save_to_disk("text_categorization_model.dat")

Load a saved model

model = Mitie::TextCategorizer.new("text_categorization_model.dat")

Categorize text

model.categorize("What a super nice day")

Deployment

Check out Trove for deploying models.

trove push ner_model.dat

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/mitie-ruby.git
cd mitie-ruby
bundle install
bundle exec rake vendor:all

export MITIE_MODELS_PATH=path/to/MITIE-models/english
bundle exec rake test