entity-matching

Match entities between CiteSeerX and other digital libraries

In this project, we attempt to develop a machine learning (ML) based method for matching paper entities between CiteSeerX and other digital libraries, including but not limited to IEEE Xplore (IEEE hereafter), DBLP, and Web of Science (WoS hereafter). As with most ML-based methods, data preprocessing takes substantial effort. This codebase centralizes the working programs for the different tasks so that they can be reused by whoever takes over the corresponding roles.

Models:

HMM (Header Matching Model): This model matches paper entities across databases using information from the paper header: title, abstract, author list, and venue. It is used to match CiteSeerX against digital libraries that lack citation information, such as DBLP, IEEE, and Medline.

HMM_readme gives the details for running the HMM.

CMM (Citation Matching Model): This model uses citations to match papers when citation information is available.

TEM (Title Evaluation Model): This model evaluates the quality of a paper's title. If the title is of high quality, the HMM is used; otherwise, a combination of the CMM and HMM is applied for matching.

IMM (Integrated Matching Model): This model integrates the HMM and CMM with the help of the TEM.
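
The routing between the models can be illustrated with a minimal sketch. It is not the code in IMM.py or TEM.py: the function names, the title-quality heuristic, the threshold, and the use of a plain average to combine the two scores are all assumptions made for illustration.

```python
def title_quality(title: str) -> float:
    """Hypothetical stand-in for the TEM: share of purely alphabetic tokens.

    Titles with fewer than three tokens are treated as low quality (score 0.0).
    """
    tokens = title.split()
    if len(tokens) < 3:
        return 0.0
    alphabetic = sum(1 for token in tokens if token.isalpha())
    return alphabetic / len(tokens)


def integrated_score(title: str, hmm_score: float, cmm_score: float,
                     quality_threshold: float = 0.8) -> float:
    """Pick the header-based score for good titles, else combine both scores.

    hmm_score and cmm_score are assumed to be match scores in [0, 1] that
    were computed elsewhere. The average is an assumed combination rule.
    """
    if title_quality(title) >= quality_threshold:
        return hmm_score
    return (hmm_score + cmm_score) / 2
```

With this sketch, a clean title relies on the header-based score alone, while a short or noisy title also draws on the citation-based score.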

Files:

ground truth: This file contains a list of matching papers between CiteSeerX and WoS.

Similarity Profile: This file compares the profiles of two papers from different data sources and computes their similarity score. It provides separate features for the different kinds of information in the header.
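
As a rough illustration of a per-field similarity profile, the sketch below computes one feature for each header field. It does not reproduce similarityProfile.py: the record layout (a dict with `title`, `abstract`, `authors`, and `venue` keys), the function names, and the choice of Jaccard overlap are assumptions.

```python
import re


def tokens(text: str) -> set:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def last_names(authors: list) -> set:
    """Take the final whitespace-separated part of each name as the last name."""
    return {name.split()[-1].lower() for name in authors if name.split()}


def similarity_profile(p1: dict, p2: dict) -> dict:
    """Return one similarity feature per header field (assumed record layout)."""
    return {
        "title": jaccard(tokens(p1.get("title", "")), tokens(p2.get("title", ""))),
        "abstract": jaccard(tokens(p1.get("abstract", "")), tokens(p2.get("abstract", ""))),
        "authors": jaccard(last_names(p1.get("authors", [])), last_names(p2.get("authors", []))),
        "venue": jaccard(tokens(p1.get("venue", "")), tokens(p2.get("venue", ""))),
    }


# Made-up records, for illustration only.
paper_a = {"title": "Entity Matching in Digital Libraries",
           "authors": ["Jane Doe", "John Smith"], "venue": "JCDL"}
paper_b = {"title": "Entity matching in digital libraries.",
           "authors": ["J. Doe"], "venue": "jcdl"}
profile = similarity_profile(paper_a, paper_b)
```

A feature dictionary of this kind could then be passed to a classifier that decides whether the two records refer to the same paper.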

Repository contents:

HMM
data
models
results
CMM.py
IMM.py
README.md
TEM.py
all_title_dblp2017_11.txt_DF.csv
groundtruth.txt
header_based_model.py
index_reference_papers.py
name_parser.py
normalizr.py
similarityProfile.py


SeerLabs/entity-matching
