Merge pull request #11 from yuchenlin/create-pip-module
Create pip module
yuchenlin authored Jan 29, 2021
2 parents 3218bae + 0bf3692 commit 34dd7fb
Showing 353 changed files with 112 additions and 26 deletions.
39 changes: 29 additions & 10 deletions README.md
@@ -1,22 +1,42 @@
# Rebiber: A tool for normalizing bibtex with official info.

We often cite papers using their arXiv info without noting that they are already __PUBLISHED__ in some conferences such as ACL, EMNLP, NAACL, ICLR or AAAI. These incorrect bib entries might violate rules about submissions or camera-ready versions for some conferences. __Rebiber__ is a simple tool in Python to fix them automatically, based on their official information from the full ACL anthology and DBLP (for ICLR and other conferences)!
We often cite papers using their arXiv versions without noting that they are already __PUBLISHED__ in some conferences such as ACL, EMNLP, NAACL, ICLR or AAAI. These unofficial bib entries might violate rules about submissions or camera-ready versions for some conferences.
We introduce __Rebiber__, a simple tool in Python to fix them automatically. It is based on the official conference information from DBLP or the ACL anthology (for NLP conferences)! You can check the list of supported conferences [here](#supported-conferences).

## Get started

## Installation

```bash
git clone https://github.com/yuchenlin/rebiber.git
pip install bibtexparser tqdm
cd rebiber
pip install rebiber -U
```

Normalizing the bibtex entries to the official format.
OR

```bash
python normalize.py -i example_input.bib -o example_output.bib -l bib_list.txt
git clone https://github.com/yuchenlin/rebiber.git
cd rebiber/
pip install -e .
```


## Usage
To normalize your bibtex file with the official conference information:

```bash
rebiber -i /path/to/input.bib -o /path/to/output.bib
```
You can find a pair of example input and output files in `rebiber/example_input.bib` and `rebiber/example_output.bib`.
You can also specify your own bib list file with `-l /path/to/bib_list.txt`. If you don't specify `-o`, the output path defaults to the `-i` path, so the input file is overwritten.
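
The `-o` fallback described above can be sketched with `argparse`. This is a minimal illustration, not the actual `rebiber` entry point: the function name `parse_args` and the long option names (`--input_bib`, `--output_bib`, `--bib_list`) are assumptions, since the README only shows `-i`, `-o` and `-l`.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the flags named in the README."""
    parser = argparse.ArgumentParser(prog="rebiber-sketch")
    # Long option names are assumptions; the README only shows -i, -o and -l.
    parser.add_argument("-i", "--input_bib", required=True,
                        help="bib file to normalize")
    parser.add_argument("-o", "--output_bib", default=None,
                        help="where to write the normalized bib file")
    parser.add_argument("-l", "--bib_list", default=None,
                        help="optional custom bib list file")
    args = parser.parse_args(argv)
    # With no -o, fall back to the input path, i.e. rewrite the file in place.
    if args.output_bib is None:
        args.output_bib = args.input_bib
    return args


args = parse_args(["-i", "refs.bib"])
print(args.output_bib)  # prints "refs.bib"
```

Resolving the fallback after parsing, rather than through `default=`, is needed because the default depends on another argument's value.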
<!-- Or
```bash
python rebiber/normalize.py \
-i rebiber/example_input.bib \
-o rebiber/example_output.bib \
-l rebiber/bib_list.txt
``` -->


## Example Input and Output
An example input entry with the arXiv information (from Google Scholar or somewhere):
```bib
@article{lin2020birds,
@@ -54,7 +74,7 @@ An example normalized output entry with the official information:
The `bib_list.txt` contains a list of converted json files of the official bib data. In this repo, we now support the full [ACL anthology](https://www.aclweb.org/anthology/), i.e., all papers that are published at *CL conferences (ACL, EMNLP, NAACL, etc.) as well as workshops.
Also, we support any conference proceedings that can be downloaded from DBLP, for example, ICLR2020.

The following conferences are supported and their bib/json files are in our `data` folder. You can turn each item on/off in `bib_list.txt`.
The following conferences are supported and their bib/json files are in our `data` folder. You can turn each item on/off in `bib_list.txt`. **Please feel free to create a PR to add new conferences by following [these instructions](#adding-a-new-conference)!**

| Name | Years |
| --- | ----------- |
@@ -87,9 +107,8 @@ The following conferences are supported and their bib/json files are in our `dat
| WSDM | 2008 -- 2020 |
| WWW (The Web Conf) | 2001 -- 2020 |

**Please feel free to create a PR to add your conferences here by following the next section!**

Thanks to [Anton Tsitsulin](http://tsitsul.in/) for his great work on collecting such a complete set of bib files!
**Thanks to [Anton Tsitsulin](http://tsitsul.in/) for his great work on collecting such a complete set of bib files!**

<!--
python bib2json.py -i data/iclr2020.bib -o data/iclr2020.json
13 changes: 13 additions & 0 deletions rebiber/__init__.py
@@ -0,0 +1,13 @@
"""
Rebiber: A tool for normalizing bibtex with official info.
"""

from rebiber.bib2json import load_bib_file
from rebiber.normalize import construct_bib_db, normalize_bib

__all__ = [
"__version__",
"load_bib_file",
"construct_bib_db",
"normalize_bib"
]
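
Review note: the hunk above lists `"__version__"` in `__all__`, but none of the 13 added lines assigns it. As far as this hunk shows, `from rebiber import *` would then fail. The sketch below reproduces that behavior on a throwaway module; `fake_pkg`, `star_import_error` and the version string are hypothetical and only serve the illustration.

```python
import sys
import types

# Build a throwaway module that, like the hunk above, lists a name in
# __all__ without defining it. "fake_pkg" is a hypothetical name.
fake_pkg = types.ModuleType("fake_pkg")
fake_pkg.load_bib_file = lambda path: []
fake_pkg.__all__ = ["__version__", "load_bib_file"]
sys.modules["fake_pkg"] = fake_pkg


def star_import_error(module_name):
    """Return the error message raised by `from <module> import *`, or None."""
    try:
        exec(f"from {module_name} import *", {})
    except AttributeError as exc:
        return str(exc)
    return None


# Fails: __all__ names an attribute the module does not have.
missing_before = star_import_error("fake_pkg")

# Defining the attribute makes the wildcard import succeed.
fake_pkg.__version__ = "0.0.0"  # placeholder value, not the real version
missing_after = star_import_error("fake_pkg")
```

Either assigning `__version__` in `rebiber/__init__.py` or dropping it from `__all__` would avoid the error.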
11 changes: 8 additions & 3 deletions bib2json.py → rebiber/bib2json.py
@@ -4,6 +4,11 @@
import bibtexparser
import argparse
from tqdm import tqdm
import os


filepath = os.path.dirname(os.path.abspath(__file__)) + '/'


def normalize_title(title_str):
    title_str = re.sub(r'[^a-zA-Z]',r'', title_str)
@@ -43,10 +48,10 @@ def build_json(all_bib_entries):

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input_bib", default="data/acl.bib",
    parser.add_argument("-i", "--input_bib", default=filepath+"data/acl.bib",
                        type=str, help="The input bib file")
    parser.add_argument("-o", "--output_json", default="data/acl.json",
                        type=str, help="The output bib file")
    parser.add_argument("-o", "--output_json", default=filepath+"data/acl.json",
                        type=str, help="The output json file")
    args = parser.parse_args()

    all_bib_entries = load_bib_file(args.input_bib)
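
Review note: the change above anchors the default paths to the module's own directory instead of the current working directory, which matters once the package is installed with pip and run from anywhere. A minimal sketch of the same idea using `os.path.join` follows; the helper name `package_relative` and the install location are hypothetical.

```python
import os


def package_relative(module_file, relative_path):
    """Join relative_path onto the directory that holds module_file."""
    package_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(package_dir, relative_path)


# Hypothetical install location, used only for illustration.
example = package_relative("/opt/site-packages/rebiber/bib2json.py",
                           os.path.join("data", "acl.bib"))
print(example)
```

Inside a real module the first argument would be `__file__`. Using `os.path.join` avoids the manual `+ '/'` concatenation and picks the right separator for the platform.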