SOParser

SOParser is a parser and analyzer for StackOverflow data.

The package contains a bash script that downloads the data and extracts it into a manageable format. In addition, the repository contains topic modeling code in Python.

Run ./downloadAndPrepareData.sh to download and prepare the data. NB: you will need about 100 GB of free disk space to run the script.
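The preparation step ends with one file per month (Jan. 2013 - Dec. 2014). The script's internals are not described here, so the following is only an illustrative Python sketch of that per-month split, assuming the input is the Stack Exchange dump's Posts.xml (one `row` element per post, `PostTypeId` 1 for questions and 2 for answers, ISO `CreationDate`). The function name and sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from io import BytesIO

# Hypothetical rows in the layout of the Stack Exchange dump's Posts.xml:
# PostTypeId 1 = question, 2 = answer; CreationDate is ISO 8601.
SAMPLE = b"""<posts>
  <row Id="1" PostTypeId="1" CreationDate="2013-01-05T10:00:00.000" OwnerUserId="7" Title="Q1" />
  <row Id="2" PostTypeId="2" CreationDate="2013-01-06T11:00:00.000" OwnerUserId="8" />
  <row Id="3" PostTypeId="1" CreationDate="2014-12-31T23:59:59.000" OwnerUserId="7" Title="Q2" />
  <row Id="4" PostTypeId="1" CreationDate="2012-12-31T23:59:59.000" OwnerUserId="9" Title="Old" />
</posts>"""


def bin_posts_by_month(source, first="2013-01", last="2014-12"):
    """Group question and answer rows by their YYYY-MM creation month."""
    months = defaultdict(list)
    # iterparse streams the input instead of loading it whole,
    # which matters for a dump this large.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        month = elem.get("CreationDate", "")[:7]  # "2013-01-05T..." -> "2013-01"
        if elem.get("PostTypeId") in ("1", "2") and first <= month <= last:
            months[month].append(dict(elem.attrib))  # copy before clearing
        elem.clear()  # drop the row's attributes once copied
    return dict(months)


by_month = bin_posts_by_month(BytesIO(SAMPLE))
for month, posts in sorted(by_month.items()):
    print(month, len(posts))
# 2013-01 2
# 2014-12 1
```

Writing each month's list to its own file would then mirror the one-file-per-month layout described below.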

Files and explanations

downloadAndPrepareData.sh - a bash script that downloads and prepares the data. The script creates one file per month (Jan. 2013 - Dec. 2014); each file contains the questions and answers posted in that month.
SOParser.py - a Python script that 1) extracts the users included in the analysis (users with at least 50 posts over 2013-2014), and 2) extracts the questions and answers (title, text excluding code snippets, and tags) written by those users and saves them to the data files used in later stages of the analysis. The output is one TSV file per month.
TextProcessor.py - performs tokenization, stemming, TF-IDF weighting, and month-by-month LDA on the files generated by SOParser.py.
TopicComparator.py - compares topics month by month, e.g. the topics generated for 2013-05 with those generated for 2013-06, 2013-07, etc.
UserComparator.py - compares topics month by month in terms of users.
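To make the SOParser.py step concrete, here is a minimal sketch of the user filter (at least 50 posts) and the code-stripping TSV export. It is not the script's actual code: it assumes posts are dicts keyed by Posts.xml attribute names, that "code snippets" means `<pre>` blocks and inline `<code>` spans in the HTML body, and the column order and helper names are hypothetical.

```python
import csv
import html
import io
import re
from collections import Counter

MIN_POSTS = 50  # from the README: users with at least 50 posts over 2013-2014

# Assumption: code snippets are <pre> blocks and inline <code> spans in the HTML body.
CODE_RE = re.compile(r"<pre[^>]*>.*?</pre>|<code>.*?</code>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def active_users(posts, min_posts=MIN_POSTS):
    """Return the ids of users who authored at least min_posts posts."""
    counts = Counter(p["OwnerUserId"] for p in posts if p.get("OwnerUserId"))
    return {user for user, n in counts.items() if n >= min_posts}


def strip_code(body_html):
    """Remove code snippets and remaining HTML tags, then collapse whitespace."""
    text = CODE_RE.sub(" ", body_html)
    text = html.unescape(TAG_RE.sub(" ", text))
    return " ".join(text.split())


def write_month_tsv(posts, users, out):
    """Write one TSV row per post by a selected user: id, user, title, text, tags."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for p in posts:
        if p.get("OwnerUserId") in users:
            writer.writerow([p["Id"], p["OwnerUserId"], p.get("Title", ""),
                             strip_code(p.get("Body", "")), p.get("Tags", "")])


# Hypothetical data: user 7 has 50 posts, user 8 only one.
body = "<p>Use <code>sorted()</code> here</p><pre><code>print(1)</code></pre>"
posts = [{"Id": str(i), "OwnerUserId": "7", "Title": "Sorting", "Body": body,
          "Tags": "<python>"} for i in range(50)]
posts.append({"Id": "50", "OwnerUserId": "8", "Title": "Other", "Body": body})

users = active_users(posts)
buf = io.StringIO()
write_month_tsv(posts, users, buf)
print(users)                                      # {'7'}
print(buf.getvalue().splitlines()[0].split("\t"))  # ['0', '7', 'Sorting', 'Use here', '<python>']
```

Counting posts over the whole 2013-2014 period first and only then writing the monthly files keeps the user set identical across months.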
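The README does not say which similarity measure TopicComparator.py uses. As one plausible approach, and purely an assumption, each topic can be represented by its top words and matched to the most similar topic in a later month by Jaccard similarity. The topic lists below are hypothetical.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity of two topics' top-word sets (0 = disjoint, 1 = identical)."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def best_matches(topics_a, topics_b):
    """Map each topic index in month A to (best-matching topic index in month B, score)."""
    if not topics_b:
        return {}
    matches = {}
    for i, words_a in enumerate(topics_a):
        scores = [jaccard(words_a, words_b) for words_b in topics_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches[i] = (j, scores[j])
    return matches


# Hypothetical top words per topic for three consecutive months.
monthly_topics = {
    "2013-05": [["python", "list", "dict", "loop"], ["java", "class", "spring", "maven"]],
    "2013-06": [["java", "spring", "bean", "maven"], ["python", "pandas", "list", "dict"]],
    "2013-07": [["python", "pandas", "dataframe", "csv"], ["android", "activity", "view", "layout"]],
}

print(best_matches(monthly_topics["2013-05"], monthly_topics["2013-06"]))
# {0: (1, 0.6), 1: (0, 0.6)}

months = sorted(monthly_topics)
for k, month in enumerate(months):
    # Compare each month with the next two months (e.g. 2013-05 vs 2013-06 and 2013-07).
    for later in months[k + 1:k + 3]:
        print(month, later, best_matches(monthly_topics[month], monthly_topics[later]))
```

A measure over full topic-word distributions (e.g. cosine similarity) would be an alternative when the LDA output provides probabilities rather than just word lists.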

You may need to run nltk.download() (or nltk.download('stopwords')) to download the NLTK stopword list.
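TextProcessor.py filters stopwords before weighting terms. Its exact preprocessing is not shown in this README, so the following is only a standard-library sketch of stopword removal plus TF-IDF, assuming the common tf * log(N / df) weighting. The small STOPWORDS set is a hypothetical offline stand-in for NLTK's list, and stemming and LDA are omitted.

```python
import math
import re
from collections import Counter

# Small stand-in for nltk.corpus.stopwords.words("english") so the sketch runs offline.
STOPWORDS = {"a", "an", "the", "to", "in", "of", "and", "is", "how", "i", "do", "by"}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, and drop stopwords (no stemming here)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        weights.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights


docs = ["How do I sort a list in Python?",
        "Sort a dict by value in Python",
        "Java class not found"]
w = tf_idf(docs)
print(tokenize(docs[0]))        # ['sort', 'list', 'python']
print(max(w[0], key=w[0].get))  # list
```

Terms that occur in every document get weight 0, so month-wide vocabulary does not dominate the vectors passed on to LDA.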

Repository: alansaid/SOParser