add dataset readme
ronikaufman committed Dec 6, 2024
1 parent 6ac5b1c commit 2fa5370
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions code/myriad/loam_paper/dataset/README.md
@@ -0,0 +1,34 @@
# Dataset

This is the **Myriad People** dataset. All of its files were generated by running `script.py` in the `../mining` folder. Each file is described below.

`all_loggedin_contributors.json`: list of all logged-in (i.e. not anonymous) GitHub contributors, with:
- `type`: type of contributor, `User` or `Bot`
- `id`: GitHub username
- `contributions`: list of repositories they contributed to, with:
  - `repo_name`: name of the repository
  - `contributions`: number of contributions that they made to this project
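
As an illustration of the structure above, here is a minimal sketch that sums each human contributor's contributions across repositories. The function name and the sample records are hypothetical; only the field names (`type`, `id`, `contributions`, `repo_name`) come from the description. With the real file, the list would be obtained with `json.load` on `all_loggedin_contributors.json`.

```python
from collections import Counter


def total_contributions_per_user(contributors):
    """Sum contributions over all repositories for each non-bot contributor."""
    totals = Counter()
    for person in contributors:
        if person["type"] != "User":
            continue  # skip entries whose type is "Bot"
        for entry in person["contributions"]:
            totals[person["id"]] += entry["contributions"]
    return totals


# Hypothetical sample mirroring the documented layout (not real data).
sample_contributors = [
    {
        "type": "User",
        "id": "alice",
        "contributions": [
            {"repo_name": "owner-a/lib-one", "contributions": 10},
            {"repo_name": "owner-b/lib-two", "contributions": 5},
        ],
    },
    {
        "type": "Bot",
        "id": "some-bot",
        "contributions": [
            {"repo_name": "owner-a/lib-one", "contributions": 99},
        ],
    },
]

totals = total_contributions_per_user(sample_contributors)
```

Note that the key `contributions` appears at two levels: as a list on the contributor, and as a count inside each list entry.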

`categories_info.json`: list of categories, with:
- `category`: name of the category
- `repos`: list of names of repositories in that category

`repos_info.json`: list of all repositories for which the GitHub API managed to fetch the data, with:
- `name`: name of the repository
- `category`: category it belongs to
- `exclusivity`: either the name of an artwork if this repository was exclusively used in that artwork, or `null` if it was used in at least two artworks
- `created_at`: creation date of the repository, in the Python `datetime` format
- `total_contributions`: total number of contributions
- `anonymous_contributors`: number of anonymous contributors
- `loggedin_contributors`: number of logged-in contributors
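
The fields above lend themselves to simple per-category summaries. The following sketch counts, for each category, the repositories, how many of them are exclusive to a single artwork, and the total contributions. The function name and the sample records are hypothetical, and the samples include only the fields the sketch reads; real records would come from `json.load` on `repos_info.json`, where a JSON `null` becomes Python `None`.

```python
from collections import defaultdict


def summarize_by_category(repos):
    """Aggregate repository counts and contributions per category."""
    summary = defaultdict(
        lambda: {"repos": 0, "exclusive": 0, "total_contributions": 0}
    )
    for repo in repos:
        stats = summary[repo["category"]]
        stats["repos"] += 1
        # `exclusivity` is an artwork name, or None if used in 2+ artworks.
        if repo["exclusivity"] is not None:
            stats["exclusive"] += 1
        stats["total_contributions"] += repo["total_contributions"]
    return dict(summary)


# Hypothetical sample records (not real data).
sample_repos = [
    {"name": "owner-a/lib-one", "category": "graphics",
     "exclusivity": "artwork-x", "total_contributions": 120},
    {"name": "owner-b/lib-two", "category": "graphics",
     "exclusivity": None, "total_contributions": 30},
    {"name": "owner-c/lib-three", "category": "audio",
     "exclusivity": None, "total_contributions": 7},
]

summary = summarize_by_category(sample_repos)
```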

`gh_api_failures.json`: list of repositories for which the GitHub API failed (because they are too big), with `name`, `category` and `exclusivity`, as in `repos_info.json`

`individual_repos` folder: one file per repository, in the format `owner&name.json`, with:
- `repo_name`: name of the repository
- `contributors`: list of contributors, with:
  - `type`: type of contributor, `User` or `Bot`
  - `id`: GitHub username
  - `contributions`: number of contributions that they made to this project

In all the files, repository name attributes are in the format `owner/name`.
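
Because name attributes use `owner/name` while files in `individual_repos` are named `owner&name.json`, a small helper can map one to the other. This is a sketch under the assumption that the only difference is the `/` being replaced by `&`; the function name and the `dataset_dir` parameter are hypothetical.

```python
from pathlib import Path


def individual_repo_path(repo_name, dataset_dir="."):
    """Map an `owner/name` repository name to its file in `individual_repos`."""
    owner, sep, name = repo_name.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/name', got {repo_name!r}")
    return Path(dataset_dir) / "individual_repos" / f"{owner}&{name}.json"


# Hypothetical repository name, used only to show the mapping.
example_path = individual_repo_path("octocat/hello-world")
```

The resulting path can be opened and passed to `json.load` to read that repository's `repo_name` and `contributors` fields.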
