This project repo is no longer being maintained. For more information on code-gov repos, go to GSA/code-gov.
Code.gov data quality scoring proof of concept
Our objective is to visualize what a scoring engine for Code.gov data quality could look like. This initial step is an extremely naive approach.
To quantify the data quality of our repos, I've assigned points to fields in our schema. Points are awarded per field and summed to produce the repository's "quality total", which is then added to the data we index in Elasticsearch.
It's important to note that this is a quality score, not a search score. The regular search score that Elasticsearch computes will not be affected or replaced by it.
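The README does not list the point value for each field. The sketch below therefore uses made-up weights for a few fields to show the idea: sum the points of every populated field and attach the total to the document before it is indexed. The weights, the `isPopulated` rule, and the `qualityScore` property name are all assumptions, not this project's actual code.

```javascript
// Hypothetical point values: the real weights used by this POC are not
// listed in this README.
const FIELD_POINTS = {
  name: 10,
  description: 10,
  repositoryURL: 5,
  tags: 3,
  laborHours: 2,
};

// Assumed rule: a value counts as "populated" unless it is null/undefined,
// an empty or whitespace-only string, or an empty array.
function isPopulated(value) {
  if (value === null || value === undefined) return false;
  if (typeof value === "string") return value.trim().length > 0;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Sum the points of every populated field to get the "quality total".
function qualityTotal(repo, points = FIELD_POINTS) {
  let total = 0;
  for (const [field, pts] of Object.entries(points)) {
    if (isPopulated(repo[field])) total += pts;
  }
  return total;
}

// Attach the total to the document before it is sent to Elasticsearch.
// `qualityScore` is a hypothetical property name; it is stored alongside the
// document and does not touch Elasticsearch's own relevance score.
function withQualityScore(repo) {
  return { ...repo, qualityScore: qualityTotal(repo) };
}
```

For example, `qualityTotal({ name: "x", description: "", tags: ["poc"] })` returns 13 under these weights, because the empty description earns nothing.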
The schema fields that earn points are:
- name
- description
- permissions.licenses
- permissions.licenses.URL
- permissions.licenses.name
- permissions.usageType
- permissions.exemptionText
- organization
- contact.email
- contact.name
- contact.URL
- contact.phone
- tags
- laborHours
- languages
- repositoryURL
- homepageURL
- downloadURL
- vcs
- date.created
- date.lastModified
- date.metadataLastUpdated
- version
- status
- disclaimerURL
- disclaimerText
- relatedCode.name
- relatedCode.URL
- reusedCode.name
- reusedCode.URL
- partners.name
- partners.email
- target_operating_systems
- additional_information
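Several of the fields above are dotted paths into nested objects, and some cross arrays (in code.json, `permissions.licenses`, `relatedCode`, `reusedCode`, and `partners` are lists of objects). A minimal sketch of checking whether a repository has one of these fields, assuming a field counts as present when any array element supplies a non-empty value. `resolvePath`, `hasField`, and `missingFields` are hypothetical helpers, not functions from this repo.

```javascript
// Resolve a dotted schema path such as "permissions.licenses.URL" against a
// code.json repository entry. When a segment lands on an array, the rest of
// the path is resolved against every element and the results are flattened.
function resolvePath(obj, path) {
  let current = [obj];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      if (item === null || typeof item !== "object") continue;
      const value = item[key];
      if (Array.isArray(value)) next.push(...value);
      else if (value !== undefined) next.push(value);
    }
    current = next;
  }
  return current;
}

// Assumed rule: a field is present if at least one resolved value is neither
// null nor an empty/whitespace-only string.
function hasField(repo, path) {
  return resolvePath(repo, path).some(
    (v) => v !== null && !(typeof v === "string" && v.trim() === "")
  );
}

// List which of the given scored fields a repository is missing.
function missingFields(repo, fields) {
  return fields.filter((path) => !hasField(repo, path));
}
```

Under this rule an empty `tags` array or an empty `contact.email` string both count as missing, while `permissions.licenses.URL` counts as present if any one license has a URL.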
- Clone the repository to your machine:
git clone git@github.com:GSA/code-gov-data-quality-poc.git
- Change into the project directory:
cd code-gov-data-quality-poc
- Install all dependencies:
npm install
- Run the project's start command:
npm start
Running the project creates a file called scoredRepos.csv on your machine.
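The README does not describe the columns of scoredRepos.csv. As a hedged sketch, assuming one row per repository with hypothetical `name`, `repositoryURL`, and `qualityScore` columns, such a file could be written with Node's built-in `fs` module, quoting any value that contains a comma, quote, or line break.

```javascript
const fs = require("fs");

// Quote a value for CSV: wrap it in double quotes when it contains a comma,
// a double quote, or a line break, and double any embedded quotes (the
// quoting convention of RFC 4180). Lines below end with "\n".
function csvCell(value) {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build CSV text from scored repos. The column names are hypothetical.
function toCsv(rows, columns = ["name", "repositoryURL", "qualityScore"]) {
  const lines = [columns.map(csvCell).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvCell(row[c])).join(","));
  }
  return lines.join("\n") + "\n";
}

// Write the file synchronously; the default path mirrors the README's name.
function writeScoredRepos(rows, path = "scoredRepos.csv") {
  fs.writeFileSync(path, toCsv(rows), "utf8");
}
```

Quoting matters here because fields such as `description` can contain commas, which would otherwise shift every later column in that row.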
We are only using three data sources for this POC. They can be found in the app.js file. To change which data source is used, edit the line containing getCodeJson(codeJsons.dot). The options are codeJsons.gsa, codeJsons.dot, and codeJsons.nasa.
For example, change:
getCodeJson(codeJsons.dot)
to:
getCodeJson(codeJsons.gsa)
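Editing app.js is the only switch the README describes. One possible alternative, sketched here as an assumption rather than something this repo supports, is to read the source key from a hypothetical `CODE_JSON_SOURCE` environment variable and fall back to `dot`. The `codeJsons` values below are placeholders, since the real entries live in app.js, and `selectSource` is a hypothetical helper.

```javascript
// Placeholder stand-in for the `codeJsons` object in app.js; the real values
// are not shown in this README.
const codeJsons = {
  gsa: "<GSA code.json source>",
  dot: "<DOT code.json source>",
  nasa: "<NASA code.json source>",
};

// Hypothetical helper: pick a source by key. An unset key falls back to
// "dot" (the line the README says to edit), and unknown keys are rejected.
function selectSource(key = "dot") {
  if (!Object.prototype.hasOwnProperty.call(codeJsons, key)) {
    throw new Error(
      `Unknown data source "${key}"; expected one of: ${Object.keys(codeJsons).join(", ")}`
    );
  }
  return codeJsons[key];
}

// In app.js this would replace the hard-coded call, e.g.:
// getCodeJson(selectSource(process.env.CODE_JSON_SOURCE));
```

With that change, running `CODE_JSON_SOURCE=gsa npm start` in a POSIX shell would use `codeJsons.gsa` without editing the code, and a typo in the key fails loudly instead of silently scoring a different source.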