"Best of UI5" is the new entry page for the ui5-community.
This repository will crawl and supply the data for the website.
To add your package, create an issue with this template in the bestofui5-data repo
and check whether your package meets the prerequisites.
The crawler is written in TypeScript and fetches the latest data every day via a GitHub Actions workflow.
It looks at every package defined in the sources.json file.
Currently it collects data from GitHub and NPM.
If you're looking for the latest data files, they are in the data folder of the live-data branch.
The source code is written in TypeScript and is located in the src folder.
The workflow runs every day as a GitHub Action and triggers the build script defined in the package.json file.
Data is collected via the GitHub and NPM APIs. For GitHub, only authenticated access is practical, because unauthenticated requests are limited to 60 per hour (authenticated requests are allowed 5,000 per hour).
The crawler collects metadata from GitHub and NPM, the README, and historic download counts.
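A minimal sketch of what authenticated access means in practice, assuming the token is passed in by the caller: the Authorization header is added only when a token is available, and the x-ratelimit-remaining header that GitHub returns on every response can be checked to detect an exhausted limit. The function names githubHeaders and isRateLimited are hypothetical, not part of the crawler.

```typescript
// Hypothetical helpers, not the crawler's actual code.

// Builds request headers for the GitHub REST API. Without a token,
// GitHub allows only 60 requests per hour.
function githubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
  };
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }
  return headers;
}

// Returns true when GitHub reports that no requests are left in the current window.
function isRateLimited(responseHeaders: Record<string, string | undefined>): boolean {
  const remaining = responseHeaders["x-ratelimit-remaining"];
  return remaining !== undefined && Number(remaining) <= 0;
}
```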
Index.ts is the entry point for all subsequent processes.
It first reads the sources.json file to determine which packages have to be crawled.
Since the file is based on GitHub repositories, the process starts by reading them in gh-repos.ts.
The next step is to enrich the data from GitHub with NPM data.
From the returned data, the unique types and tags are selected, as well as the individual versions.
The package data and types/tags are written to data.json and the versions to versions.json.
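The aggregation step after the NPM enrichment can be sketched as follows. The Package shape, TagEntry and both function names are assumptions for illustration, not the crawler's real types; the sketch only shows how unique types/tags and a flat version list can be derived before data.json and versions.json are written.

```typescript
// Assumed, simplified package shape (hypothetical).
interface Package {
  name: string;
  type: string;
  tags: string[];
  versions: { version: string; date: string }[];
}

interface TagEntry {
  name: string;
  kind: "type" | "tag";
}

interface VersionEntry {
  name: string;
  version: string;
  date: string;
}

// Collects each type and each tag once, in order of first appearance.
function collectTypesAndTags(packages: Package[]): TagEntry[] {
  const types = new Set<string>();
  const tags = new Set<string>();
  for (const pkg of packages) {
    types.add(pkg.type);
    for (const tag of pkg.tags) tags.add(tag);
  }
  const typeEntries = Array.from(types, (name): TagEntry => ({ name, kind: "type" }));
  const tagEntries = Array.from(tags, (name): TagEntry => ({ name, kind: "tag" }));
  return typeEntries.concat(tagEntries);
}

// Flattens all package versions into one list, keeping the package name on each entry.
function collectVersions(packages: Package[]): VersionEntry[] {
  const result: VersionEntry[] = [];
  for (const pkg of packages) {
    for (const v of pkg.versions) {
      result.push({ name: pkg.name, version: v.version, date: v.date });
    }
  }
  return result;
}
```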
The GitHub process starts with the get method.
Before the data is retrieved, the crawler must determine whether the repo is a monorepo or a single repo.
For example, the repo ui5-ecosystem-showcase is a monorepo with many middlewares and tasks.
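The mono/single distinction boils down to where the package.json files live. A minimal sketch, assuming a sources.json entry with optional subpath and subpackages fields and assuming each subpackage is an object with a name; RepoSource and packageJsonPaths are hypothetical names.

```typescript
// Hypothetical shape of a sources.json entry; subpackages are assumed to be objects with a name.
interface RepoSource {
  owner: string;
  repo: string;
  subpath?: string;
  subpackages?: { name: string }[];
}

// Returns the repository-relative paths of all package.json files to read for one entry.
function packageJsonPaths(source: RepoSource): string[] {
  if (!source.subpackages || source.subpackages.length === 0) {
    // Single repo: one package.json at the repository root.
    return ["package.json"];
  }
  // Monorepo: one package.json per subpackage below subpath (trailing slashes removed).
  const base = source.subpath ? source.subpath.replace(/\/+$/, "") + "/" : "";
  return source.subpackages.map((sub) => `${base}${sub.name}/package.json`);
}
```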
getRepoInfo retrieves the metadata of the GitHub repo using the GitHub Repositories API.
Additionally, updatedAt is set to the date of the last commit on the default branch (currently only for generators).
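The two GitHub REST endpoints involved are "Get a repository" (GET /repos/{owner}/{repo}) and "List commits" (GET /repos/{owner}/{repo}/commits, with the sha and per_page query parameters). The sketch below only builds the URLs and reads the newest commit date from a response; the helper names are hypothetical, and the crawler may request the data differently.

```typescript
const GITHUB_API = "https://api.github.com";

// "Get a repository": returns metadata such as description, stargazers_count and default_branch.
function repoInfoUrl(owner: string, repo: string): string {
  return `${GITHUB_API}/repos/${owner}/${repo}`;
}

// "List commits", restricted to the newest commit of the given branch.
function lastCommitUrl(owner: string, repo: string, branch: string): string {
  return `${GITHUB_API}/repos/${owner}/${repo}/commits?sha=${encodeURIComponent(branch)}&per_page=1`;
}

// Minimal part of a "List commits" response item that is needed here.
interface CommitItem {
  commit: { committer: { date: string } | null };
}

// The first item is the newest commit; its committer date serves as updatedAt.
function updatedAtFrom(commits: CommitItem[]): string | undefined {
  const newest = commits[0];
  return newest && newest.commit.committer ? newest.commit.committer.date : undefined;
}
```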
fetchRepo retrieves data directly from the repository: the package.json and the README.md, which is later rendered on the website.
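One way to read these files directly from a repository is via raw.githubusercontent.com. This is an assumption for illustration: fetchRepo may use the GitHub contents API instead. The helpers packageFileUrls and parsePackageJson are hypothetical, and the selected package.json fields are only an example.

```typescript
// Builds a raw.githubusercontent.com URL for a file on a given branch.
function rawFileUrl(owner: string, repo: string, branch: string, path: string): string {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${path}`;
}

// Hypothetical helper: the two files fetched per package, relative to the package folder
// ("" for a single repo, e.g. "packages/task-a" for a monorepo subpackage).
function packageFileUrls(owner: string, repo: string, branch: string, dir: string) {
  const prefix = dir ? `${dir.replace(/\/+$/, "")}/` : "";
  return {
    packageJson: rawFileUrl(owner, repo, branch, `${prefix}package.json`),
    readme: rawFileUrl(owner, repo, branch, `${prefix}README.md`),
  };
}

// Picks a few package.json fields; which fields the crawler really uses is an assumption.
function parsePackageJson(text: string): { name: string; version?: string; description?: string } {
  const json = JSON.parse(text);
  if (typeof json.name !== "string") throw new Error("package.json has no name");
  return { name: json.name, version: json.version, description: json.description };
}
```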
If it exists, the JSDoc is also retrieved with getJsdoc. Currently this is done for the types "task" and "middleware".
For correct processing, the ui5.yaml is also parsed here.
Because packages of type generator are not published on NPM, the crawler tries to derive a key figure from the clone statistics instead.
This happens in the method updateCloningStats. This API requires special permissions, so a dedicated GitHub token (WORKFLOW_CRAWL_GITHUB_TOKEN) with more permissions than the default token must be used.
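The clone statistics come from GitHub's "Get repository clones" endpoint (GET /repos/{owner}/{repo}/traffic/clones), which, per GitHub's documentation, requires push access to the repository. That explains the need for the special token. The sketch below shows only the response fields used here and a URL builder; cloneTrafficUrl is a hypothetical name.

```typescript
// Response shape of GET /repos/{owner}/{repo}/traffic/clones (only the fields used here).
interface CloneDay {
  timestamp: string; // start of the day, e.g. "2022-05-01T00:00:00Z"
  count: number;     // total clones in that interval
  uniques: number;   // unique cloners in that interval
}

interface CloneTraffic {
  count: number;     // total clones in the returned window
  uniques: number;   // unique cloners in the returned window
  clones: CloneDay[];
}

// The optional "per" parameter aggregates by day (default) or by week.
function cloneTrafficUrl(owner: string, repo: string, per: "day" | "week" = "day"): string {
  return `https://api.github.com/repos/${owner}/${repo}/traffic/clones?per=${per}`;
}
```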
The class NpmProvider enriches the GitHub data.
For this, the packages array is passed to it.
It retrieves the metadata from NPM, as well as the historical download counts.
To minimize the number of requests, the download counts are combined and retrieved in bulk with getDownloadsBulk.
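According to the npm download-counts API documentation, a bulk query accepts a comma-separated list of up to 128 unscoped packages, while scoped packages (@scope/name) are not supported in bulk queries. The sketch below groups packages along those rules; it is not the actual getDownloadsBulk implementation, and downloadUrls and chunk are hypothetical names.

```typescript
const NPM_DOWNLOADS_API = "https://api.npmjs.org/downloads/point";
const BULK_LIMIT = 128; // documented maximum number of packages per bulk query

// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Builds the download URLs for one period: unscoped packages are grouped into bulk
// requests, scoped packages get one request each.
function downloadUrls(period: string, packages: string[]): string[] {
  const unscoped = packages.filter((name) => !name.startsWith("@"));
  const scoped = packages.filter((name) => name.startsWith("@"));
  const bulk = chunk(unscoped, BULK_LIMIT).map((group) => `${NPM_DOWNLOADS_API}/${period}/${group.join(",")}`);
  const single = scoped.map((name) => `${NPM_DOWNLOADS_API}/${period}/${name}`);
  return bulk.concat(single);
}
```

With 130 unscoped packages and one scoped package, this produces three requests instead of 131.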
The following download numbers are currently retrieved:
- current fortnight
- last fortnight
- last 30 days
- last year
- last year per month
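The npm downloads API accepts named periods such as last-year as well as explicit date ranges written as start:end (YYYY-MM-DD). A minimal sketch of how the periods listed above could be expressed, assuming a fortnight means the 14 days ending on the given day (UTC) and that monthly figures are summed from the daily entries of a range response. The exact definitions the crawler uses are not stated here, and all names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

function isoDate(date: Date): string {
  return date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
}

function daysBefore(date: Date, days: number): Date {
  return new Date(date.getTime() - days * DAY_MS);
}

// Inclusive range of `length` days ending at `end`, in npm's "start:end" period syntax.
function dayRange(end: Date, length: number): string {
  return `${isoDate(daysBefore(end, length - 1))}:${isoDate(end)}`;
}

// Hypothetical mapping of the listed periods to npm period strings.
function downloadPeriods(today: Date) {
  return {
    currentFortnight: dayRange(today, 14),
    lastFortnight: dayRange(daysBefore(today, 14), 14),
    last30Days: dayRange(today, 30),
    lastYear: "last-year",
  };
}

// Daily entries as returned by the /downloads/range endpoint.
interface DailyDownloads {
  day: string;
  downloads: number;
}

// Sums daily downloads per month ("YYYY-MM").
function perMonth(days: DailyDownloads[]): Record<string, number> {
  const months: Record<string, number> = {};
  for (const { day, downloads } of days) {
    const month = day.slice(0, 7);
    months[month] = (months[month] ?? 0) + downloads;
  }
  return months;
}
```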
Metadata is also retrieved. Currently the following NPM data is used:
- created at
- updated at
- all versions
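These three values are available in the public registry document at https://registry.npmjs.org/<package name>: its time map holds "created", "modified" and one timestamp per published version. A minimal sketch of extracting them, assuming only these fields; the output field names createdAt/updatedAt and the function name are hypothetical.

```typescript
// Only the parts of the registry document used here.
interface Packument {
  name: string;
  time: Record<string, string>;      // "created", "modified" and one entry per version
  versions: Record<string, unknown>; // version string -> version manifest
}

interface NpmMetadata {
  createdAt: string;
  updatedAt: string;
  versions: { version: string; date: string }[];
}

function extractNpmMetadata(doc: Packument): NpmMetadata {
  const versions = Object.keys(doc.versions)
    .filter((version) => doc.time[version] !== undefined)
    .map((version) => ({ version, date: doc.time[version] }))
    // Oldest first; ISO timestamps sort correctly as strings.
    .sort((a, b) => a.date.localeCompare(b.date));
  return { createdAt: doc.time["created"], updatedAt: doc.time["modified"], versions };
}
```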
To retrieve the clone statistics from GitHub, the workflow uses a special GitHub token (WORKFLOW_CRAWL_GITHUB_TOKEN).
This token has more permissions than the default token.
The workflow runs on ubuntu-latest with Node.js 16.
The latest data is always published on the live-data branch. Part of it is rebuilt from scratch (data.json, versions.json) and part of it is enriched incrementally (clones.json).
First the main branch and then the live-data branch are cloned, and a rebase is performed in the main branch.
This ensures that the existing data is reused.
After that, the dependencies are installed and npm run build executes the TypeScript crawler, which updates the data files.
Finally, the committed update is pushed to the live-data branch.
For sources.json there is a type definition describing what the content should look like. An entry needs:
- owner --> username or organization name
- repo --> repository name
- subpath --> for monorepos, path where the subpackages are located
- subpackages --> for monorepos, list of subpackages
- addedToBoUI5 --> timestamp when this package was added to BestofUI5
- type --> type of the package, see enum BoUI5Types
- tags --> list of tags
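The entry description above can be written as a TypeScript type plus a small validation helper. This is a hypothetical rendering, not the repository's actual type definition: the BoUI5Types values below are only the types mentioned in this README (the real enum may contain more), subpackages are assumed to be objects with a name, and addedToBoUI5 is assumed to be an ISO date string.

```typescript
// Only the types mentioned in this README; the real BoUI5Types enum may contain more.
enum BoUI5Types {
  Task = "task",
  Middleware = "middleware",
  Generator = "generator",
}

interface Source {
  owner: string;
  repo: string;
  subpath?: string;                 // monorepos only
  subpackages?: { name: string }[]; // monorepos only; object shape is an assumption
  addedToBoUI5: string;             // assumed to be an ISO date string
  type: BoUI5Types;
  tags: string[];
}

// Returns a list of problems; an empty list means the entry looks valid.
function validateSource(source: Source): string[] {
  const problems: string[] = [];
  if (!source.owner) problems.push("owner is missing");
  if (!source.repo) problems.push("repo is missing");
  if (Number.isNaN(Date.parse(source.addedToBoUI5))) problems.push("addedToBoUI5 is not a valid timestamp");
  if (!(Object.values(BoUI5Types) as string[]).includes(source.type)) problems.push(`unknown type "${source.type}"`);
  if (source.subpackages && source.subpackages.length > 0 && !source.subpath) {
    problems.push("monorepo entries need a subpath");
  }
  return problems;
}
```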
data.json contains two arrays.
The packages array holds all packages with their information.
The second array contains all types/tags; it is used in the Tags View.
versions.json is generated from all NPM packages and their versions.
It is used in the Timeline View.
clones.json is used only for the generators. Since these do not have an NPM package, the crawler tries to collect a measure of popularity via the number of clones of the repository.
Since the API only reports the last 15 days, this file stores the historical data.
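Keeping a history beyond the API window means merging each new response into the stored entries. A minimal sketch, assuming clones.json stores daily entries with the same fields as the traffic API (timestamp, count, uniques); mergeCloneHistory and totalClones are hypothetical names, and the crawler's actual file layout may differ.

```typescript
interface CloneDay {
  timestamp: string;
  count: number;
  uniques: number;
}

// Merges the window returned by the traffic API into the stored history.
// For days present in both, the fresh API value wins, because the most recent
// day may still have been incomplete when it was stored.
function mergeCloneHistory(history: CloneDay[], latest: CloneDay[]): CloneDay[] {
  const byDay = new Map<string, CloneDay>();
  for (const day of history) byDay.set(day.timestamp, day);
  for (const day of latest) byDay.set(day.timestamp, day);
  return Array.from(byDay.values()).sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}

// Total number of clones over the whole stored history.
function totalClones(history: CloneDay[]): number {
  return history.reduce((sum, day) => sum + day.count, 0);
}
```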
git clone:
git clone https://github.com/ui5-community/bestofui5-data
install:
npm install
set the GitHub token (use the command that matches your OS and shell):
Linux/macOS (bash, zsh):
export GITHUB_TOKEN=<your token>
Windows Command Prompt:
set GITHUB_TOKEN=<your token>
Windows PowerShell:
$env:GITHUB_TOKEN="<your token>"
run crawl:
npm run build
If you run the build command without a GitHub token, the crawl will probably hit the rate limit soon.
Retrieving the clone statistics will probably also fail without the special token. However, this step is wrapped in a try/catch and only logs the error; the rest of the crawl should complete normally.
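The try/catch behavior described above can be sketched as a small wrapper for optional steps: a failure is logged and the crawl continues with undefined instead of aborting. runOptionalStep is a hypothetical name; the crawler wraps the clone statistics step in its own way.

```typescript
// Runs an optional async step; on failure the error is logged and undefined is returned,
// so the remaining crawl steps can continue.
async function runOptionalStep<T>(label: string, step: () => Promise<T>): Promise<T | undefined> {
  try {
    return await step();
  } catch (error) {
    console.error(`${label} failed:`, error);
    return undefined;
  }
}
```

For example, runOptionalStep("clone statistics", () => fetchCloneStats()) would return undefined on a 403 response, where fetchCloneStats stands for whatever function performs the request.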
This project is licensed under the Apache Software License, version 2.0 except as noted otherwise in the LICENSE file.