[maintainance] Collect OWNERship metrics for every repository #340

Closed
gaocegege opened this issue Jun 4, 2020 · 11 comments
Comments

@gaocegege
Member

Ref https://groups.google.com/d/msgid/kubeflow-discuss/858f1953-1808-4ebc-9a8e-1661895522ac%40Spark?utm_medium=email&utm_source=footer

From @jlewi

Towards that end, I think it would be very helpful to think about OWNERship metrics. Here are some initial metrics that I think would be very useful. For every repo I'd like to know who the OWNERs are and what percentage of the repo they own. To start, we can measure OWNERSHIP as the percentage of files in the repo they own.

Does anyone want to volunteer to collect those metrics?

I also think this is necessary, so I am opening an issue in the community repo for it.
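
A minimal sketch of the proposed metric, assuming each OWNERS file carries a plain `approvers:` list and that a file is attributed to the closest ancestor directory that has an OWNERS file. Both are simplifying assumptions made for illustration, and the function names `parse_approvers` and `ownership_share` are hypothetical, not part of any existing Kubeflow tooling.

```python
from collections import Counter
from pathlib import PurePosixPath


def parse_approvers(owners_text):
    """Extract the `approvers:` list from a simple OWNERS file.

    Assumption: only the plain block-list layout is handled; aliases,
    filters and other YAML features are ignored in this sketch.
    """
    approvers, in_section = [], False
    for raw in owners_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if not line.startswith((" ", "\t", "-")):
            # A top-level key either opens or closes the approvers section.
            in_section = line.strip() == "approvers:"
        elif in_section and line.strip().startswith("-"):
            approvers.append(line.strip()[1:].strip())
    return approvers


def ownership_share(files, owners_by_dir):
    """Return {owner: percentage of files owned}.

    files: repo-relative POSIX paths.
    owners_by_dir: {directory: [approvers]}, with "" for the repo root.
    Assumption: a file belongs to the closest ancestor directory listed.
    """
    files = list(files)
    counts = Counter()
    for f in files:
        directory = PurePosixPath(f).parent
        for candidate in [directory, *directory.parents]:
            key = "" if str(candidate) == "." else str(candidate)
            if key in owners_by_dir:
                counts.update(set(owners_by_dir[key]))
                break
    return {owner: 100.0 * n / len(files) for owner, n in counts.items()}
```

Because several approvers can share a directory, the percentages are per owner and need not sum to 100.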

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.70

Please mark this comment with 👍 or 👎 to give our bot feedback!

@gaocegege
Member Author

Comments from @animeshsingh

  1. Not only code contributions, but also contributions an individual or organization has made towards efforts like Product Management, Project Management, Release Management, and Community Management are all a reflection of commitment and contribution.
  2. Past sustained contributions and commitment to Kubeflow are definitely a metric, but even if you are basing a longer-term strategy and roadmap in your organization on it going forward and want to help out, please feel free to nominate yourself as a Kubeflow Application WG member as well.

@terrytangyuan
Member

terrytangyuan commented Jun 4, 2020

@jlewi What would be a good identifier for ownership of files? Do you know if the GitHub API provides a way to do this?

@gaocegege
Member Author

gaocegege commented Jun 4, 2020

We have http://devstats.kubeflow.org/ but we cannot get per-repository statistics from it.

We also have https://www.stackalytics.com/unaffiliated?project_type=kubeflow-group&release=all&metric=commits but it does not cover all contribution information; it only collects PR metrics.

We know that people like @terrytangyuan help the community with releases. These kinds of contributions cannot be collected in devstats or stackalytics.

@terrytangyuan
Member

terrytangyuan commented Jun 4, 2020

We probably want to use the plain GitHub API to retrieve all the information that we need.

Copying my reply on another email thread here in case anyone is interested:

I’ve been writing something similar as a side project to automate writing issue/pull request summaries and release notes (I have been using it for the release notes of a couple of Kubeflow operators). Currently everything is written in R in this repo: https://github.com/terrytangyuan/maintainer-tools
Understanding the contributions and ownership of individuals/organizations is definitely interesting and is part of the next steps.
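
On the question of whether the GitHub API can help: one possible route is the REST "repository contents" endpoint, which returns a file as base64-encoded JSON. The sketch below assumes the OWNERS file sits at the given path on the `master` branch and that unauthenticated access is enough; the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def contents_request(repo, path, ref="master"):
    """Build a request for GET /repos/{owner}/{repo}/contents/{path}."""
    url = f"https://api.github.com/repos/{repo}/contents/{path}?ref={ref}"
    return urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )


def decode_contents(payload):
    """Decode the base64 `content` field of a contents API response."""
    if payload.get("encoding") != "base64":
        raise ValueError("expected a base64-encoded file payload")
    return base64.b64decode(payload["content"]).decode("utf-8")


def fetch_owners(repo, path="OWNERS", ref="master"):
    """Fetch one OWNERS file as text (performs a network call)."""
    with urllib.request.urlopen(contents_request(repo, path, ref)) as resp:
        return decode_contents(json.load(resp))
```

Unauthenticated requests are rate limited, so a real collector across every repository would likely need a token.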

@gaocegege
Member Author

We still have the problem that some contributions made in GDocs or community meetings cannot be recorded. But it will be better to use devstats or stackalytics.

@terrytangyuan Are you interested in helping the community to do it?

@terrytangyuan
Member

Those types of contributions are probably non-trivial to capture. I am not sure whether I’ll have the bandwidth to do it in the near future, though. I’ll definitely keep you posted here if I do.

@jlewi
Contributor

jlewi commented Jun 4, 2020

@terrytangyuan OWNERs files will be named "OWNERS". I believe the GitHub data in BigQuery lists files, so you could probably run a BigQuery query to identify all the OWNERS files and then fetch them with "curl" or some other way.
Here's a notebook illustrating how to query GitHub data from BigQuery:
https://github.com/kubeflow/community/blob/master/scripts/github_stats.ipynb

Although you could also just check out all the repositories.
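
A rough sketch of the checkout-based route, assuming the repositories are already cloned locally. `find_owners_files` and `raw_url` are hypothetical helpers; the second only builds the raw.githubusercontent.com address that could be passed to curl when fetching files without a checkout.

```python
from pathlib import Path


def find_owners_files(checkout_root):
    """Return repo-relative POSIX paths of every file named OWNERS."""
    root = Path(checkout_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("OWNERS")
        # Skip anything stored under the .git directory.
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


def raw_url(repo, ref, path):
    """Build the raw file URL for one path in a GitHub repository."""
    return f"https://raw.githubusercontent.com/{repo}/{ref}/{path}"
```

For example, `raw_url("kubeflow/community", "master", "OWNERS")` yields a URL that curl can download directly, provided that file exists on that branch.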

> We still have the problem that some contributions made in GDocs or community meetings cannot be recorded. But it will be better to use devstats or stackalytics.

@gaocegege I don't want to measure contributions. I want to measure accountability. At the simplest level, OWNERs files list who can approve PRs. So the OWNERs files are a pretty good indication of who understands a piece of code and is actively involved in a project.

There are incentives to keep OWNERs files up to date: if people listed in an OWNERs file are no longer active, it slows everyone down because PRs have to be reassigned to someone else.

For example, suppose someone has been very active in a project but then switches jobs and is no longer involved in KF. At that point, they would likely be removed from the OWNERs files, indicating they are no longer active. So the OWNERs files, unlike contribution metrics, would reflect the fact that they will no longer be actively involved going forward.

@gaocegege
Member Author

> I don't want to measure contributions. I want to measure accountability. At the simplest level, OWNERs files list who can approve PRs. So the OWNERs files are a pretty good indication of who understands a piece of code and is actively involved in a project.

Makes sense.

@gaocegege
Member Author

Updated OWNERS for Katib, tf-operator and pytorch-operator.

@stale

stale bot commented Sep 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
