Issue Classification Bot #95086

Closed
JacksonKearl opened this issue Apr 13, 2020 · 5 comments

@JacksonKearl
Contributor

JacksonKearl commented Apr 13, 2020

Background

We have too many issues to manually classify them all, and even if that isn't strictly true now it will be as we continue to grow. We previously had a bot to classify them, but its training fell out of date and it eventually had to be killed.

Proposal

We implement a new set of actions which automate the processes of:

1. Collecting Data:

  • An initial long-running scrape of all existing issue data, followed by as-generated collection of future data
    • Update: Scraping all issue data took much less time than I expected (<1hr), so we could redo the entire scrape monthly. That gives us more flexibility to change the data we use and also gets us out of keeping customer data around.
  • Collect for each labeling event:
    • Label name
    • Name of user/bot adding the label
    • Issue title&body at time of label
    • Issue final title&body
  • Unknown: where is the raw data stored? (~30 MB compressed)
    • See update above. May not need to store.
  • Unknown: GDPR implications of keeping data?
    • See update above. May not need to store.
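
As a rough illustration of the per-event fields listed above, one labeling event could be captured in a record like the one below. This is only a sketch: the class name `LabelingEvent`, the field names, the sample values, and the JSON-lines serialization are all assumptions made for illustration, not something this proposal specifies.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LabelingEvent:
    """One label being added to an issue (hypothetical field names)."""
    label: str           # label name
    actor: str           # user or bot that added the label
    title_at_label: str  # issue title at the time of the label
    body_at_label: str   # issue body at the time of the label
    final_title: str     # issue final title
    final_body: str      # issue final body


def to_json_line(event: LabelingEvent) -> str:
    """Serialize one event as a single JSON line (assumed storage format)."""
    return json.dumps(asdict(event), sort_keys=True)


# Made-up sample values, for illustration only.
example = LabelingEvent(
    label="integrated-terminal",
    actor="example-triager",
    title_at_label="Terminal does not open",
    body_at_label="Steps to reproduce: ...",
    final_title="Integrated terminal does not open on startup",
    final_body="Steps to reproduce: ...",
)
print(to_json_line(example))
```

Keeping both the at-label and final text in one record would let training experiment with either version without re-scraping.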

2. Running Training

  • Monthly retraining
  • C#/ML.NET, either hosted via GitHub Action (Docker) or using a more powerful Azure machine
  • Potentially implemented with AutoML, which has been used by the dotnet/corefx repo for similar issue feature-area classification with good accuracy.
  • Unknown: where is the model stored? (The dotnet/corefx model is in the tens of MB, compressed.)
  • Unknown: GDPR implications of storing model?

3. Labeling Issues

  • Either live, or in batches (every half hour? hour?)
  • C# running as a GitHub Action (Docker)
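
To make the batch option concrete, here is a minimal sketch of one labeling pass. It is written in Python purely for illustration (the proposal itself targets C#), and everything in it is hypothetical: the `classify` callable stands in for the trained model, the issue dictionaries are a made-up shape, and letting the classifier return `None` to abstain is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical classifier signature: (title, body) -> label, or None to abstain.
Classifier = Callable[[str, str], Optional[str]]


def label_batch(issues: Iterable[dict], classify: Classifier) -> Dict[int, str]:
    """Return {issue number: predicted label} for issues that have no label yet."""
    predictions: Dict[int, str] = {}
    for issue in issues:
        if issue["labels"]:
            continue  # already triaged, leave it alone
        label = classify(issue["title"], issue["body"])
        if label is not None:
            predictions[issue["number"]] = label
    return predictions


def stub_classifier(title: str, body: str) -> Optional[str]:
    """Stand-in for the real model: a keyword match, for demonstration only."""
    return "integrated-terminal" if "terminal" in title.lower() else None


batch = [
    {"number": 1, "title": "Terminal flickers", "body": "", "labels": []},
    {"number": 2, "title": "Something else", "body": "", "labels": []},
    {"number": 3, "title": "Terminal crash", "body": "", "labels": ["debug"]},
]
print(label_batch(batch, stub_classifier))  # {1: 'integrated-terminal'}
```

The same function could serve the live case by passing a single-element batch.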
@usernamehw
Contributor

Have you thought about letting community users set assignee #78432 ?

@JacksonKearl
Contributor Author

@usernamehw ideally this bot would come with that for free. I.e. you could type “integrated terminal” and it would know whom to assign.

@usernamehw
Contributor

usernamehw commented Apr 14, 2020

Well, sometimes an issue has many keywords from different areas, and the bot would be helpless in that case.

JacksonKearl pushed a commit that referenced this issue May 4, 2020
@JacksonKearl
Contributor Author

JacksonKearl commented May 4, 2020

Pushing back, ran into some unexpected behaviours when deploying the full pipeline.

@JacksonKearl
Contributor Author

JacksonKearl commented May 8, 2020

Data for recall (how many correctly labeled vs how many in dataset) and precision (how many correctly labeled vs how many total labeled) for each label:

For more info: https://en.wikipedia.org/wiki/Precision_and_recall
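
For readers who want to reproduce a row, the sketch below recomputes recall and precision from the three raw counts. The function names are hypothetical and this is not the bot's actual reporting code. It assumes percentages are truncated to whole numbers, which is consistent with the listing (for example, 2/3 is shown as 66%), and it leaves the percentage blank when the denominator is 0, as in the `Precision  (0/0)` rows.

```python
def percent(numerator: int, denominator: int) -> str:
    """Whole-number percentage, truncated; empty string when undefined (x/0)."""
    if denominator == 0:
        return ""
    return f"{100 * numerator // denominator}%"


def report(label: str, correct: int, in_dataset: int, labeled: int) -> str:
    """Format one label's row.

    correct    -- issues the model labeled correctly
    in_dataset -- issues that truly carry this label (recall denominator)
    labeled    -- issues the model gave this label (precision denominator)
    """
    return (
        f"{label}: \n"
        f"        Recall {percent(correct, in_dataset)} ({correct}/{in_dataset})\n"
        f"        Precision {percent(correct, labeled)} ({correct}/{labeled})"
    )


print(report("Tyriar", 870, 1266, 976))
# Tyriar: 
#         Recall 68% (870/1266)
#         Precision 89% (870/976)
```

A label the model never predicts has 0% recall and an undefined precision, which is why so many rows show `(0/0)`.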

assignee:

JacksonKearl: 
        Recall 0% (0/33)
        Precision  (0/0)
RMacfarlane: 
        Recall 0% (0/101)
        Precision  (0/0)
Tyriar: 
        Recall 68% (870/1266)
        Precision 89% (870/976)
aeschli: 
        Recall 27% (224/810)
        Precision 73% (224/303)
alexdima: 
        Recall 12% (112/927)
        Precision 84% (112/132)
alexr00: 
        Recall 1% (3/239)
        Precision 75% (3/4)
bpasero: 
        Recall 18% (262/1444)
        Precision 68% (262/384)
chrmarti: 
        Recall 6% (14/229)
        Precision 93% (14/15)
connor4312: 
        Recall 0% (0/25)
        Precision  (0/0)
dbaeumer: 
        Recall 16% (53/320)
        Precision 63% (53/84)
deepak1556: 
        Recall 0% (0/77)
        Precision  (0/0)
eamodio: 
        Recall 0% (0/23)
        Precision  (0/0)
gregvanl: 
        Recall 0% (0/21)
        Precision  (0/0)
isidorn: 
        Recall 26% (321/1200)
        Precision 65% (321/489)
joaomoreno: 
        Recall 41% (544/1298)
        Precision 83% (544/653)
jrieken: 
        Recall 16% (168/1004)
        Precision 85% (168/197)
misolori: 
        Recall 0% (0/83)
        Precision  (0/0)
mjbvz: 
        Recall 50% (746/1470)
        Precision 81% (746/916)
rebornix: 
        Recall 0% (0/360)
        Precision  (0/0)
roblourens: 
        Recall 31% (237/759)
        Precision 68% (237/348)
sandy081: 
        Recall 10% (69/670)
        Precision 75% (69/92)
sbatten: 
        Recall 7% (18/235)
        Precision 75% (18/24)
stevencl: 
        Recall 0% (0/22)
        Precision  (0/0)
weinand: 
        Recall 9% (38/402)
        Precision 73% (38/52)

Overall: 
Recall 28% (3679/13018)
Precision 78% (3679/4669)
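
The Overall figures appear to be pooled counts (sum of the per-label numerators over sum of the denominators) rather than an average of the per-label percentages. Summing the assignee rows above reproduces both 3679/13018 and 3679/4669. A small check, with hypothetical names and the counts copied from the assignee rows:

```python
# (correct, in_dataset, labeled) per assignee, in the order listed above.
ASSIGNEE_COUNTS = [
    (0, 33, 0), (0, 101, 0), (870, 1266, 976), (224, 810, 303),
    (112, 927, 132), (3, 239, 4), (262, 1444, 384), (14, 229, 15),
    (0, 25, 0), (53, 320, 84), (0, 77, 0), (0, 23, 0),
    (0, 21, 0), (321, 1200, 489), (544, 1298, 653), (168, 1004, 197),
    (0, 83, 0), (746, 1470, 916), (0, 360, 0), (237, 759, 348),
    (69, 670, 92), (18, 235, 24), (0, 22, 0), (38, 402, 52),
]


def pooled(counts):
    """Sum each column so that large labels weigh more than small ones."""
    correct = sum(c for c, _, _ in counts)
    in_dataset = sum(n for _, n, _ in counts)
    labeled = sum(p for _, _, p in counts)
    return correct, in_dataset, labeled


correct, in_dataset, labeled = pooled(ASSIGNEE_COUNTS)
print(f"Recall {100 * correct // in_dataset}% ({correct}/{in_dataset})")
print(f"Precision {100 * correct // labeled}% ({correct}/{labeled})")
# Recall 28% (3679/13018)
# Precision 78% (3679/4669)
```

Because the counts are pooled, a few high-volume assignees dominate the overall numbers, and the many labels with zero predictions pull recall down without affecting precision.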

area:

L10N: 
        Recall 56% (9/16)
        Precision 90% (9/10)
VIM: 
        Recall 0% (0/9)
        Precision  (0/0)
api: 
        Recall 15% (39/246)
        Precision 73% (39/53)
authentication: 
        Recall 9% (1/11)
        Precision 100% (1/1)
breadcrumbs: 
        Recall 55% (21/38)
        Precision 91% (21/23)
callhierarchy: 
        Recall 0% (0/6)
        Precision  (0/0)
color-palette: 
        Recall 0% (0/6)
        Precision  (0/0)
comments: 
        Recall 0% (0/5)
        Precision  (0/0)
config: 
        Recall 0% (0/55)
        Precision  (0/0)
context-keys: 
        Recall 0% (0/2)
        Precision  (0/0)
css-less-scss: 
        Recall 29% (21/72)
        Precision 84% (21/25)
custom-editors: 
        Recall 0% (0/16)
        Precision  (0/0)
debug: 
        Recall 81% (632/771)
        Precision 86% (632/727)
debug-console: 
        Recall 0% (0/33)
        Precision  (0/0)
dialogs: 
        Recall 0% (0/2)
        Precision  (0/0)
diff-editor: 
        Recall 0% (0/24)
        Precision  (0/0)
dropdown: 
        Recall 0% (0/10)
        Precision  (0/0)
editor: 
        Recall 1% (2/144)
        Precision 66% (2/3)
emmet: 
        Recall 38% (23/60)
        Precision 88% (23/26)
error-list: 
        Recall 7% (4/52)
        Precision 100% (4/4)
explorer-custom: 
        Recall 0% (0/38)
        Precision  (0/0)
extension-host: 
        Recall 0% (0/17)
        Precision  (0/0)
extensions: 
        Recall 38% (75/193)
        Precision 82% (75/91)
extensions-development: 
        Recall 0% (0/3)
        Precision  (0/0)
file-decorations: 
        Recall 0% (0/13)
        Precision  (0/0)
file-encoding: 
        Recall 16% (1/6)
        Precision 100% (1/1)
file-explorer: 
        Recall 45% (105/230)
        Precision 79% (105/132)
file-glob: 
        Recall 0% (0/1)
        Precision  (0/0)
file-guess-encoding: 
        Recall 0% (0/2)
        Precision  (0/0)
file-io: 
        Recall 0% (0/24)
        Precision  (0/0)
file-watcher: 
        Recall 0% (0/29)
        Precision  (0/0)
font-rendering: 
        Recall 0% (0/20)
        Precision  (0/0)
formatting: 
        Recall 6% (3/45)
        Precision 75% (3/4)
git: 
        Recall 50% (79/157)
        Precision 71% (79/110)
gpu: 
        Recall 0% (0/19)
        Precision  (0/0)
grammar: 
        Recall 0% (0/91)
        Precision 0% (0/1)
grid-view: 
        Recall 0% (0/3)
        Precision  (0/0)
html: 
        Recall 23% (13/56)
        Precision 65% (13/20)
i18n: 
        Recall 26% (14/52)
        Precision 82% (14/17)
icon-brand: 
        Recall 0% (0/12)
        Precision  (0/0)
icons-product: 
        Recall 0% (0/20)
        Precision  (0/0)
install-update: 
        Recall 11% (6/52)
        Precision 75% (6/8)
integrated-terminal: 
        Recall 66% (207/313)
        Precision 80% (207/258)
integration-test: 
        Recall 0% (0/5)
        Precision  (0/0)
intellisense-config: 
        Recall 0% (0/3)
        Precision  (0/0)
ipc: 
        Recall 0% (0/1)
        Precision  (0/0)
issue-bot: 
        Recall  (0/0)
        Precision  (0/0)
issue-reporter: 
        Recall 6% (1/16)
        Precision 50% (1/2)
javascript: 
        Recall 21% (57/260)
        Precision 71% (57/80)
json: 
        Recall 13% (6/46)
        Precision 85% (6/7)
keybindings: 
        Recall 1% (1/59)
        Precision 100% (1/1)
keybindings-editor: 
        Recall 0% (0/9)
        Precision  (0/0)
keyboard-layout: 
        Recall 0% (0/17)
        Precision  (0/0)
languages-basic: 
        Recall 0% (0/71)
        Precision  (0/0)
languages-diagnostics: 
        Recall 0% (0/5)
        Precision  (0/0)
layout: 
        Recall 13% (10/73)
        Precision 90% (10/11)
lcd-text-rendering: 
        Recall 0% (0/4)
        Precision  (0/0)
list: 
        Recall 0% (0/16)
        Precision  (0/0)
log: 
        Recall 0% (0/4)
        Precision  (0/0)
markdown: 
        Recall 60% (57/94)
        Precision 76% (57/75)
marketplace: 
        Recall 0% (0/2)
        Precision  (0/0)
menus: 
        Recall 0% (0/24)
        Precision  (0/0)
merge-conflict: 
        Recall 9% (1/11)
        Precision 100% (1/1)
notebook: 
        Recall 58% (20/34)
        Precision 90% (20/22)
outline: 
        Recall 15% (3/20)
        Precision 60% (3/5)
output: 
        Recall 6% (1/15)
        Precision 100% (1/1)
perf: 
        Recall 0% (0/44)
        Precision  (0/0)
perf-bloat: 
        Recall 0% (0/1)
        Precision  (0/0)
perf-startup: 
        Recall 0% (0/10)
        Precision  (0/0)
php: 
        Recall 12% (3/24)
        Precision 75% (3/4)
portable-mode: 
        Recall 0% (0/1)
        Precision  (0/0)
proxy: 
        Recall 9% (1/11)
        Precision 100% (1/1)
quick-pick: 
        Recall 7% (6/80)
        Precision 100% (6/6)
references-viewlet: 
        Recall 0% (0/9)
        Precision  (0/0)
release-notes: 
        Recall 0% (0/9)
        Precision  (0/0)
remote: 
        Recall 0% (0/22)
        Precision  (0/0)
remote-explorer: 
        Recall 0% (0/6)
        Precision  (0/0)
rename: 
        Recall 0% (0/7)
        Precision  (0/0)
scm: 
        Recall 14% (8/55)
        Precision 72% (8/11)
screencast-mode: 
        Recall 0% (0/1)
        Precision  (0/0)
search: 
        Recall 35% (56/159)
        Precision 60% (56/92)
search-editor: 
        Recall 0% (0/12)
        Precision  (0/0)
search-replace: 
        Recall 0% (0/10)
        Precision  (0/0)
semantic-tokens: 
        Recall 5% (1/19)
        Precision 100% (1/1)
settings-editor: 
        Recall 28% (37/131)
        Precision 82% (37/45)
settings-sync: 
        Recall 54% (27/50)
        Precision 79% (27/34)
simple-file-dialog: 
        Recall 0% (0/7)
        Precision  (0/0)
smart-select: 
        Recall 0% (0/6)
        Precision  (0/0)
smoke-test: 
        Recall 0% (0/11)
        Precision 0% (0/1)
snap: 
        Recall 0% (0/8)
        Precision  (0/0)
snippets: 
        Recall 52% (24/46)
        Precision 63% (24/38)
splitview: 
        Recall 0% (0/3)
        Precision  (0/0)
suggest: 
        Recall 5% (4/79)
        Precision 100% (4/4)
tasks: 
        Recall 61% (100/162)
        Precision 85% (100/117)
telemetry: 
        Recall 0% (0/4)
        Precision  (0/0)
themes: 
        Recall 22% (23/103)
        Precision 63% (23/36)
timeline: 
        Recall 0% (0/7)
        Precision  (0/0)
timeline-git: 
        Recall 0% (0/4)
        Precision  (0/0)
titlebar: 
        Recall 0% (0/23)
        Precision  (0/0)
tokenization: 
        Recall 0% (0/4)
        Precision  (0/0)
tree: 
        Recall 0% (0/33)
        Precision  (0/0)
typescript: 
        Recall 13% (21/155)
        Precision 77% (21/27)
undo-redo: 
        Recall 0% (0/6)
        Precision  (0/0)
unit-test: 
        Recall 0% (0/2)
        Precision  (0/0)
uri: 
        Recall 0% (0/6)
        Precision  (0/0)
ux: 
        Recall 0% (0/85)
        Precision  (0/0)
variable-resolving: 
        Recall 0% (0/2)
        Precision  (0/0)
vscode-build: 
        Recall 0% (0/18)
        Precision  (0/0)
web: 
        Recall 4% (1/23)
        Precision 100% (1/1)
webview: 
        Recall 20% (9/44)
        Precision 69% (9/13)
workbench: 
        Recall 0% (0/39)
        Precision  (0/0)
workspace-edit: 
        Recall 0% (0/6)
        Precision  (0/0)
workspace-symbols: 
        Recall 0% (0/3)
        Precision  (0/0)
zoom: 
        Recall 0% (0/7)
        Precision  (0/0)

Overall: 
Recall 29% (1733/5784)
Precision 80% (1733/2150)

editor:

editor-autoclosing: 
        Recall 53% (7/13)
        Precision 77% (7/9)
editor-autoindent: 
        Recall 82% (28/34)
        Precision 90% (28/31)
editor-bracket-matching: 
        Recall 18% (2/11)
        Precision 100% (2/2)
editor-clipboard: 
        Recall 60% (9/15)
        Precision 90% (9/10)
editor-code-actions: 
        Recall 72% (24/33)
        Precision 100% (24/24)
editor-code-lens: 
        Recall 70% (7/10)
        Precision 87% (7/8)
editor-color-picker: 
        Recall 77% (7/9)
        Precision 100% (7/7)
editor-columnselect: 
        Recall 60% (3/5)
        Precision 60% (3/5)
editor-commands: 
        Recall 12% (2/16)
        Precision 50% (2/4)
editor-comments: 
        Recall 0% (0/2)
        Precision 0% (0/1)
editor-contrib: 
        Recall 0% (0/21)
        Precision  (0/0)
editor-drag-and-drop: 
        Recall 0% (0/3)
        Precision  (0/0)
editor-error-widget: 
        Recall 0% (0/4)
        Precision 0% (0/1)
editor-find: 
        Recall 91% (53/58)
        Precision 89% (53/59)
editor-folding: 
        Recall 93% (40/43)
        Precision 90% (40/44)
editor-hover: 
        Recall 57% (11/19)
        Precision 78% (11/14)
editor-indent-guides: 
        Recall 0% (0/1)
        Precision 0% (0/1)
editor-input: 
        Recall 0% (0/5)
        Precision  (0/0)
editor-input-IME: 
        Recall 42% (6/14)
        Precision 85% (6/7)
editor-insets: 
        Recall 100% (1/1)
        Precision 100% (1/1)
editor-minimap: 
        Recall 73% (11/15)
        Precision 84% (11/13)
editor-multicursor: 
        Recall 6% (1/16)
        Precision 50% (1/2)
editor-parameter-hints: 
        Recall 70% (7/10)
        Precision 100% (7/7)
editor-render-whitespace: 
        Recall 0% (0/3)
        Precision 0% (0/1)
editor-rendering: 
        Recall 23% (12/52)
        Precision 70% (12/17)
editor-scrollbar: 
        Recall 33% (1/3)
        Precision 100% (1/1)
editor-symbols: 
        Recall 64% (11/17)
        Precision 84% (11/13)
editor-synced-region: 
        Recall 0% (0/2)
        Precision  (0/0)
editor-textbuffer: 
        Recall 66% (4/6)
        Precision 57% (4/7)
editor-theming: 
        Recall 0% (0/3)
        Precision  (0/0)
editor-wordnav: 
        Recall 0% (0/2)
        Precision  (0/0)
editor-wrapping: 
        Recall 43% (7/16)
        Precision 100% (7/7)

Overall: 
Recall 54% (254/462)
Precision 85% (254/296)

workbench:

workbench-cli: 
        Recall 5% (1/18)
        Precision 50% (1/2)
workbench-diagnostics: 
        Recall 60% (9/15)
        Precision 100% (9/9)
workbench-dnd: 
        Recall 9% (1/11)
        Precision 100% (1/1)
workbench-editor-grid: 
        Recall 60% (12/20)
        Precision 85% (12/14)
workbench-editors: 
        Recall 38% (22/57)
        Precision 68% (22/32)
workbench-electron: 
        Recall 0% (0/4)
        Precision  (0/0)
workbench-feedback: 
        Recall 0% (0/3)
        Precision  (0/0)
workbench-history: 
        Recall 27% (3/11)
        Precision 100% (3/3)
workbench-hot-exit: 
        Recall 64% (16/25)
        Precision 94% (16/17)
workbench-launch: 
        Recall  (0/0)
        Precision  (0/0)
workbench-link: 
        Recall 75% (3/4)
        Precision 100% (3/3)
workbench-multiroot: 
        Recall 86% (59/68)
        Precision 92% (59/64)
workbench-notifications: 
        Recall 77% (17/22)
        Precision 89% (17/19)
workbench-os-integration: 
        Recall 20% (6/30)
        Precision 60% (6/10)
workbench-rapid-render: 
        Recall 25% (1/4)
        Precision 100% (1/1)
workbench-run-as-admin: 
        Recall 50% (1/2)
        Precision 100% (1/1)
workbench-state: 
        Recall 8% (1/12)
        Precision 100% (1/1)
workbench-status: 
        Recall 37% (3/8)
        Precision 100% (3/3)
workbench-tabs: 
        Recall 67% (43/64)
        Precision 75% (43/57)
workbench-touchbar: 
        Recall 80% (4/5)
        Precision 100% (4/4)
workbench-views: 
        Recall 61% (24/39)
        Precision 85% (24/28)
workbench-welcome: 
        Recall 58% (14/24)
        Precision 100% (14/14)
workbench-window: 
        Recall 12% (2/16)
        Precision 100% (2/2)
workbench-zen: 
        Recall 77% (7/9)
        Precision 100% (7/7)

Overall: 
Recall 52% (249/471)
Precision 85% (249/292)

@github-actions github-actions bot locked and limited conversation to collaborators Jun 21, 2020

2 participants