This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

kubecf-ci: pull-request-resource not picking up updates to PRs #903

Open
bikramnehra opened this issue May 20, 2020 · 10 comments
Labels
Priority: High · Size: 5 · Type: CI (Issues for working on the project's CI pipelines)

Comments

@bikramnehra
Member

Describe the bug
We are seeing issues in pipeline where updates to PRs are not triggering new builds. This is mostly happening in cases where the PRs are old.

To Reproduce
Make an update to existing PR and see if it gets picked up by: https://concourse.suse.dev/teams/main/pipelines/kubecf/resources/kubecf-pr

Expected behavior
Concourse should pick up all updates to PRs and build them.

Environment

  • CF Version: NA
  • Component/Plugin Name: NA
  • Others: NA

Additional context
Also, see: telia-oss/github-pr-resource#26

@bikramnehra bikramnehra added the Priority: High, SUSE (SUSE is pursuing a solution), and Type: CI (Issues for working on the project's CI pipelines) labels May 20, 2020
@andreas-kupries
Contributor

andreas-kupries commented May 20, 2020

I propose opening an issue on the aforementioned GitHub repo with the following text.

Note, the only way I see to fix the issue locally is to fork the resource, change it, then build our own image for it and use that in the pipeline.

Proposed ticket

Hello. We believe we have found a serious issue in the implementation of this resource which causes it to semi-deterministically ignore eligible PRs. While this issue may look at first glance to be the same as #26, it is not.

The problem appears to be in the interaction between the internal optimization that quickly skips already-seen commits and filtering strategies like required_review_approvals, labels, etc.

The issue is that these filter strategies can make an arbitrarily old PR become visible much later, namely when such a PR changes its state, e.g. by getting the necessary approvals, gaining one of the looked-for labels, etc.

There is a high chance that by the time the PR changes state, and thus becomes eligible, the already-seen threshold has moved beyond it, which prevents the remainder of the Check code from seeing this PR despite it now being eligible.

While there are workarounds (see (x) below), they are very inconvenient, and they come with the additional problem of having to recognize when this issue has happened, i.e. being continuously on the lookout for it.

IMHO the better solution is to remove the already-seen optimization completely, so that any old PR is caught when its eligibility changes. This could be combined with checking whether filters like required_review_approvals are in play at all, and disabling the optimization only in that case.

I can see that this may not be something you wish to do, as it may slow the check down if the set of possible PRs is ever-growing, or simply always quite large.

A suitable compromise may be the addition of (another) parameter which lets the user of the resource control the already-seen optimization at their discretion. This could be a simple boolean (on/off), or something like a time interval controlling how far back from the actual threshold old PRs are still checked for state changes. If, for example, a PR is expected to get an approval within 2 days, the parameter can be set accordingly, ensuring that all PRs within 2 days of the threshold are still checked. A special value for the interval could completely disable the threshold, should the user desire that.


(x) Any git operation which updates the tip of the PR to a commit whose timestamp is after the threshold, e.g. pushing an empty change, or amending the head commit of the PR. Even this is subject to a race, as it may require re-approval, re-labeling, etc., and while the user is doing that the threshold may jump past the PR again, because the resource happened to check just while this was going on.


@jandubois
Member

> as it may slow the check down if the set of possible PRs is ever-growing, or simply always quite large.

Given that we aim to do trunk-based development, I think it is sensible to assume that we'll keep the number of open PRs sensibly low. If we end up with over 50 open PRs at a single point in time, we are obviously failing somehow.

The biggest threat I see is from bot-created PRs, like buildpack and stemcell bumps. They just need to be either merged or closed in a timely manner.

For other PRs, I think it would be good if CI could skip draft PRs; I am not sure whether this is already supported.

@andreas-kupries
Contributor

Note that the text is aimed at upstream, trying to anticipate an objection. For ourselves I see no problem with having this optimization completely disabled, given the number of PRs we usually have open. Especially as the resource is written in Go, i.e. in the end it is machine code that runs, not some scripting-language VM.

@andreas-kupries
Contributor

Created upstream ticket: telia-oss/github-pr-resource#204

@viovanov
Member

viovanov commented Jun 18, 2020

Things left to do:

  • Update the pipeline resource definition to use our fork
  • Fly the pipeline and see if the issue was fixed
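For the first of those steps, the resource definition change would look roughly like this (a sketch only: the registry path and tag are placeholders for wherever the image built from SUSE/github-pr-resource is pushed, and the kubecf source settings are abbreviated):

```yaml
resource_types:
- name: pull-request
  type: registry-image
  source:
    # placeholder registry path for the image built from our fork
    repository: registry.example.com/suse/github-pr-resource
    tag: latest

resources:
- name: kubecf-pr
  type: pull-request
  source:
    repository: cloudfoundry-incubator/kubecf
    access_token: ((github-access-token))
```

The second step is then the usual `fly set-pipeline` against the main team, followed by watching the kubecf-pr resource for missed PR updates.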

@jimmykarily jimmykarily self-assigned this Jun 22, 2020
@jimmykarily
Collaborator

This is the fix we need to apply: SUSE/github-pr-resource@3ee0816

@jimmykarily
Collaborator

Deployed the pipeline with a manually built image from our fork. Let's monitor the pipeline for a while to see if the issue is fixed.

@jandubois
Member

Not sure if this should be part of the acceptance criteria for this issue, or if it requires a new one, but I expected that external PRs would also be triggered once they are approved. This still does not seem to be the case in #967.

@jandubois
Member

> Deployed the pipeline with a manually built image from our fork.

I added an additional commit to Andreas' PR: SUSE/github-pr-resource@e92ede5

I've also deployed the pipeline with a locally built image: #1168

Note that the ak-kubecf-903 branch does not build because it fails some tests; here are Andreas' build instructions, which I also followed to get the working image:

> I removed "github.com/telia-oss/github-pr-resource/fakes" and the entire func TestCheck locally

@fargozhu fargozhu removed the SUSE (SUSE is pursuing a solution) label Oct 12, 2020
@fargozhu fargozhu added this to the 2.7.0 milestone Oct 22, 2020
@jimmykarily jimmykarily removed their assignment Nov 11, 2020
@jandubois jandubois removed this from the 2.7.0 milestone Mar 16, 2021