-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-73835] Do not allow builds to be deleted while they are still running and ensure build discarders run after builds are fully complete #9810
Conversation
…mplete and always call Job.logRotate after build finalization
|
||
try { | ||
getParent().logRotate(); | ||
} catch (Exception e) { | ||
LOGGER.log(Level.SEVERE, "Failed to rotate log", e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this into GlobalBuildDiscarderListener
so that isLogUpdated() == true
when log rotation runs. It still runs synchronously, it just runs a little bit later now down in onEndBuilding
. This allows us to check isLogUpdated()
elsewhere to avoid race conditions with Pipeline builds.
@@ -250,7 +250,7 @@ private boolean shouldKeepRun(Run r, Run lsb, Run lstb) { | |||
LOGGER.log(FINER, "{0} is not to be removed or purged of artifacts because it’s the last stable build", r); | |||
return true; | |||
} | |||
if (r.isBuilding()) { | |||
if (r.isLogUpdated()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to avoid race conditions involving Pipeline builds. Previously it was possible for log rotation to delete builds which had not yet fully completed, which meant that a build.xml
file could be written back out into the build directory after log rotation had deleted the build. I will try to demonstrate this downstream in workflow-job
once I have an incremental build here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you confident that this is safe for freestyle builds? I.e., that !logUpdated → !building
or conversely that building → logUpdated
? The lifecycles for AbstractBuild
and WorkflowRun
are pretty different and the Javadoc has always been vague.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure it is safe, but if you prefer we can switch to isBuilding() || isLogUpdated()
like the check for the UI in Jelly here (or maybe we should simplify that to only check isLogUpdated
as well). See the non-Pipeline implementations here and the state enum here.
For !logUpdated → !building
,!isLogUpdated
is only true in State.COMPLETED
, which ensures !isBuilding
since COMPLETED
is after POST_PRODUCTION
.
The converse building → logUpdated
should be ok as well, since if isBuilding
then we must be in State.NOT_STARTED
or State.BUILIDING
, both of which are not COMPLETED
, so isLogUpdated
will be true.
For Pipeline things are simpler since WorkflowRun.isLogUpdated
delegates to WorkflowRun.isBuilding
.
Also just as a data point, if isLogUpdated
could not be used on its own for non-pipelines, I think we would have seen it cause issues in tests due to the usage in JenkinsRule.waitForCompletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And as far as I know the states for a Run
can only move forward, although looking at the code I guess a subclass that calls some of the protected methods incorrectly could cause random state changes.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I know the states for a
Run
can only move forward
That looks right.
like the check for the UI in Jelly
Not sure what you mean.
<j:if test="${!it.building and !it.keepLog}"> |
building
and ought to be switched to only check logUpdated
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 I misread keepLog
as logUpdated
. I'll update that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed that in f676be5.
} catch (Exception e) { | ||
LOGGER.log(Level.WARNING, e, () -> "Failed to rotate log for " + run); | ||
} | ||
// Avoid calling Job.logRotate twice in case JobGlobalBuildDiscarderStrategy is configured globally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing, but essentially the old behavior was that Run.run
called Job.logRotate
while the build was in POST_PRODUCTION
state, but also ever since #4368 if the configuration of GlobalBuildDiscarderConfiguration
included JobGlobalBuildDiscarderStrategy
(which it does by default), we also called Job.logRotate
here while the build was in COMPLETED
state. For backwards compatibility I think we need to ensure it gets called at least once regardless of the global configuration, and it's preferable to call it here in onFinalized
so that we can check Run.isLogUpdated
in LogRotator.shouldKeepRun
and Run.delete
to avoid race conditions with Pipeline builds. We avoid the redundant call by filtering out the relevant strategy when processing the globally-configured discarders.
Also, the old behavior is why I am not concerned about removing this call in workflow-job
and also why I am not worried about making the above call asynchronous for the sake of Pipeline builds. BackgroundGlobalBuildDiscarder.processJob
has been (redundantly) calling logRotate
synchronously for years now in default configurations, so it seems that the asynchronous behavior in jenkinsci/workflow-job-plugin@63fdbe8 is no longer critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(jenkinsci/workflow-job-plugin#70 for better linking)
@@ -96,6 +101,17 @@ public void successVsFailureWithRemoveLastBuild() throws Exception { | |||
assertEquals(2, numberOf(project.getLastFailedBuild())); | |||
} | |||
|
|||
@Test | |||
public void ableToDeleteCurrentBuild() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this was to check that the switch to isLogUpdated
in LogRotator.shouldKeepRun
did not break anything, since previously the call in Run.run
would not have been able to delete the build with the switch. Because of the redundant call via GlobalBuildDiscarderListener
though it worked anyway, the only difference was a log warning from the deleted code in Run.run
, which we can't assert against any more. So maybe this test is no longer checking anything useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, in principle, someone could want to keep zero builds if they were using publishers (GitHub Checks, Slack, etc.) to track failures?
…essEvenTheBuildIsRunning to match new behavior
@@ -139,24 +136,18 @@ public class DeleteBuildsCommandTest { | |||
assertThat(result.stdout(), containsString("Deleted 0 builds")); | |||
} | |||
|
|||
@Test public void deleteBuildsShouldSuccessEvenTheBuildIsRunning() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not notice this test before. It was added in #2310 along with all of the other tests for delete-builds
.
I still think the previous behavior was undesirable - deleting builds that are still running and reachable in memory does not make sense, can lead to tons of errors in the Jenkins logs, and any builds deleted while running are very likely to come back after a Jenkins restart given that build completion will cause build.xml
to be rewritten.
I can see the argument though that Run.delete
should work like Job.delete
and cancel the build if it is running, waiting up to 15 seconds and only throwing an exception if the build still hasn't completed at that point. @jglick suggested this approach as well, but I was hoping to avoid adding complexity here unless we really think that behavior is necessary. Any opinions on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion. I think the proposed error behavior is intuitive enough. You can explicitly abort the build first if that is what you intended.
/label ready-for-merge This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback. Thanks! |
|
I'll take a look at |
jenkinsci/lockable-resources-plugin#716 deletes the test. |
…l running and ensure build discarders run after builds are fully complete (jenkinsci#9810) * [JENKINS-73835] Do not allow builds to be deleted while they are still running * [JENKINS-73835] Avoid redundant calls to Job.logRotate when builds complete and always call Job.logRotate after build finalization * [JENKINS-73835] Add issue reference to RunTest.buildsMayNotBeDeletedWhileRunning * [JENKINS-73835] Adjust DeleteBuildsCommandTest.deleteBuildsShouldSuccessEvenTheBuildIsRunning to match new behavior * [JENKINS-73835] Run/delete.jelly should check Run.isLogUpdated, not Run.isBuilding
…l running and ensure build discarders run after builds are fully complete (jenkinsci#9810) * [JENKINS-73835] Do not allow builds to be deleted while they are still running * [JENKINS-73835] Avoid redundant calls to Job.logRotate when builds complete and always call Job.logRotate after build finalization * [JENKINS-73835] Add issue reference to RunTest.buildsMayNotBeDeletedWhileRunning * [JENKINS-73835] Adjust DeleteBuildsCommandTest.deleteBuildsShouldSuccessEvenTheBuildIsRunning to match new behavior * [JENKINS-73835] Run/delete.jelly should check Run.isLogUpdated, not Run.isBuilding (cherry picked from commit d34b17e)
Ever since this PR was merged,
|
Yes, looks like the test needs to forcibly stop the build. I will file a PR to fix it in a bit. |
I filed #9876 to stabilize that test. |
logRotator.setRemoveLastBuild(true); | ||
p.setBuildDiscarder(logRotator); | ||
j.buildAndAssertStatus(Result.SUCCESS, p); | ||
assertNull(p.getBuildByNumber(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failure in https://github.com/jenkinsci/jenkins/pull/9921/checks?check_run_id=32166988156 (#9921) looks like a flake? Missing await
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what we would await
for, because log rotation runs synchronously during build completion. Unless this is a test-specific timing issue due to JenkinsRule.buildAndAssertSuccess
just waiting for QueueTaskFuture.get
instead of !Run.isLogUpdated
, then this test probably just needs to be skipped on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From some testing, j.buildAndAssertStatus(Result.SUCCESS, p)
is guaranteed to block until GlobalBuildDiscarderListener.onFinalized
has completed. Perhaps the BuildWatcher
background thread tried to copy the logs at an inopportune time and caused log rotation to fail. Seems best to delete the BuildWatcher
here in case it made any of the other tests flaky on Windows and then probably skip this test explicitly on Windows as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the AccessDeniedException
seems strange. Either way, I filed #9923.
See JENKINS-73835 and jenkinsci/workflow-job-plugin#470. I noticed this while investigating #9790, but it is a distinct issue.
This PR makes
Run.delete
throw an exception if it is called on a build which has not yet completed. It also adjusts some logic related toLogRotator
, which is the main programmatic caller ofRun.delete
to avoid some race conditions and related issues.Testing done
See new automated PRs.
Proposed changelog entries
Proposed upgrade guidelines
N/A
Submitter checklist
Desired reviewers
Before the changes are marked as
ready-for-merge
:Maintainer checklist