Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massive hack to prevent attempted mount of non-existent PVC 💀 #1069

Closed
wants to merge 1 commit into from

Conversation

bobcatfish
Copy link
Collaborator

Changes

In #1007 @dlorenc and I tried to fix the case where a PVC wasn't needed
for output -> input linking and it was being created anyway. What we
forgot to do was check to see where that PVC was being mounted. It turns
out that if a TaskRun has an output and is created by a PipelineRun
(this is checked via the owner reference), the TaskRun assumes it needs
to mount the volume and further adds containers to copy the output data
to the (possibly) non-existent PVC. @castlemilk caught this problem in #1068.

The real fix here is probably going to involve an interface change b/c
we can't assume that just being owned by a PipelineRun means that a
linking PVC is going to be involved. This commit is a terrible and
probably race condition hack to make it so that if the PVC the TaskRun
is expecting doesn't exist, it doesn't attempt to add containers that
will copy data to it.

Making the hack even worse is that instead of adding more actual unit
tests, I updated the test to run all the existing unit tests twice, once
with this PVC existing and once with it not, and I manipulated the test
so that in the case where it doesn't exist, the expected outcome is
different. This is a terrible way to write tests and I hope we either
don't merge this or we fix it quickly afterward. @dlorenc and I are
going to work on a better fix tomorrow.

I also modified our end to end PipelineRun test to include an output
resource so we could reproduce the issue that @castlemilk reported.

p.s. the change to output_resource_test isn't as bad as it looks but the fact that everything is wrapped in a for loop really confuses the diff tools :(

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • [n/a] Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Some of the most terrible test code I've ever written? Maybe

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Release Notes

Works around the issue where if Pipelines use Tasks with Outputs, the TaskRuns will try to mount and use the output -> input linking PVC, even when it doesn't exist, resulting in unschedulable pods :(

@googlebot googlebot added the cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit label Jul 12, 2019
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 12, 2019
@bobcatfish
Copy link
Collaborator Author

p.s. if we decide not to merge this no hard feelings at all; tomorrow @dlorenc and I are going to see if we can add a better fix quickly, if not, this or something like it could be an okay (ugh) interim solution

In tektoncd#1007 @dlorenc and I tried to fix the case where a PVC wasn't needed
for output -> input linking and it was being created anyway. What we
forgot to do was check to see where that PVC was being mounted. It turns
out that if a TaskRun has an output and is created by a PipelineRun
(this is checked via the owner reference), the TaskRun assumes it needs
to mount the volume and further adds containers to copy the output data
to the (possibly) non-existent PVC. @castlemilk caught this problem in tektoncd#1068.

The real fix here is probably going to involve an interface change b/c
we can't assume that just being owned by a PipelineRun means that a
linking PVC is going to be involved. This commit is a terrible and
probably race condition hack to make it so that if the PVC the TaskRun
is expecting doesn't exist, it doesn't attempt to add containers that
will copy data to it.

Making the hack even worse is that instead of adding more actual unit
tests, I updated the test to run all the existing unit tests twice, once
with this PVC existing and once with it not, and I manipulated the test
so that in the case where it doesn't exist, the expected outcome is
different. This is a terrible way to write tests and I hope we either
don't merge this or we fix it quickly afterward. @dlorenc and I are
going to work on a better fix tomorrow.

I also modified our end to end PipelineRun test to include an output
resource so we could reproduce the issue that @castlemilk reported.
@tekton-robot
Copy link
Collaborator

@bobcatfish: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-tekton-pipeline-unit-tests c2c1ce2 link /test tekton-pipeline-unit-tests
pull-tekton-pipeline-integration-tests c2c1ce2 link /test pull-tekton-pipeline-integration-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@vdemeester
Copy link
Member

@bobcatfish @dlorenc is it better to add this hack in 0.5.x or try to revert that feature (in 0.5.x) and rework it better for next release ? 👼

@bobcatfish
Copy link
Collaborator Author

omg reverting it is a way better idea XD i should have just done that last night instead of spending a whole evening making this monstrosity (it was kind of fun tho? 😅 )

is it better to add this hack in 0.5.x

If we're gonna fix stuff in feature branches i think it's important we make the same change in master otherwise itll be hard to keep track of what is where

@bobcatfish
Copy link
Collaborator Author

Closing this in favor of #1071

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Trying to make the CLA bot happy with ppl from different companies work on one commit size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants