WIP: RFC: Rework vmcheck to use STI qcow2 inventory #1362

Closed
wants to merge 1 commit into from

cgwalters
Member

While we're having reboot+ansible issues in ostree related
to this, I really like the ability to pass it a qcow2 rather
than the "BYO ssh-config" model. Further, the vmcheck code
was full of workarounds for trying to reuse VMs between tests.

At a high level, you can now do the following locally in development:
export TEST_SUBJECTS=/srv/libvirt/images-gold/Fedora-Atomic-27-20180326.1.x86_64.qcow2
or whatever. Then:
make && make vmcheck TESTS="misc-1 misc-2 layering-relayer"
will spawn those tests, each in a clean VM. A much bigger benefit
is that I reworked the tests to use parallel like the others,
and now if you set VMCHECK_PARALLEL=4 you'll get 4 parallel VMs.
Since each VM just has 512MB of RAM today I have it set to 8 locally.
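
As a rough sketch of the local workflow described above, the steps could be wrapped in a small shell function. The function name `vmcheck_local` and the image existence check are illustrative assumptions, not part of this PR; only `TEST_SUBJECTS`, `VMCHECK_PARALLEL`, `TESTS`, and the `make` targets come from the description.

```sh
# Hypothetical wrapper around the workflow described above.
# Usage: vmcheck_local <qcow2-image> <test-name>...
vmcheck_local() {
    image=$1
    if [ ! -f "${image}" ]; then
        echo "error: no such qcow2 image: ${image}" >&2
        return 1
    fi
    shift
    # One clean VM is booted per test from this image.
    export TEST_SUBJECTS="${image}"
    # Number of VMs to run at once; 4 is just an example default.
    export VMCHECK_PARALLEL="${VMCHECK_PARALLEL:-4}"
    make && make vmcheck TESTS="$*"
}
```

For example: `vmcheck_local /srv/libvirt/images-gold/Fedora-Atomic-27-20180326.1.x86_64.qcow2 misc-1 misc-2 layering-relayer`.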

Requires: https://pagure.io/standard-test-roles/pull-request/188

@rh-atomic-bot

💥 Invalid .papr.yml: failed to parse 1st testsuite: Schema validation failed:

  • Value '6' is not of type 'str'. Path: '/env/VMCHECK_PARALLEL'..

@cgwalters
Member Author

Oh right, need to make this work for the C7 build too.

@jlebon jlebon added the WIP label May 8, 2018
cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request May 8, 2018
Prep for reworking the primary test to do vm-in-container, which
will temporarily be vm-in-container-in-vm.

See coreos#1362
rh-atomic-bot pushed a commit that referenced this pull request May 9, 2018
Prep for reworking the primary test to do vm-in-container, which
will temporarily be vm-in-container-in-vm.

See #1362

Closes: #1364
Approved by: jlebon
Member

@jlebon jlebon left a comment

I definitely like the benefits of one VM per test. And I also like that we're playing to the strength of STR and let it boil down to just an SSH config.

One thing I'm unsure of is getting rid of all the "undo" logic. How do you see local development in this model? E.g. I have my one pet VM that I use for all testing. And the fact that I can make vmcheck TESTS='foobar' and not have to undo things after is nice. I guess we could keep it around but only actually run it if an SSH config was provided?

.papr.yml Outdated
image: registry.fedoraproject.org/fedora:27
tests:
- cd /etc/yum.repos.d/ && curl -L -O https://copr.fedorainfracloud.org/coprs/walters/oci-kvm-hook/repo/fedora-27/walters-oci-kvm-hook-fedora-27.repo
- rpm-ostree install oci-kvm-hook && rpm-ostree ex livefs
Member

If we're passing /dev/kvm anyway, then there's no point in installing oci-kvm-hook, right?

dn=$(cd $(dirname $0) && pwd)

# Preparatory work; we have a helper binary
make inject-pkglist
Member

This should already have been roped in by the make target, right?

Member Author

Yeah, but it's convenient to run the script directly too.

# in separating them.
(for tf in ${tests}; do echo $tf; done) | \
parallel -v -j ${VMCHECK_PARALLEL:-1} --progress --halt soon,fail=1 \
--results ${LOGDIR} --quote /bin/sh -c "${dn}/run-one-test.sh {} 2>&1" |& tail
Member

Why do we need |& tail here?
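
The thread leaves this question unanswered. As a small illustration of what the construct does (not a statement of the author's intent): `|&` is bash shorthand for `2>&1 |`, so the pipeline merges stderr into stdout and `tail` keeps only the last 10 lines. The `noisy_runner` function below is a hypothetical stand-in for the `parallel` invocation.

```sh
# Hypothetical stand-in for the `parallel ... run-one-test.sh` invocation:
# prints progress on stdout and a diagnostic on stderr.
noisy_runner() {
    i=1
    while [ "$i" -le 20 ]; do
        echo "progress line $i"
        i=$((i + 1))
    done
    echo "FAILED: misc-1" >&2
}

# Portable spelling of `noisy_runner |& tail`: stderr is merged into the
# pipe, and tail prints only the final 10 lines by default.
summary=$(noisy_runner 2>&1 | tail)
echo "${summary}"
```

Since `--results ${LOGDIR}` already stores per-test output on disk, trimming the console output this way would not lose logs; whether that was the motivation here is an assumption.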

EOF
exit 1
fi
for subj in ${TEST_SUBJECTS}; do
Member

What does it even mean for us if TEST_SUBJECTS includes multiple qcow2s? Should we check for that and error out if so?

Member Author

Feels like a bit of a "doctor it hurts when I..." situation? I started reading a stackoverflow thing on bash arrays but then my eyes started glazing over...
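
One way to implement the suggested check without bash arrays is to let word splitting populate the positional parameters and count them. This is a sketch only; the helper name `require_single_subject` is hypothetical, and like the `for subj in ${TEST_SUBJECTS}` loop it assumes image paths contain no whitespace or glob characters.

```sh
# Hypothetical helper: print the single entry of a whitespace-separated
# list, or fail if the list is empty or has more than one entry.
require_single_subject() {
    # Unquoted on purpose: word splitting turns the list into positional
    # parameters, so no bash arrays are needed.
    set -- $1
    if [ "$#" -ne 1 ]; then
        echo "error: expected exactly one qcow2 in TEST_SUBJECTS, got $#" >&2
        return 1
    fi
    echo "$1"
}
```

A caller could then write `subj=$(require_single_subject "${TEST_SUBJECTS:-}") || exit 1` before booting the VM.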

echo "FAILED: ${tf}"
vm_cmd 'journalctl --no-pager || true' > ${JOURNAL_LOG} || true
if test -z "${TEST_DEBUG:-}" &&
test -n "${VMCHECK_TMPD:-}" &&
Member

Is this something set by a human, or supposed to be set in the STI path? I don't see it set anywhere.

.papr.yml Outdated

env:
HOSTS: vmcheck1 vmcheck2 vmcheck3
# each VM is 1024MB, so this is 3072MB, leaving 1G for the OS
Member

The commit description says 512M, and here it says 1024M. How can we force it to 512M? If we can get away with that, that'd be awesome!

Member Author

We can't currently, but we should investigate configurability there.


export VM=${VM:-vmcheck}
export SSH_CONFIG=${SSH_CONFIG:-${topsrcdir}/ssh-config}
# then use the standard test interface to boot one.
Member

This comment looks out of place.

@cgwalters
Member Author

> One thing I'm unsure of is getting rid of all the "undo" logic. How do you see local development in this model? E.g. I have my one pet VM that I use for all testing. And the fact that I can make vmcheck TESTS='foobar' and not have to undo things after is nice. I guess we could keep it around but only actually run it if an SSH config was provided?

Why have the pet though in this model? Download the qcow2 and spawn a fresh VM each time. If you want to debug, set TEST_DEBUG=1 to have the VM persist after.

@jlebon
Member

jlebon commented May 14, 2018

Yeah, I could get used to that. Just contrasting it to my current workflow. I do customize the VM to make debugging easier (e.g. set up some dotfiles and a gdb container), though that stuff can easily be (and should be) streamlined.

@cgwalters
Member Author

Ah, yeah. Well, one could do the "customize gold image, then save" model. Though for core dumps today I end up extracting them back to my dev container.

@rh-atomic-bot

☔ The latest upstream changes (presumably 38b11d3) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters cgwalters force-pushed the vmcheck-sti branch 2 times, most recently from 42301a3 to ffb079a Compare May 15, 2018 18:15
@rh-atomic-bot

☔ The latest upstream changes (presumably 592d605) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters cgwalters force-pushed the vmcheck-sti branch 5 times, most recently from 2009e5f to da334eb Compare May 30, 2018 15:24
@cgwalters
Member Author

One immediate downside here is that the singleton contexts are quite slow. It feels like this circles back somewhat to projectatomic/papr#62 - ideally we'd do a pod per vmcheck test.

An interesting thing the https://github.com/openshift/ci-operator does is set up a per-PR kube namespace - this makes it possible to more safely have the per-repository code create pods dynamically as well.

@jlebon
Member

jlebon commented Jun 4, 2018

bot, retest this please

1 similar comment
@cgwalters
Member Author

bot, retest this please

@cgwalters
Member Author

Ugh, the rpm-md repo flakes...

bot, retest this please

@cgwalters
Member Author

bot, retest this please

1 similar comment
@jlebon
Member

jlebon commented Jun 6, 2018

bot, retest this please

@cgwalters cgwalters force-pushed the vmcheck-sti branch 2 times, most recently from d6cae83 to c753617 Compare June 8, 2018 22:13
@cgwalters
Member Author

bot, retest this please

@cgwalters
Member Author

OK, so this is blocked by the same perf issue. Locally (4 cores, NVMe), running with VMCHECK_PARALLEL=8:

Mon Jun 18 13:03:44 UTC 2018 overlay: Starting
Mon Jun 18 13:03:56 UTC 2018 overlay: Checkout complete
Mon Jun 18 13:04:33 UTC 2018 overlay: Commit complete

But in CI:

Sat Jun 16 12:59:50 UTC 2018 overlay: Starting
Sat Jun 16 13:01:30 UTC 2018 overlay: Checkout complete

And we don't even get to the commit phase. The checkout is 10x slower. And given doing the commit is ~100s locally, that means we're looking at an estimated ~16 minutes just to prepare the VM; that's pretty nuts. I'm going to need to do some perf investigation - whether this is us doing qemu wrong, unexpected nested virt overhead, etc.
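
To make the comparison concrete, the checkout durations can be computed from the timestamps quoted above. This sketch assumes GNU `date` (for `-d`); the helper name `elapsed_seconds` is made up for illustration.

```sh
# Seconds between two timestamps in the format used by the logs above.
# Assumes GNU date; BSD/macOS date needs different flags.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# "Starting" to "Checkout complete", locally and in CI.
local_checkout=$(elapsed_seconds "Mon Jun 18 13:03:44 UTC 2018" "Mon Jun 18 13:03:56 UTC 2018")
ci_checkout=$(elapsed_seconds "Sat Jun 16 12:59:50 UTC 2018" "Sat Jun 16 13:01:30 UTC 2018")

echo "local checkout: ${local_checkout}s"
echo "CI checkout:    ${ci_checkout}s"
echo "slowdown:       about $((ci_checkout / local_checkout))x (integer division)"
```

This gives 12s for the local checkout versus 100s in CI, which is roughly the 10x slowdown mentioned.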

@cgwalters
Member Author

bot, retest this please

3 similar comments
@jlebon
Member

jlebon commented Jun 20, 2018

bot, retest this please

@cgwalters
Member Author

bot, retest this please

@cgwalters
Member Author

bot, retest this please

@cgwalters
Member Author

Hm, we definitely have KVM nested, it's like it's still not being accelerated though for some reason.

@rh-atomic-bot

☔ The latest upstream changes (presumably caf66d6) made this pull request unmergeable. Please resolve the merge conflicts.

While we're having reboot+ansible issues in ostree related
to this, I really like the ability to pass it a qcow2 rather
than the "BYO ssh-config" model.  Further, the vmcheck code
was full of workarounds for trying to reuse VMs between tests.

At a high level, you can now do the following locally in development:
`export TEST_SUBJECTS=/srv/libvirt/images-gold/Fedora-Atomic-27-20180326.1.x86_64.qcow2`
or whatever.  Then:
`make && make vmcheck TESTS="misc-1 misc-2 layering-relayer"`
will spawn those tests, each in a clean VM.  A much bigger benefit
is that I reworked the tests to use `parallel` like the others,
and now if you set `VMCHECK_PARALLEL=4` you'll get 4 parallel VMs.
Since each VM has 1024MB of RAM today I have it set to 8 locally.

Requires: https://pagure.io/standard-test-roles/pull-request/188
@rh-atomic-bot

☔ The latest upstream changes (presumably b6d0748) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters
Member Author

This is obsoleted by kola.

@cgwalters cgwalters closed this Sep 24, 2020