Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Rework installed tests to use Fedora Standard Test interface #1462

Closed

Conversation

cgwalters
Copy link
Member

Reusing the way standard-test-roles has support for booting
a qcow2 actually gets us to the "VM-in-container" flow. Plus
Ansible over shell script is sometimes nicer.

https://fedoraproject.org/wiki/CI/Tests#Testing_an_Atomic_Host

This is just an experiment to see if this works; right now we aren't
even injecting the built ostree into the VM, or doing any interesting tests.

One thing that I'm sure we're going to hit is a need to cache the qcow2 in the
CI environment.

@jlebon
Copy link
Member

jlebon commented Feb 22, 2018

Cool! Though we don't have oci-kvm-hook right now installed on the CI nodes so no /dev/kvm in your container. I'll get it added.

jlebon added a commit to projectatomic/papr that referenced this pull request Feb 22, 2018
For reasons I haven't dug in too much, `oci-kvm-hook` doesn't work on
F25 (though it does on F27). Howver, the nodes are currently stuck at
F25 because of severe issues with `docker cp` in newer version
(rhbz#1489505). So let's just explicitly pass /dev/kvm into the
container, which is all `oci-kvm-hook` is trying to do anyway. I've
verified this works.

This will unblock ostreedev/ostree#1462.
jlebon added a commit to jlebon/papr that referenced this pull request Feb 23, 2018
For reasons I can't quite figure out, `oci-kvm-hook` doesn't work on F25
(though it does on F27). The nodes are currently stuck at F25 because of
severe issues with `docker cp` in newer version (rhbz#1489505). So let's
just explicitly pass /dev/kvm into the container, which is all
`oci-kvm-hook` is trying to do anyway. I've verified this works.

This will unblock ostreedev/ostree#1462.
rh-atomic-bot pushed a commit to projectatomic/papr that referenced this pull request Feb 23, 2018
For reasons I can't quite figure out, `oci-kvm-hook` doesn't work on F25
(though it does on F27). The nodes are currently stuck at F25 because of
severe issues with `docker cp` in newer version (rhbz#1489505). So let's
just explicitly pass /dev/kvm into the container, which is all
`oci-kvm-hook` is trying to do anyway. I've verified this works.

This will unblock ostreedev/ostree#1462.

Closes: #88
Approved by: cgwalters
rh-atomic-bot pushed a commit to projectatomic/papr that referenced this pull request Feb 23, 2018
For reasons I can't quite figure out, `oci-kvm-hook` doesn't work on F25
(though it does on F27). The nodes are currently stuck at F25 because of
severe issues with `docker cp` in newer version (rhbz#1489505). So let's
just explicitly pass /dev/kvm into the container, which is all
`oci-kvm-hook` is trying to do anyway. I've verified this works.

This will unblock ostreedev/ostree#1462.

Closes: #88
Approved by: cgwalters
@jlebon
Copy link
Member

jlebon commented Feb 23, 2018

bot, retest this please

@cgwalters cgwalters force-pushed the fedora-standard-test-roles branch 4 times, most recently from 3bbe908 to 93ed117 Compare February 23, 2018 22:20
@cgwalters
Copy link
Member Author

bot, retest this please

@jlebon jlebon added the WIP label Feb 27, 2018
@cgwalters cgwalters force-pushed the fedora-standard-test-roles branch 2 times, most recently from 204a8e3 to 892ba0c Compare February 27, 2018 19:54
@jlebon
Copy link
Member

jlebon commented Feb 27, 2018

bot, retest this please

@cgwalters
Copy link
Member Author

"msg": "argument must be an int, or have a fileno() method.", 

Damn it Python...

@cgwalters
Copy link
Member Author

Using Professional Software Engineering skill #23 (googling for the error message) I see https://code.vt.edu/nis-ansible-roles/upstream-ansible/commit/af4f0bd0087d5d4067ad11f45dc646230c3ced7f which seems somehow promising...yet I don't see that code in actual ansible git. Hmm. Let's try getting more debugging info out of ansible.

@jlebon
Copy link
Member

jlebon commented Feb 27, 2018

Hmm, right I can reproduce this locally:

$ sudo docker run --rm -v /var/cache/dnf:/var/cache/dnf:z registry.fedoraproject.org/fedora:27 sh -c 'dnf install -y ansible && ansible -vvv -i 10.8.246.101, all -m shell -a /bin/true'
...
META: ran handlers
Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
<10.8.246.101> ESTABLISH CONNECTION FOR USER: None on PORT 22 TO 10.8.246.101
10.8.246.101 | UNREACHABLE! => {
    "changed": false,
    "msg": "argument must be an int, or have a fileno() method.",
    "unreachable": true
}

If I pass -t, I get:

...
paramiko: The authenticity of host '10.8.246.101' can't be established.
The ssh-ed25519 key fingerprint is a1f9a4d65418abceec2988451f974c76.
Are you sure you want to continue connecting (yes/no)?

So it seems likely to be related to PAPR not allocating a tty.

@jlebon
Copy link
Member

jlebon commented Feb 27, 2018

Explicitly passing ANSIBLE_HOST_KEY_CHECKING=False "works" (as in I get the error I would expect):

<10.8.246.101> ESTABLISH CONNECTION FOR USER: None on PORT 22 TO 10.8.246.101
10.8.246.101 | UNREACHABLE! => {
    "changed": false,
    "msg": "No authentication methods available",
    "unreachable": true
}

Hmm, I wonder if -o StrictHostKeyChecking=no is not actually taking effect. Can you try using ANSIBLE_HOST_KEY_CHECKING as well?

@cgwalters
Copy link
Member Author

The paramiko error was enough of a hint: We were missing openssh-clients in the depchain. When I was testing this interactively, that missing dep was masked since git-core pulls it in.

@cgwalters
Copy link
Member Author

OK so...let me briefly outline my plans here.

High level goals;

  1. Dedup CI between rpm-ostree and ostree (and potentially other projects)
  2. Increase alignment with https://fedoraproject.org/wiki/CI - specifically things like using Ansible, though in practice we may end up with both Ansible + shell scripts?
  3. Move strongly to the VM-in-container approach; the benefits are powerful

In general, let's try to follow the Cockpit path on this. One of the
first decisions we need to take is how we inject binaries into the VM.
I'd like to go the "build an RPM" path instead of what we do now in rpm-ostree.

Taking this first step, along with the "dedup CI" path means I am inclined to
create a little bit more formal "libpaprci" that is a pure copy-paste lib for now.
Once we find high level patterns that work well, we can consider "baking" them into
PAPR yaml. (But doing that has the cost that it's harder to run "directly" in
a dev container).

An interesting question here is to what degree it makes sense to have PAPR itself
switch to VM-in-container behind the scenes, particularly for the cluster case
with one container and one VM.

Anyways, short term goal is step #1: deduping CI between rpm-ostree and ostree and
hence gaining better VM-style testing for ostree.

@cgwalters
Copy link
Member Author

@jlebon ^

@jlebon
Copy link
Member

jlebon commented Feb 28, 2018

I'm mostly in agreement with the above, including rolling with a copy-paste lib for now. (Actually, we could have the lib stored in the PAPR container and just scp all environments with a copy of it).

An interesting question here is to what degree it makes sense to have PAPR itself
switch to VM-in-container behind the scenes, particularly for the cluster case
with one container and one VM.

This is something that PAPR will have to learn if we want to migrate to OCP. It also takes us a step closer to integrating the Fedora STI directly in PAPR. (Basically, PAPR takes on the role of resource provisioner, or testing system).

Move strongly to the VM-in-container approach; the benefits are powerful

One thing to note is that right now all container workloads are performed directly on the executor nodes. And we don't do any form of scheduling other than relying on Jenkins' built-in queue based on executor slots. Before we go all in on the VM-in-container approach, we need better scheduling, which I think would make more sense to tackle until after we move to OCP?

For the time being, I would suggest provisioning a host and running containerized tests there. Obviously backwards 🙃, though at least it unblocks investments into this today that will still be relevant tomorrow. And we still get all the other nice features of the approach (better sharing, reproducibility, portability, etc...).

In the meantime, I'd like to pursue the developments going on in PACI to set up the full infra in OCP. Once we're to a point where we can run container workloads as pods there, we can iterate over integrating support for containerized VMs.

@cgwalters cgwalters force-pushed the fedora-standard-test-roles branch 3 times, most recently from 17534c4 to d203248 Compare March 6, 2018 15:08
@cgwalters cgwalters force-pushed the fedora-standard-test-roles branch from 80f37cc to 3759fce Compare March 6, 2018 16:19
Reusing the way `standard-test-roles` has support for booting
a qcow2 actually gets us to the "VM-in-container" flow.  Plus
Ansible over shell script is sometimes nicer.

https://fedoraproject.org/wiki/CI/Tests#Testing_an_Atomic_Host

It's better than what we were doing before for installed tests,
and moreover using Ansible more broadly for testing is going
to align us better with Fedora's CI.

As part of this I split off a "libpaprci" which I intend to maintain
as a "copylib" for a little bit between ostree/rpm-ostree, and then
we'll figure out how to expand from there (maybe some of the patterns
get "baked in" to PAPR for example).

Note the `FAH27-insttests` context moves to the top since it's now
of primary importance, and I expect that we start expanding it.
@cgwalters cgwalters force-pushed the fedora-standard-test-roles branch from ef79ba7 to e7dcd7e Compare March 7, 2018 18:07
@cgwalters cgwalters removed the WIP label Mar 7, 2018
@cgwalters cgwalters changed the title WIP: ci: Add experiment with Fedora Standard Test interface ci: Rework installed tests to use Fedora Standard Test interface Mar 7, 2018
@cgwalters
Copy link
Member Author

Lifting WIP on this.

@ostreedev ostreedev deleted a comment from rh-atomic-bot Mar 7, 2018
@cgwalters
Copy link
Member Author

@miabbott would you mind looking at this too?

- find:
paths: /root/tests/installed/
patterns: "itest-*.sh"
register: installed_tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This registered variable is never used.

remote_user: root
tasks:
# Install the built RPM to the VM
- command: git rev-parse HEAD
Copy link
Collaborator

@miabbott miabbott Mar 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since lines 29-36 are commented out, you could also remove this use of git

changed_when: False
register: ostree_version
- set_fact:
ostree_version_yaml: "{{ ostree_version.stdout | from_yaml }}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above nit, you could comment out/remove lines 23-27

ci/build-rpm.sh Outdated
@@ -0,0 +1,46 @@
#!/usr/bin/bash
# Install build dependencies, run unit tests and installed tests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this looks like the wrong description, right?

pkg_install make /usr/bin/rpmbuild git
fi

# PAPR really should do this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch! Filed: projectatomic/papr#90.

ci/build-rpm.sh Outdated
esac
fi
case "${CONFIGOPTS:-}" in
*--with-curl*|--with-soup*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's missing a * after the |.

OS_VERSION_ID=$(. /etc/os-release; echo $VERSION_ID)

pkg_upgrade() {
# https://bugzilla.redhat.com/show_bug.cgi?id=1483553
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional/not new: we should be able to drop this by now.


build() {
env NOCONFIGURE=1 ./autogen.sh
./configure --prefix=/usr --libdir=/usr/lib64 "$@"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional/not new: ahh, I forgot to upstream the --sysconfdir=/etc fix here!

/usr/bin/rpm -Fvh /root/x86_64/*.rpm && \
cd / && rpm2cpio /root/x86_64/ostree-tests-2*.rpm | cpio -div
changed_when: False
- shell: /usr/bin/rpm -Fvh /root/x86_64/*.rpm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just did this in the previous task, right?

register: ostree_version
- set_fact:
ostree_version_yaml: "{{ ostree_version.stdout | from_yaml }}"
# FIXME: make this work when building RPMs too
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how about just checking that the version before and after the RPM install are not the same?

patterns: "itest-*.sh"
register: installed_tests
- name: Run sysinstalled tests
shell: /root/tests/installed/run.sh &> /root/installed-tests.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK, though things will get tricky when we want to e.g. have tests that reboot/modify the VM. It could be interesting to have a hybrid approach of this and rpm-ostree's libvm where tests can seamlessly use the ansible infra for running code remotely, rebooting, pushing/fetching files, etc... Or do you already have an idea in mind on how to approach this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think the next step is to change some of the installed tests to be playbooks.


srpm: dist-snapshot
if test -f "$(SPEC)"; then \
sed -e "s,^Version:.*,Version: $(GITREV_FOR_PKG)," $(SPEC) > $(DISTGIT_NAME).spec && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, does this not work for some reason? (Judging by the comment in overlay-git.yaml).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works, I was just trying to proactively prevent any "not testing what was built" type issues like we've had once or twice. Basically I was thinking the very first test should be checking the binary's built version in some way. Which we now have.

connection: local
- name: Copy locally built RPMs
synchronize: src=x86_64/ dest=/root/x86_64/ archive=yes
- shell: ostree admin unlock || true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of using admin unlock for this, given that it's not how it's actually run on client systems. It also limits what tests can do (e.g. testing admin unlock itself, or rebooting). Have you considered building our own tree like in rpm-ostree?

Actually, it should work to just rpm-ostree override replace the RPMs, reset the ref and change the origin back to use refspec, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm. I use admin unlock a lot for testing personally where I really explicitly don't want things to persist. We'll clearly need vmcheck and override replace to be used as well though.

(I'd be cool down the line to have override override replace --transient or something which would be like more like replace + "livefs-on-overlay" but still underneath the ro mount).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the previous installed tests code in fah-prep.sh (now deleted) also used admin unlock i'd vote we keep it for now - and build upon this first step by e.g. copying the a-h-t reboot playbook and using it, etc.

@cgwalters
Copy link
Member Author

Fixup for all comments ⬆️

@jlebon
Copy link
Member

jlebon commented Mar 8, 2018

@rh-atomic-bot r+ 7407a66

@rh-atomic-bot
Copy link

⌛ Testing commit 7407a66 with merge 6e9d00d...

@jlebon
Copy link
Member

jlebon commented Mar 8, 2018

The PR tester failed with:

+ ostree --repo=repo remote add origin --set=gpg-verify=false http://127.0.0.1:44687
+ ostree --repo=repo pull --disable-static-deltas origin fedora/27/x86_64/atomic-host
error: [28] Timeout was reached

But it worked earlier, so probably a flake?

Anyway, let's try this out! 🎉

cgwalters added a commit to cgwalters/rpm-ostree that referenced this pull request Mar 8, 2018
This adds a shell primitive to make it easy to execute a playbook
task list.

The big picture idea is to sync with ostreedev/ostree#1462
and rewrite some of the libvm shell stuff as playbooks, allowing easier
code sharing with a-h-t and just in general being a better library for
talking ssh and executing commnads.
@rh-atomic-bot
Copy link

☀️ Test successful - status-atomicjenkins
Approved by: jlebon
Pushing 6e9d00d to master...

rh-atomic-bot pushed a commit to coreos/rpm-ostree that referenced this pull request Mar 9, 2018
This adds a shell primitive to make it easy to execute a playbook
task list.

The big picture idea is to sync with ostreedev/ostree#1462
and rewrite some of the libvm shell stuff as playbooks, allowing easier
code sharing with a-h-t and just in general being a better library for
talking ssh and executing commnads.

Closes: #1297
Approved by: jlebon
@cgwalters
Copy link
Member Author

This was reverted in #1783

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants