
performance issue blocking https://github.com/ostreedev/ostree/pull/1513 #93

cgwalters opened this issue Apr 4, 2018 · 5 comments


cgwalters commented Apr 4, 2018

Let's take the conversation about ostreedev/ostree#1513 here.

I'm trying to analyze the issue a bit more to understand the root of the performance problems: are we dealing with CPU contention, I/O, both, or something else entirely?

I logged into a jslave (a Jenkins slave) while it was otherwise idle, and noticed:

Locally on my desktop:

[    1.083047] systemd[1]: systemd 234 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN default-hierarchy=hybrid)

OpenStack:

[    4.212516] systemd[1]: systemd 234 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN default-hierarchy=hybrid)

So simply booting a VM is roughly 4x slower, which is understandable; I assume there's contention with other guests on the underlying hosts.
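
A quick way to break that down further (a sketch, nothing environment-specific; systemd-analyze ships with systemd on both machines):

# Where does boot time go: kernel vs. initrd vs. userspace?
systemd-analyze
# Which units dominate userspace startup?
systemd-analyze blame | head -20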

jlebon commented Apr 4, 2018

> I assume there's contention with other guests on the underlying hosts.

That's very likely. We could investigate more here, though I think we should just sprint toward getting an OCP instance set up and switching ostree over to it, which should help weed out these issues.

cgwalters commented Jun 20, 2018

There are two levels to this issue. One is running qemu inside VMs that also host other concurrent workloads (containers, etc.).

However, for coreos/rpm-ostree#1362, where we're provisioning full VMs, performance is still awful. I think this is a generic QE OpenStack issue, but that remains to be determined; it might be specific to nested virt in QEOS.

As another data point, I've been playing with GCE nested virt and the performance is (as you might expect) quite good:

Wed Jun 20 13:28:26 UTC 2018 overlay: Starting
Wed Jun 20 13:28:38 UTC 2018 overlay: Checkout complete
Wed Jun 20 13:28:56 UTC 2018 overlay: Commit complete
Wed Jun 20 13:29:12 UTC 2018 overlay: Deploy complete

Actually, a good baseline data point is the SELinux policy load time:

GCE:

[    8.476232] systemd[1]: Successfully loaded SELinux policy in 793.738ms.

But in that test:

[   98.191917] systemd[1]: Successfully loaded SELinux policy in 8.273604s.

Hmm, are we somehow not getting nested virt enabled?
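
One quick way to check from inside the guest (a sketch; assumes an Intel hypervisor, so the module is kvm_intel; on AMD hosts it's kvm_amd and the CPU flag is svm):

# Does the guest CPU expose hardware virt extensions at all?
grep -c -E 'vmx|svm' /proc/cpuinfo    # 0 means nested virt is not available here
# On the host side, KVM nesting must also be enabled:
cat /sys/module/kvm_intel/parameters/nested    # Y (or 1) when enabled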

cgwalters commented:

> Hmm, are we somehow not getting nested virt enabled?

Ah. Yes.


jlebon commented Jun 20, 2018

Hmm, it's also possible we're using Ceph-backed VMs, which have notoriously lower disk write performance. I'll double-check that.
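
A rough way to sanity-check guest disk write latency (a sketch; the file path is arbitrary, and oflag=dsync forces a flush per block, which is exactly where network-backed storage like Ceph tends to hurt):

# Small synchronous writes: slow numbers here point at the storage backend
dd if=/dev/zero of=/var/tmp/ddtest bs=4k count=2000 oflag=dsync
rm -f /var/tmp/ddtest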


jlebon commented Jun 20, 2018

> Hmm, it's also possible we're using Ceph-backed VMs

OK, I've confirmed this isn't the case. For posterity, can you post the same performance outputs here once you have nested virt working?
