Update getting_debugging_symbols.rst #380

Open
wants to merge 1 commit into
base: main

fche commented Dec 15, 2023

Confirmed that the elfutils/debuginfod code works fine for userspace use such as drgn -p $pid.

Note that elfutils also has code for dwfl-reporting a running kernel and its modules ("dwfl_linux_kernel_report_kernel", "dwfl_linux_kernel_report_modules"), which could make your libdrgn code simpler, and also just work (tm) with debuginfod.

Owner

osandov commented Jan 18, 2024

Hi! First of all, thank you for debuginfod. It has made the debugging experience so much better.

I've been hesitant to document debuginfod for drgn since, as you noticed, we don't use it for our main use case, the kernel. And we don't use it for the kernel because of https://sourceware.org/bugzilla/show_bug.cgi?id=29478. Each kernel module takes an average of a minute to start downloading, which is prohibitive if you have more than a handful of kernel modules. Has that been revisited at all? It looks like the Fedora kernel-debuginfo uses an xz-compressed cpio payload with a decent block size for random access. (Let me know if I should start a discussion on the elfutils-devel list.)

Author

fche commented Jan 18, 2024

> Each kernel module takes an average of a minute to start downloading, which is prohibitive if you have more than a handful of kernel modules.

I know what you're talking about, and unfortunately Fedora kernel debuginfo packages are unusually pessimal: https://bugzilla.redhat.com/show_bug.cgi?id=1970578

Luckily, something quite as bad as a minute per module is very rare. Caching & prefetching centrally helps. You can also run a nearby debuginfod proxy/federation instance that caches your bits of interest for a long time. Plus one can e.g. setenv DEBUGINFOD_MAXTIME=5 to force the client to wait no more than 5s to complete a download.
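A minimal sketch of that last suggestion, written in Python because drgn is scripted in Python. DEBUGINFOD_MAXTIME is the client environment variable named above; the helper name debuginfod_env is hypothetical and only builds an environment for a child process.

```python
import os


def debuginfod_env(max_seconds, base=None):
    """Return a copy of the environment with DEBUGINFOD_MAXTIME set.

    max_seconds caps how long the debuginfod client waits for a single
    download to complete. The helper name is hypothetical.
    """
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base is None else base)
    env["DEBUGINFOD_MAXTIME"] = str(int(max_seconds))
    return env
```

For example, subprocess.run(["drgn", "-p", str(pid)], env=debuginfod_env(5)) would start drgn with the 5-second cap, assuming the drgn build in use goes through the debuginfod client.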

Also, filed https://sourceware.org/bugzilla/show_bug.cgi?id=31265 to revisit the cache retention algorithms. Will take a serious look.

Owner

osandov commented Feb 27, 2024

Thanks for taking a look at the cache retention! I'm sure that will help in many cases, but I'm still seeing painfully slow times for the kernel.

vmlinux is the last file in the kernel-debuginfo RPM, but the most important for kernel debugging. The desired kernel modules are spread throughout the RPM. I tested out initiating a download of vmlinux and then the last kernel module in the RPM without actually transferring the data (piping into true so that curl dies with EPIPE as soon as it writes some data):

$ curl https://debuginfod.fedoraproject.org/buildid/297908dfb7ddc4b976c37b6a7d191497c10e3fbe/debuginfo | true
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  387M    0 15864    0     0    162      0  29d 01h  0:01:37  29d 01h  4665
curl: (23) Failure writing output to destination
$ curl https://debuginfod.fedoraproject.org/buildid/b5fd7e1f50973816dd57d8d5f1d8de9b47aa81be/debuginfo | true
  8  187k    8 15847    0     0    164      0  0:19:29  0:01:36  0:17:53  3553
curl: (23) Failure writing output to destination

vmlinux took a minute and a half. Assuming nothing is cached, this makes total sense: debuginfod had to decompress the entire RPM to find it. But the kernel module took just as long, because debuginfod had to decompress almost the whole RPM again. (Subsequent attempts hit the cache and are instant.)
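The curl measurement above can also be scripted. The following is a rough sketch, not something from the thread: it times how long a stream takes to deliver its first bytes and then closes it, much like piping curl into true. The function names are hypothetical; the URL layout is the one used in the curl commands.

```python
import time
import urllib.request  # used in the usage note below


def debuginfo_url(build_id, server="https://debuginfod.fedoraproject.org"):
    """Build the debuginfo URL for a build ID, as in the curl commands."""
    return f"{server}/buildid/{build_id}/debuginfo"


def time_to_first_byte(open_stream, nbytes=1):
    """Return (seconds, data) for the first nbytes read from open_stream().

    open_stream is any zero-argument callable returning a file-like
    context manager, so the timing covers both opening and first read.
    """
    start = time.monotonic()
    with open_stream() as stream:
        data = stream.read(nbytes)
    return time.monotonic() - start, data
```

Against a live server this could be called as time_to_first_byte(lambda: urllib.request.urlopen(debuginfo_url(build_id))), where build_id is one of the IDs above. Results will vary with server-side cache state.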

Maybe this is a chicken-and-egg problem, and using debuginfod for the kernel will only be practical when there are enough users of debuginfod for the kernel to keep the cache warm. But even then, kernel modules vary greatly by hardware and workload, so the slow, uncached case probably wouldn't be that rare.

As I alluded to in my previous comment, we could optimize this by taking advantage of xz's random access capabilities. xz can optionally split up the compressed stream into smaller, independently-compressed blocks, with an index mapping a decompressed offset to the block containing it. The Fedora/RHEL kernel.spec happens to enable this because it uses multi-threaded xz compression.
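To illustrate the idea, here is a small Python sketch under a loud assumption: Python's standard lzma module does not, as far as I know, expose the block index of an .xz stream, so this stand-in compresses each chunk as its own .xz stream and keeps a separate index. The principle is the same (independently compressed pieces plus a map from decompressed offset to piece), but it is not how liblzma's index works internally, and both function names are hypothetical.

```python
import bisect
import lzma


def compress_in_blocks(data, block_size):
    """Compress data as independent .xz chunks; return (blob, index).

    index holds (uncompressed_start, compressed_start, compressed_len)
    for every chunk, in order.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        chunk = lzma.compress(data[start:start + block_size])
        index.append((start, len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), index


def read_at(blob, index, offset, size):
    """Read size bytes at an uncompressed offset, decompressing only
    the chunks that overlap the requested range."""
    starts = [entry[0] for entry in index]
    # Last chunk whose uncompressed start is <= offset.
    i = max(bisect.bisect_right(starts, offset) - 1, 0)
    out = bytearray()
    while i < len(index) and len(out) < size:
        ustart, cstart, clen = index[i]
        block = lzma.decompress(blob[cstart:cstart + clen])
        skip = max(0, offset - ustart)  # nonzero only in the first chunk
        out += block[skip:skip + size - len(out)]
        i += 1
    return bytes(out)
```

A read near the end of the data touches only the last chunk or two, instead of decompressing everything before it.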

With a little bit of low-level code using liblzma directly, we can get a MASSIVE speedup. I wrote a proof of concept that indexes all of the files in an xz-compressed RPM and then extracts them with random access: debuginfod_lzma_random_access_poc.cxx. Here's a graph of the amount of time it takes to find each file in the kernel-debuginfo RPM with the existing code vs. the proof of concept:

[plot: time to find each file in the kernel-debuginfo RPM, existing code vs. proof of concept]

On my machine, the time grows up to a minute towards the end of the RPM with the naive approach. With the optimization, no file takes longer than 0.25 seconds to find!
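The file-indexing half of this can be sketched in a few lines. This is not the linked proof of concept, only an illustration in Python, and it assumes the decompressed RPM payload is a plain cpio "newc" archive (magic 070701 or 070702); other payload variants would need different parsing. The function name is hypothetical.

```python
def index_cpio_newc(archive):
    """Map each member name to (data_offset, size) in an uncompressed
    cpio newc archive given as bytes."""
    index = {}
    pos = 0
    while pos + 110 <= len(archive):
        header = archive[pos:pos + 110]
        if header[:6] not in (b"070701", b"070702"):
            raise ValueError(f"bad cpio magic at offset {pos}")
        # After the 6-byte magic come 13 fields of 8 hex digits each.
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + 110
        # namesize counts the trailing NUL byte.
        name = archive[name_start:name_start + namesize - 1].decode()
        # Header plus name, and file data, are each padded to 4 bytes.
        data_start = (name_start + namesize + 3) & ~3
        if name == "TRAILER!!!":
            break
        index[name] = (data_start, filesize)
        pos = (data_start + filesize + 3) & ~3
    return index
```

With such an index stored once per archive, a request for a single file reduces to looking up its data offset and then decompressing only the xz blocks that cover that range.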

There are some downsides:

  1. The code is a bit tricky and bespoke.
  2. The existing code is specific to RPMs, although I believe it'd be possible to do the same for debs.
  3. The kernel RPM spec only enables this xz feature incidentally. If we want to depend on it, we'd probably want to at least add a comment to the spec that it's important for debuginfod.
  4. If there are similar debuginfo packages, they'd also need to enable this manually.

In my opinion, the huge improvement is well worth working through these downsides. What do you think?

Author

fche commented Mar 10, 2024

That's a pretty amazing bit of prototyping. While the code is quite agnostic with respect to idiosyncrasies of distros and particular packages, it's hilarious that this particular kernel package would be the only one that both needs this benefit and supports this xz-index hack.

I wouldn't rule out an approach like this for debuginfod (it just needs integration, with all that index stuff stored in the database proper, which would bloat it somewhat). But given the server-side caching improvements that rolled out about a month ago onto the Fedora servers (and now in elfutils 0.191 generally), maybe this sort of heroic effort is less necessary.

Owner

osandov commented Mar 27, 2024

I have a branch of drgn that uses debuginfod for kernel modules, and I've still been hitting these stalls. So when I get the chance, I will turn this into a proper patch for debuginfod.

Owner

osandov commented Jul 10, 2024

I finally got around to integrating this in debuginfod and sent a patch series: https://sourceware.org/pipermail/elfutils-devel/2024q3/007191.html.
