Update getting_debugging_symbols.rst #380
Confirmed elfutils/debuginfod code working fine for userspace drgn -p $pid type work.
Hi! First of all, thank you for debuginfod. It has made the debugging experience so much better. I've been hesitant to document debuginfod for drgn since, as you noticed, we don't use it for our main use case, the kernel. And we don't use it for the kernel because of https://sourceware.org/bugzilla/show_bug.cgi?id=29478. Each kernel module takes an average of a minute to start downloading, which is prohibitive if you have more than a handful of kernel modules. Has that been revisited at all? It looks like the Fedora kernel-debuginfo uses an xz-compressed cpio payload with a decent block size for random access. (Let me know if I should start a discussion on the elfutils-devel list.)
I know what you're talking about, and unfortunately fedora kernel debuginfo packages are unusually pessimal. https://bugzilla.redhat.com/show_bug.cgi?id=1970578 Luckily, something quite as bad as a minute per module is very rare. Caching & prefetching centrally helps. You can also run a nearby debuginfod proxy/federation instance that caches your bits of interest for a long time. Plus one can e.g. setenv DEBUGINFOD_MAXTIME=5 to force the client to wait no more than 5s to complete a download. Also, filed https://sourceware.org/bugzilla/show_bug.cgi?id=31265 to revisit the cache retention algorithms. Will take a serious look.
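For readers who want to try the DEBUGINFOD_MAXTIME suggestion above, here is a minimal sketch (not part of the original comment). It builds the same /buildid/<id>/debuginfo URL form used in the curl commands in this thread and prepares an environment with a 5-second limit for the debuginfod-find client. The helper names debuginfod_url, client_env and find_command are hypothetical and exist only for illustration.

```python
import os
import re


def debuginfod_url(server: str, build_id: str, kind: str = "debuginfo") -> str:
    """Build a debuginfod lookup URL of the form <server>/buildid/<id>/<kind>."""
    if not re.fullmatch(r"[0-9a-f]+", build_id):
        raise ValueError(f"build ID must be lowercase hex: {build_id!r}")
    return f"{server.rstrip('/')}/buildid/{build_id}/{kind}"


def client_env(max_time_s: int, base=None) -> dict:
    """Copy an environment and cap the client's download time in seconds."""
    env = dict(os.environ if base is None else base)
    # DEBUGINFOD_MAXTIME is read by the elfutils debuginfod client.
    env["DEBUGINFOD_MAXTIME"] = str(max_time_s)
    return env


def find_command(build_id: str) -> list:
    """Argument vector for the debuginfod-find CLI (hypothetical helper)."""
    return ["debuginfod-find", "debuginfo", build_id]
```

Assuming debuginfod-find is installed and DEBUGINFOD_URLS points at a server, the pieces could be combined as subprocess.run(find_command(build_id), env=client_env(5)).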
Thanks for taking a look at the cache retention! I'm sure that will help in many cases, but I'm still seeing painfully slow times for the kernel.
$ curl https://debuginfod.fedoraproject.org/buildid/297908dfb7ddc4b976c37b6a7d191497c10e3fbe/debuginfo | true
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 387M 0 15864 0 0 162 0 29d 01h 0:01:37 29d 01h 4665
curl: (23) Failure writing output to destination
$ curl https://debuginfod.fedoraproject.org/buildid/b5fd7e1f50973816dd57d8d5f1d8de9b47aa81be/debuginfo | true
8 187k 8 15847 0 0 164 0 0:19:29 0:01:36 0:17:53 3553
curl: (23) Failure writing output to destination
Maybe this is a chicken-and-egg problem, and using debuginfod for the kernel will only be practical when there are enough users of debuginfod for the kernel to keep the cache warm. But even then, kernel modules vary greatly by hardware and workload, so the slow, uncached case probably wouldn't be that rare. As I alluded to in my previous comment, we could optimize this by taking advantage of xz's random access capabilities. xz can optionally split up the compressed stream into smaller, independently-compressed blocks, with an index mapping a decompressed offset to the block containing it. The Fedora/RHEL With a little bit of low-level code using liblzma directly, we can get a MASSIVE speedup. I wrote a proof of concept that indexes all of the files in an xz-compressed RPM and then extracts them with random access: On my machine, the time grows up to a minute towards the end of the RPM with the naive approach. With the optimization, no file takes longer than 0.25 seconds to find! There are some downsides:
In my opinion, the huge improvement is well worth working through these downsides. What do you think?
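The random-access idea described above can be illustrated with a small sketch. This is not the proof of concept from the comment and it does not parse the real xz index. As a stand-in, it compresses fixed-size chunks independently with Python's lzma module and keeps its own table mapping each decompressed offset to the chunk that holds it. The names compress_blocks and read_range, and the in-memory index layout, are assumptions made for illustration.

```python
import bisect
import lzma


def compress_blocks(data: bytes, block_size: int):
    """Compress data as independent blocks and return (blob, index).

    Each index entry is (uncompressed_offset, compressed_offset,
    compressed_size), playing the role of the xz block index.
    """
    blob = bytearray()
    index = []
    for off in range(0, len(data), block_size):
        comp = lzma.compress(data[off:off + block_size])
        index.append((off, len(blob), len(comp)))
        blob += comp
    return bytes(blob), index


def read_range(blob: bytes, index, offset: int, size: int) -> bytes:
    """Return size bytes starting at decompressed offset (offset >= 0),
    decompressing only the blocks that overlap the requested range."""
    if size <= 0:
        return b""
    starts = [entry[0] for entry in index]
    # Last block whose start is <= offset.
    i = max(bisect.bisect_right(starts, offset) - 1, 0)
    end = offset + size
    out = bytearray()
    while i < len(index) and index[i][0] < end:
        u_off, c_off, c_size = index[i]
        block = lzma.decompress(blob[c_off:c_off + c_size])
        lo = max(offset, u_off) - u_off
        hi = min(end, u_off + len(block)) - u_off
        out += block[lo:hi]
        i += 1
    return bytes(out)
```

With this layout, reading a file near the end of a large payload costs one index lookup plus the blocks it spans, instead of decompressing everything before it. That is the effect the naive-versus-optimized timings above describe.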
That's a pretty amazing bit of prototyping. While the code is quite agnostic with respect to idiosyncrasies of distros and particular packages, it's hilarious that this particular kernel package would be the only one that both needs this benefit and supports this xz-index hack. I wouldn't rule out an approach like this for debuginfod (just needs integration and all that index stuff stored in the database proper, so bloating it somewhat). But, given the server-side caching improvements that rolled out about a month ago onto the fedora servers (and now in elfutils 0.191 generally), maybe this sort of heroic effort is less necessary.
I have a branch of drgn that uses debuginfod for kernel modules, and I've still been hitting these stalls. So when I get the chance, I will turn this into a proper patch for debuginfod.
I finally got around to integrating this in debuginfod and sent a patch series: https://sourceware.org/pipermail/elfutils-devel/2024q3/007191.html.
Confirmed elfutils/debuginfod code working fine for userspace drgn -p $pid type work.
Note that elfutils also has code for dwfl-reporting a running kernel and its modules ("dwfl_linux_kernel_report_kernel", "dwfl_linux_kernel_report_modules"), which could make your libdrgn code simpler, and also just work (tm) with debuginfod.