Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need suggestions for CONFIG_LTO_CLANG support #1320

Open
liu-song-6 opened this issue Dec 9, 2022 · 41 comments
Open

Need suggestions for CONFIG_LTO_CLANG support #1320

liu-song-6 opened this issue Dec 9, 2022 · 41 comments
Assignees

Comments

@liu-song-6
Copy link
Contributor

I am try to build livepatch for kernel built with CONFIG_LTO_CLANG.
The first issue was these .o files are not ELF files, but LLVM IR bitcode. To resolve this, I made some hack so that we run create-diff-object on vmlinux.o, which is ELF. However, create-diff-object spent hours in the following stack:

 find_symbol_by_index()                                                                                                                                                                                           
 kpatch_create_rela_list()                                                                                                                                                                                        
 kpatch_elf_open()

which makes it unusable.

So my question is: Shall we optimize create-diff-object to be faster? Or shall we try some other solutions?

Thanks in advance!

@joe-lawrence
Copy link
Contributor

@sm00th : This sounds familiar from your previous study of clang / LTO. Did you have any notes or branches left over from that investigation that may help @liu-song-6 ?

@sm00th
Copy link
Contributor

sm00th commented Dec 12, 2022

Hi there. Yeah, I ran into this issue as well and used this hacky commit to alleviate the problem during development. However I didn't think that is a proper solution to find_symbol_by_index() slowness.

The might be some other parts you might find usable on that branch: https://github.com/sm00th/kpatch/commits/lto

It is all very raw though.

@liu-song-6
Copy link
Contributor Author

@joe-lawrence @sm00th Thanks for sharing this work.

I tried the lto branch (on top of some of our patches for pgo), and it seems there is still some issues.

@sm00th , do you have plan to continue working on this? We are hoping to use kpatch on our LTO kernel in 2023 (hopefully H1). Thanks!

@sm00th
Copy link
Contributor

sm00th commented Dec 13, 2022

@joe-lawrence @sm00th Thanks for sharing this work.

I tried the lto branch (on top of some of our patches for pgo), and it seems there is still some issues.

Yes, there are. I worked on this a while ago and there definitely were issues left, but afaik it produced loadable and working (but unwieldy, they were roughly the same size as vmlinux because of all the debuginfo and rodata sections being copied over) kpatches for simple patches for vmlinux (maybe not for modules). The better parts of the branch were merged to mainline and what is left over are hacky commits that probably shouldn't be used as is but might give you some insights and early warnings about things I encountered.

@sm00th , do you have plan to continue working on this? We are hoping to use kpatch on our LTO kernel in 2023 (hopefully H1). Thanks!

No, I currently don't continue any work on this nor there are any plans to do so in near future.

@joe-lawrence
Copy link
Contributor

Thanks for the recap, @sm00th

@liu-song-6 : since our day job work targets RHEL, we're not actively pursuing LTO / clang until the RHEL kernel goes that direction, but that wouldn't stop us from reviewing patches and integrating pieces upstream here.

@liu-song-6
Copy link
Contributor Author

@sm00th thanks for these information.

Do you, by any chance, have some design doc, discussion thread, etc. about the LTO work? I tried to understand the commits in your lto branch, but failed to construct the logic of the design. Some documentations may save me days of work on the wrong direction...

Thanks again!

@sm00th
Copy link
Contributor

sm00th commented Dec 14, 2022

Do you, by any chance, have some design doc, discussion thread, etc. about the LTO work?

I'm sorry, I don't have anything usable

@liu-song-6
Copy link
Contributor Author

@sm00th Thanks! Let me see what I can get..

@liu-song-6
Copy link
Contributor Author

Some updates on this.

@yonghong-song showed me how to use --lto-obj-path option of ld.lld to generate "thinlto" files, which are ELF files similar to the .o files without LTO (btw, .o files with LTO are LLVM IR binary). We think it is possible to use these files to generate live patches

The LTO flow works like the following:

  1. compile .c file into .o (LLVM IR);
  2. cross file inline for all the .o files;
  3. generate ELF file for all .o files (thinlto files are dumped here);
  4. final linking.

I tried to run create-diff-object on these thinlto files, and they seem to work (with some minor changes). But I haven't finished the whole process in kpatch-build.

I would like some comments before developing all the logic. Specifically:

  1. Does this (CDOing thinlto file) sound like a viable solution?
  2. Are there some pitfalls with this that I didn't see?
  3. Is this better than CDOing big linked file(s)?

Thanks!

@sm00th
Copy link
Contributor

sm00th commented Jan 24, 2023

CDOing full vmlinux is very unpleasant so I'd vote for anything that will allow to avoid this. I have limited understanding of ThinLTO but it sounds like there are no further optimizations on step 4 so we should be ok.
This also limits kpatch to kernels compiled with CONFIG_LTO_CLANG_THIN, but it is better than nothing and with ThinLTO being much faster while still achieving comparable gains this probably will be the most commonly used option.

@liu-song-6
Copy link
Contributor Author

@sm00th Thanks for the quick feedback! I will give thinlto a try.

@jpoimboe
Copy link
Member

While I think that approach is fine as a stopgap (if it's not too much effort), pretty soon we will need CDO to support vmlinux.o anyway. Not only for LTO but also for CONFIG_X86_KERNEL_IBT and CONFIG_FINEIBT which are very useful hardening features which will soon become commonplace, IMO.

@liu-song-6
Copy link
Contributor Author

Hmm... In this case, I guess CDOing the whole vmlinux.o makes more sense?
Btw, does IBT add other challenges to livepatch? For example, will there be a case where we are patching func-A, but to do that, we need to add a ENDBR to func-B?

@jpoimboe
Copy link
Member

Hmm... In this case, I guess CDOing the whole vmlinux.o makes more sense?

Long term, yes.

For the short term, maybe that can be delayed for your use here, if you get the thinlto thing figured out, and if the changes aren't too disruptive.

I can pick up the work for CDO support for vmlinux.o, unless somebody else wants to do it. We (mostly peterz) had to do something similar for objtool upstream to get the performance under control.

Btw, does IBT add other challenges to livepatch? For example, will there be a case where we are patching func-A, but to do that, we need to add a ENDBR to func-B?

If I understand you correctly:

  1. new version of func-A adds indirect branch to func-B
  2. no previous code anywhere in the kernel had an indirect branch to func-B, so it doesn't have ENDBR.

In my experience I think such a patch would be very rare, but yes we'd need to make sure that case is detected and results in either a) build-time error or b) runtime ENDBR patching.

@jpoimboe
Copy link
Member

Hm, thinking about this some more, I'm no longer sure that IBT will require CDO to support vmlinux.o. It's possible that CDO could just be ignorant and assume the indirect-branched-to function has ENDBR, and instead have livepatch do the ENDBR patching if needed.

@liu-song-6
Copy link
Contributor Author

Hmm... In this case, I guess CDOing the whole vmlinux.o makes more sense?

Long term, yes.

For the short term, maybe that can be delayed for your use here, if you get the thinlto thing figured out, and if the changes aren't too disruptive.

I can pick up the work for CDO support for vmlinux.o, unless somebody else wants to do it. We (mostly peterz) had to do something similar for objtool upstream to get the performance under control.

Let me take a closer look at both options (and our timeline). CDOing vmlinux.o sounds interesting to me.

@liu-song-6
Copy link
Contributor Author

Btw, does IBT add other challenges to livepatch? For example, will there be a case where we are patching func-A, but to do that, we need to add a ENDBR to func-B?

If I understand you correctly:

  1. new version of func-A adds indirect branch to func-B
  2. no previous code anywhere in the kernel had an indirect branch to func-B, so it doesn't have ENDBR.

In my experience I think such a patch would be very rare, but yes we'd need to make sure that case is detected and results in either a) build-time error or b) runtime ENDBR patching.

How shall we do runtime ENDBR patching? Do we need some type of trampoline for that?

@liu-song-6
Copy link
Contributor Author

Hm, thinking about this some more, I'm no longer sure that IBT will require CDO to support vmlinux.o. It's possible that CDO could just be ignorant and assume the indirect-branched-to function has ENDBR, and instead have livepatch do the ENDBR patching if needed.

In this case, I guess livepatch has to use the original function to decide whether the KLP'ed function need ENDBR. And I guess this won't work if we are patching both funcA and funcB, and the new funcB requires endbr only because of patched funcA calls it with an indirect call. Did I understand this correctly?

@jpoimboe
Copy link
Member

How shall we do runtime ENDBR patching? Do we need some type of trampoline for that?

x86 kernel has text_poke_bp()

In this case, I guess livepatch has to use the original function to decide whether the KLP'ed function need ENDBR. And I guess this won't work if we are patching both funcA and funcB, and the new funcB requires endbr only because of patched funcA calls it with an indirect call. Did I understand this correctly?

Hm, I guess there are multiple scenarios, including:

  1. patched funcA indirect-calls unpatched funcB which doesn't have ENDBR. IMO, this seems exceedingly unlikely, for several reasons. It would need either a build time check (vmlinux.o) or runtime check, or a runtime ENDBR patch. (BTW, a runtime check sort of already exists today with the "Missing ENDBR" panic, which would hopefully be found when testing the livepatch. That might be sufficient given the rare-to-never likelihood of this actually happening.)

  2. unpatched funcB indirect-calls patched funcA which doesn't have ENDBR. The easy "fix" is to just always ensure that ENDBR is always emitted (i.e., not sealed by apply_ibt_endbr()) for every patched function.

  3. patched funcB indirect-calls patched funcA. Same "fix" as 2.

@liu-song-6
Copy link
Contributor Author

How shall we do runtime ENDBR patching? Do we need some type of trampoline for that?

x86 kernel has text_poke_bp()

I think ENDBR is actually removed at compile time (by objtool?), so there may not be space to patch ENDBR at runtime?

In this case, I guess livepatch has to use the original function to decide whether the KLP'ed function need ENDBR. And I guess this won't work if we are patching both funcA and funcB, and the new funcB requires endbr only because of patched funcA calls it with an indirect call. Did I understand this correctly?

Hm, I guess there are multiple scenarios, including:

  1. patched funcA indirect-calls unpatched funcB which doesn't have ENDBR. IMO, this seems exceedingly unlikely, for several reasons. It would need either a build time check (vmlinux.o) or runtime check, or a runtime ENDBR patch. (BTW, a runtime check sort of already exists today with the "Missing ENDBR" panic, which would hopefully be found when testing the livepatch. That might be sufficient given the rare-to-never likelihood of this actually happening.)

Yeah, panic at testing time is sufficient for very rare case.

  1. unpatched funcB indirect-calls patched funcA which doesn't have ENDBR. The easy "fix" is to just always ensure that ENDBR is always emitted (i.e., not sealed by apply_ibt_endbr()) for every patched function.
  2. patched funcB indirect-calls patched funcA. Same "fix" as 2.

I think klp_ftrace_handler() always indirect-calls patched function? So 2 and 3 should always carry ENDBR anyway?

@jpoimboe
Copy link
Member

How shall we do runtime ENDBR patching? Do we need some type of trampoline for that?

x86 kernel has text_poke_bp()

I think ENDBR is actually removed at compile time (by objtool?), so there may not be space to patch ENDBR at runtime?

The ENDBRs to be removed (aka "sealed") are annotated by objtool with the creation of the .ibt_endbr_seal section and then patched into NOPs by the kernel with apply_ibt_endbr().

For global functions, and for static functions which are referenced via function pointer in the same translation unit, the compiler emits ENDBR (which may be later converted to NOP by apply_ibt_endbr).

But I forgot about static functions which aren't referenced via function pointer in the same TU. The compiler optimizes them by not emitting ENDBR. You're right, I don't think there are any good solutions for patching those.

In this case, I guess livepatch has to use the original function to decide whether the KLP'ed function need ENDBR. And I guess this won't work if we are patching both funcA and funcB, and the new funcB requires endbr only because of patched funcA calls it with an indirect call. Did I understand this correctly?

Hm, I guess there are multiple scenarios, including:

  1. patched funcA indirect-calls unpatched funcB which doesn't have ENDBR. IMO, this seems exceedingly unlikely, for several reasons. It would need either a build time check (vmlinux.o) or runtime check, or a runtime ENDBR patch. (BTW, a runtime check sort of already exists today with the "Missing ENDBR" panic, which would hopefully be found when testing the livepatch. That might be sufficient given the rare-to-never likelihood of this actually happening.)

Yeah, panic at testing time is sufficient for very rare case.

  1. unpatched funcB indirect-calls patched funcA which doesn't have ENDBR. The easy "fix" is to just always ensure that ENDBR is always emitted (i.e., not sealed by apply_ibt_endbr()) for every patched function.

Just to clarify this scenario, the compiler will already emit ENDBR for funcA (both original and replacement versions), because in this case either funcA will be global, or it will be static with funcB referencing its pointer. in the same TU. And objtool won't try to seal it. So I believe there's nothing to do here on our side.

  1. patched funcB indirect-calls patched funcA. Same "fix" as 2.

Thinking again about this, this could actually be a little tricky. The way livepatching works, all function calls are routed to the original function first. So if the original funcA did not have ENDBR (i.e., the original funcA was not indirect-branched-to by any function including the original funcB), this wouldn't work unless the new funcB bypassed livepatching and called the replacement function directly.

The alternative would be to do CDO on vmlinux.o and warn about this scenario at kpatch-build time. But again it seems like a rare issue and maybe not worth the effort. We could instead maybe consider adding a warning about any new function pointer references added by a patch if IBT is enabled, something like "be sure to test this and make sure this indirect branch to this function doesn't cause an IBT panic".

So CDO on vmlinux.o might help enable/improve a few esoteric warnings, but I'm not sure that justifies the effort. At least not yet.

I think klp_ftrace_handler() always indirect-calls patched function? So 2 and 3 should always carry ENDBR anyway?

No, it actually returns to the patched function. I say this as the co-inventer of the patent ;-)

@liu-song-6
Copy link
Contributor Author

Thinking again about this, this could actually be a little tricky. The way livepatching works, all function calls are routed to the original function first. So if the original funcA did not have ENDBR (i.e., the original funcA was not indirect-branched-to by any function including the original funcB), this wouldn't work unless the new funcB bypassed livepatching and called the replacement function directly.

Can we solve this with a wrapper function (or trampoline)? Something like:

  1. new funcB indirect-calls wrapper funcC, which has ENDBR;
  2. funcC direct-calls original funcA, which doesn't have ENDBR.

If this works, we can also use the same mechanism to patch static functions which aren't referenced via function pointer in the same TU.

The alternative would be to do CDO on vmlinux.o and warn about this scenario at kpatch-build time. But again it seems like a rare issue and maybe not worth the effort. We could instead maybe consider adding a warning about any new function pointer references added by a patch if IBT is enabled, something like "be sure to test this and make sure this indirect branch to this function doesn't cause an IBT panic".

So CDO on vmlinux.o might help enable/improve a few esoteric warnings, but I'm not sure that justifies the effort. At least not yet.

I didn't expect IBT support to be this tricky... I will first try support LTO without CDOing vmlinux.o.

I think klp_ftrace_handler() always indirect-calls patched function? So 2 and 3 should always carry ENDBR anyway?

No, it actually returns to the patched function. I say this as the co-inventer of the patent ;-)

E.. that's right. This is so tricky.

@liu-song-6
Copy link
Contributor Author

Here is what I have so far.
The kpatch change: https://github.com/liu-song-6/kpatch/tree/thinlto-6.2
The kernel code with config and 3 text patches: https://github.com/liu-song-6/linux/tree/6.2-rc6

It seems to work on simpler patches (as the one in nfs and raid10). But failed with the following for the bpf change:

/home/songliubraving/.kpatch/tmp/patch/klp-lto-6-2.o: warning: objtool: nfs_join_page_group+0x95: unannotated intra-function call
  MODPOST /home/songliubraving/.kpatch/tmp/patch/Module.symvers
ERROR: modpost: "perf_event_bpf_event" [/home/songliubraving/.kpatch/tmp/patch/klp-lto-6-2.ko] undefined!
ERROR: modpost: "__register_sysctl_init" [/home/songliubraving/.kpatch/tmp/patch/klp-lto-6-2.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:138: /home/songliubraving/.kpatch/tmp/patch/Module.symvers] Error 1
make[1]: *** [Makefile:1973: modpost] Error 2
make[1]: Leaving directory '/data/users/songliubraving/linux'
make: *** [Makefile:17: klp-lto-6-2.ko] Error 2

Any suggestions on what to fix/try from here? Thanks in advance!

@liu-song-6
Copy link
Contributor Author

I looked a little more into this. It seems that CDO didn't export the symbol perf_event_bpf_event because of the following check.

static bool need_klp_reloc(struct kpatch_elf *kelf, struct lookup_table *table,
                           struct section *relasec, const struct rela *rela)
{
/* ... */
        if (!lookup_symbol(table, rela->sym, &symbol)) {
                /*
                 * Assume the symbol lives in another .o in the patch module.
                 * A normal rela should work.
                 */
                return false;
        }
/* ... */
}

I think we probably need some special handling for this case, I am not quite sure how. Any suggestions...?

@joe-lawrence
Copy link
Contributor

Hi @liu-song-6 : Does the perf_event_bpf_event symbol still exist in vmlinux through the LTO? (I haven't followed this thread closely, but couldn't LTO optimize away unexported global symbols?)

@liu-song-6
Copy link
Contributor Author

@joe-lawrence yes, it is still in vmlinux but not in Module.symvers

readelf -s vmlinux | grep perf_event_bpf_event
122028: ffffffff81298cd0   861 FUNC    LOCAL  HIDDEN     1 perf_event_bpf_event

@liu-song-6
Copy link
Contributor Author

liu-song-6 commented Feb 7, 2023

Comparing LTO with non-LTO build:

LTO:

readelf -s vmlinux | grep perf_event_bpf_event
122028: ffffffff81298cd0   861 FUNC    LOCAL  HIDDEN     1 perf_event_bpf_event

non-LTO:

readelf -s vmlinux  | grep perf_event_bpf_event
124946: ffffffff81287070   858 FUNC    GLOBAL DEFAULT    1 perf_event_bpf_event

So the same symbol is GLOBAL in non-LTO kernel, but LOCAL in LTO kernel. With similar .config, vmlinux from LTO build has 2074 GLOBAL symbols; while vmlinux from non-LTO build has 34550 GLOBAL symbols. Can we just consider all symbols as GLOBAL for the LTO build? Will this break anything?

@liu-song-6
Copy link
Contributor Author

Some hack like the following seems to work.

diff --git i/kpatch-build/lookup.c w/kpatch-build/lookup.c
index f2596b15fc27..ade445a788ad 100644
--- i/kpatch-build/lookup.c
+++ w/kpatch-build/lookup.c
@@ -530,8 +530,7 @@ static bool lookup_global_symbol(struct lookup_table *table, char *name,

        memset(result, 0, sizeof(*result));
        for_each_obj_symbol(i, sym, table) {
-               if ((sym->bind == STB_GLOBAL || sym->bind == STB_WEAK) &&
-                   !strcmp(sym->name, name)) {
+               if (!strcmp(sym->name, name)) {

                        if (result->objname)
                                ERROR("duplicate global symbol found for %s", name);

liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Feb 8, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. A small workaround in CDO that ignores changes in init.text for
     vmlinux.o.thinlto.*

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Feb 8, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;
  4. A small workaround in CDO that ignores changes in init.text for
     vmlinux.o.thinlto.*

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Feb 14, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Feb 16, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Mar 4, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Mar 7, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Mar 15, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
swine added a commit to swine/kpatch that referenced this issue Jul 21, 2023
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;
  4. A small workaround in CDO that ignores changes in init.text for
     vmlinux.o.thinlto.*

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
@github-actions
Copy link

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

@github-actions github-actions bot added the stale label Aug 20, 2023
@github-actions
Copy link

This issue was closed because it was inactive for 7 days after being marked stale.

@joe-lawrence
Copy link
Contributor

Reopening, IMHO the bot was a bit premature in closing this one.

@joe-lawrence joe-lawrence reopened this Aug 31, 2023
@bwendling
Copy link
Contributor

Drive-by comment: There's a public tool to generate livepatches for LLVM. It might be useful to you. https://github.com/google/llpatch

@github-actions github-actions bot removed the stale label Sep 1, 2023
@swine
Copy link
Contributor

swine commented Sep 8, 2023 via email

@github-actions
Copy link

github-actions bot commented Oct 9, 2023

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

@github-actions github-actions bot added the stale label Oct 9, 2023
@jpoimboe jpoimboe self-assigned this Oct 9, 2023
@jpoimboe jpoimboe removed the stale label Oct 9, 2023
@fazlamehrab
Copy link

fazlamehrab commented Feb 1, 2024

@liu-song-6 I found that the thinlto-6.2 branch can create live patches for kernel v6.2. However, attempting to use the fb-6.4 branch to generate a live patch for kernel v6.4 results in the following error.

$ kpatch-build block.patch -s linux/ -v linux/vmlinux.o 
Using source directory at /home/mehrab/linux
WARNING: Clang support is experimental
Testing patch file(s)
Reading special section data
Building original source

Building patched source
cp: cannot stat '/home/mehrab/linux/vmlinux.o.thinlto.o*': No such file or directory
ERROR: kpatch build failed. Check /home/mehrab/.kpatch/build.log for more details.

The block.patch looks like the following.

diff --git a/block/bdev.c b/block/bdev.c
index 21c63bfef323..97b62d1c5338 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -86,6 +86,7 @@ void invalidate_bdev(struct block_device *bdev)
                lru_add_drain_all();    /* make sure all lru add caches are flushed */
                invalidate_mapping_pages(mapping, 0, -1);
        }
+       printk("patch: invalidate_bdev\n");
 }
 EXPORT_SYMBOL(invalidate_bdev);

The .config file has following configs.

CONFIG_LTO=y
CONFIG_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_HAS_LTO_CLANG=y
# CONFIG_LTO_NONE is not set
# CONFIG_LTO_CLANG_FULL is not set
CONFIG_LTO_CLANG_THIN=y

I'm wondering if I'm doing something wrong, or if there are additional patches for kernel v6.4. Could you share your thoughts? Additionally, have you tried the thin LTO live patch on kernel v5.15? Any insights would be greatly appreciated.

Thanks,
Mehrab

@liu-song-6
Copy link
Contributor Author

@fazlamehrab Thanks for testing this out.

​The config looks correct to me. Have you tried to use
the thinlto-6.2 branch on v6.4 and v5.15 kernels? I
think they should just work.

@fazlamehrab
Copy link

@liu-song-6 For v5.15, I get the following error.

$ kpatch-build ../block-5.15.patch -s ../linux/ -v ../linux/vmlinux.o
Using source directory at /home/mehrab/linux
WARNING: Clang support is experimental
Testing patch file(s)
Reading special section data
Building original source
Building patched source
Extracting new and modified ELF sections
Binary files orig/vmlinux.o.thinlto.o and patched/vmlinux.o.thinlto.o differ
Binary files orig/vmlinux.o.thinlto.o1 and patched/vmlinux.o.thinlto.o1 differ
Binary files orig/vmlinux.o.thinlto.o5 and patched/vmlinux.o.thinlto.o5 differ
Binary files orig/vmlinux.o.thinlto.o850 and patched/vmlinux.o.thinlto.o850 differ
create-diff-object: ERROR: vmlinux.o.thinlto.o850: find_local_syms: 190: couldn't find matching bdev.c local symbols in vmlinux symbol table
ERROR: 1 error(s) encountered. Check /home/mehrab/.kpatch/build.log for more details.

The build.log file contains the following:

create-diff-object: ERROR: vmlinux.o.thinlto.o850: find_local_syms: 190: couldn't find matching bdev.c local symbols in vmlinux symbol table

And, for v6.4, I get the following error.

$ kpatch-build ../block-6.4.patch -s ../linux/ -v ../linux/vmlinux.o
Using source directory at /home/mehrab/linux
WARNING: Clang support is experimental
Testing patch file(s)
Reading special section data
Building original source
Usage: ./scripts/setlocalversion [--no-local] [srctree]
ERROR: kpatch build failed. Check /home/mehrab/.kpatch/build.log for more details.

In this case, the build.log file contains the following:

checking file block/bdev.c
patching file block/bdev.c

Please share your thoughts on these.

liu-song-6 added a commit to liu-song-6/kpatch that referenced this issue Mar 28, 2024
Support CONFIG_LTO_CLANG_THIN with ld.lld --lto-obj-path option.

With CONFIG_LTO_CLANG_THIN, .o files are LLVM IR binary, so CDO doesn't
work on .o file. To solve this issue, we CDO the thinlto files generated
by the --lto-obj-path option. Clang LTO generates the thinlto files
after cross file inline, so they are good candidates for CDO. See [1] for
more discussions about this.

To achieve this, we need:

  1. kpatch-build to update kernel Makefile(s) so it generates thinlto
     files;
  2. kpatch-build and kpatch-cc to save the thinlto file properly;
  3. kpatch-build to feed these thinlto files to CDO;
  4. The user need to supply vmlinux.o, from which we generate the symtab
     file. We need this because GLOBAL symbols may be marked as LOCAL in
     LTO vmlinux;

[1] dynup#1320

Signed-off-by: Song Liu <[email protected]>
@liu-song-6
Copy link
Contributor Author

@fazlamehrab I hit something recently. I "fixed" it disabling some configs. You can probably try disable the following:

CONFIG_CALL_THUNKS
CONFIG_PREFIX_SYMBOLS
CONFIG_X86_KERNEL_IBT

Note that, they have some cross dependency, so you want to confirm they actually got disabled.

@fazlamehrab
Copy link

@liu-song-6 I tried this, but no luck with v5.15 kernel, yet. I must be missing something. Can you please see if you get the same error that I mentioned earlier with the attached config? It is based on default config using LLVM 11.

I also tried with LLVM 14, which gives me following errors.

readelf: Warning: Invalid pointer size (65) in compunit header, using 4 instead
readelf: Warning: CU at offset eb82de7 contains corrupt or unsupported version number: 5.
ERROR: can't find special struct alt_instr size. Check /home/mehrab/.kpatch/build.log for more details.

Also, please share the config that works for you, if possible. Thanks!

config-v5.15.txt

@jpoimboe
Copy link
Member

As a general update, we have no plans of solving this problem in kpatch-build. In fact, kpatch-build will eventually be deprecated in favor of klp-build, which is currently under development and will be merged into the upstream kernel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants