PEP 656 musllinux support #305

Closed
6 tasks done
uranusjr opened this issue Apr 12, 2021 · 26 comments

Comments

@uranusjr
Member

uranusjr commented Apr 12, 2021

PEP 656 proposed musllinux, a counterpart of manylinux for Linux distros running on musl libc. I’ve recently implemented support for installing musllinux wheels (pypa/packaging#411), and am now looking for a place to implement wheel generation logic.

Do you think auditwheel is a good place for this? Many things are the same when auditing manylinux and musllinux wheels (the Linux part), but from my (very little) understanding, many parts of auditwheel assume glibc and would need some refactoring to work with musl, so there’s also a case for creating a new tool for musllinux wheel auditing. My main end goal is to add musl to the support matrix of cibuildwheel (pypa/cibuildwheel#627), so either approach would ultimately be fine.

Edit from maintainers:
List of tasks that would need to be done in order to add support for musllinux.
This list is subject to changes and will be updated with feedback from comments.

This seems to be what's needed for now.
Be warned, most of the work is probably musllinux-policy.json.

@mayeut
Member

mayeut commented May 9, 2021

Hi @uranusjr,

Sorry it took this long for someone to answer.

I think it makes sense to add musllinux support to auditwheel.
As you said, many things should be the same between musllinux & manylinux. Of course, there are also differences for glibc vs musl that need to be taken care of but I wouldn't say that auditwheel assumes glibc in its code.

One of the major differences I foresee is that, because glibc uses symbol versioning, everything is done through the policy, whether we're talking about glibc libraries or gcc/g++ libraries. Symbol versions are used to compute the effective policy and thus whether or not it's compatible with the requested one.
I do hope that symbol versioning is still a thing for gcc/g++ libraries on musl distros, that way, at least that part of the code can be reused. Detecting the minimum musl version a wheel is eligible for might be a bit trickier and require some new mechanisms in auditwheel (I've not read the full thread on discourse so don't know where the discussions stopped on that matter). A first step could be "check I'm valid for the current musl & graft needed libraries".

As I see it, it would require at least a new json policy file specific to musllinux, a way to switch between policies at startup & the minimum "musl" detection code.
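
A minimal sketch of the "switch between policies at startup" idea, assuming the libc has already been detected. The `Libc` enum, the `POLICY_FILES` mapping, `policy_file_for` and the `manylinux-policy.json` file name are hypothetical names used for illustration, not auditwheel's actual API; only `musllinux-policy.json` is named in this thread.

```python
from enum import Enum


class Libc(Enum):
    """The libc flavours a wheel can be audited against (hypothetical)."""

    GLIBC = "glibc"
    MUSL = "musl"


# Hypothetical mapping from the detected libc to the policy file to load.
POLICY_FILES = {
    Libc.GLIBC: "manylinux-policy.json",
    Libc.MUSL: "musllinux-policy.json",
}


def policy_file_for(libc: Libc) -> str:
    """Return the name of the policy file that applies to the given libc."""
    return POLICY_FILES[libc]
```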

@lkollar, any other inputs/ideas/remarks/objections ?

I do not have the time to look into it myself but will do my best to review any proposition to add such support.

@uranusjr
Member Author

uranusjr commented May 9, 2021

The musllinux spec was intentionally designed to follow the new “perennial” manylinux design, so I’m kind of expecting to reuse much of the same logic. musl does not support the same symbol versioning functionality as glibc (it only allows pulling the default version and not the latest version), but I’m expecting users to mainly use containers to produce musllinux wheels, so they can use an older musl version in their image to get the musllinux tag they want; i.e. a wheel built against musl 1.2 can only become musllinux_1_2 or later, no matter what symbols you use.

@mayeut
Member

mayeut commented May 9, 2021

I’m expecting users to mainly use containers to produce musllinux wheels, so they can use an older musl version in their image to get the musllinux tag they want i.e. a wheel built against musl 1.2 can only become musllinux_1_2 or later, no matter what symbols you use.

That was my - probably unclear - comment:

A first step could be "check I'm valid for the current musl & graft needed libraries".

You build & repair on musl 1.2, you get musllinux_1_2.

I was more thinking about symbol introduction rather than symbol compatibility and trying to detect which minimum musl version satisfies the symbols used by the wheel being repaired. Maybe that's a bit too complicated, not wanted or whatever.

The option you propose is a perfectly valid one although it differs a bit from my comment which doesn't add "or later". If you get "later", then, it becomes a hassle to test the wheel you just built in the same image because pip will refuse - rightfully so - to install it.
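
To illustrate why a "later" tag cannot be tested in the build image, here is a rough sketch of the installer-side check. It assumes the tag layout `musllinux_{major}_{minor}_{arch}` from PEP 656 and the rule that a wheel is accepted when the major versions match and the wheel's minor version is not newer than the running musl. `parse_musllinux_tag` and `is_installable` are hypothetical helper names, not functions from pip or packaging.

```python
import re
from typing import Optional, Tuple

_TAG_RE = re.compile(r"musllinux_(\d+)_(\d+)_(\w+)")


def parse_musllinux_tag(tag: str) -> Optional[Tuple[int, int, str]]:
    """Split a platform tag into (major, minor, arch), or None if it is not musllinux."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def is_installable(tag: str, musl: Tuple[int, int], arch: str) -> bool:
    """Assumed rule: same arch, same musl major, wheel minor <= system minor."""
    parsed = parse_musllinux_tag(tag)
    if parsed is None:
        return False
    major, minor, tag_arch = parsed
    return tag_arch == arch and major == musl[0] and minor <= musl[1]
```

Under these assumptions, `is_installable("musllinux_1_3_x86_64", (1, 2), "x86_64")` is false, which is the "pip will refuse" situation described above, while a `musllinux_1_2` wheel built and repaired on musl 1.2 passes the check.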

@uranusjr
Member Author

uranusjr commented May 9, 2021

I think we are on the same page, just not communicating very well 🙂

I was more thinking about symbol introduction rather than symbol compatibility and trying to detect which minimum musl version satisfies the symbols used by the wheel being repaired. Maybe that's a bit too complicated, not wanted or whatever.

I don’t think it’s practical. musl does not expose the information itself, so to achieve this we’ll need to maintain the symbol-minver lookup ourselves. So yeah, too complicated etc.

The option you propose is a perfectly valid one although it differs a bit from my comment which doesn't add "or later".

I probably shouldn’t have added the “or later” part; it only made things more complicated. I was trying to say that the user theoretically can choose to repair a wheel into musllinux_1_3 against musl 1.2 (assuming other dependencies can be correctly grafted). I don’t see why anyone would actually want to do that in practice though, so let’s not worry about that.

@mayeut
Member

mayeut commented May 9, 2021

@uranusjr,

do you mind if auditwheel maintainers edit your first post to add a checklist of what we would expect to see in a PR ?

I think it would help potential contributors to understand what's expected without browsing through all comments.

@uranusjr
Member Author

uranusjr commented May 9, 2021

No problem at all, please go ahead!

@mayeut
Member

mayeut commented May 14, 2021

At first glance, most of the work should be in creating a musllinux-policy.json.

For manylinux1, manylinux2010 & manylinux2014, this was done semi-manually by running the scripts/calculate_symbol_versions.py script on different distros.

For PEP 600 manylinux, I automated this process in https://github.com/mayeut/pep600_compliance (which still relies on "run the script on a bunch of distros")

For PEP 656 musllinux, the same kind of work must happen.

I do hope that symbol versioning is still a thing for gcc/g++ libraries on musl distros, that way, at least that part of the code can be reused.

This has to be checked. If it's not the case, something else shall be implemented.

@uranusjr
Member Author

How do you determine whether a distro is compliant with a PEP 600 manylinux tag? From my understanding of PEP 600 (which PEP 656 follows), whether a library (except libc) is allowed depends on whether it is provided by popular distros. This means the policy is inherently defined by distros, and it seems circular to me if distros can be categorised by that policy.

@mayeut
Member

mayeut commented May 15, 2021

How do you determine whether a distro is compliant to a PEP 600 manylinux tag?

Just looking at its glibc version.
The PEP 600 tag policy should conform to that, not the other way around.

From my understanding of PEP 600 (which PEP 656 follows), whether a library (except libc) is allowed depends on whether it is provided by popular distros.

For auditing, it's not only whether a library is allowed. It's also whether or not its version (or more exactly, the symbols it uses) is allowed. This is only enforced for gcc libraries in auditwheel as of now (& glibc, but we already talked about how to handle musl in a previous comment).

Let's take a look at a simple example:
distros with glibc 2.28, trying to determine what the manylinux_2_28 policy should be
for the sake of simplicity, let's just consider there are just CentOS 8 & Photon 3 in "popular" distros with glibc 2.28
If you compare the package list, you'll see that they're not using the same gcc package version.
If I were to build a wheel on CentOS 8 with the latest additions to the C++ standard, I'd probably be building a wheel that's not compatible with Photon 3 even though they're using the same glibc and that libstdc++.so.6 is a whitelisted library. Thus, CentOS 8 built wheels cannot be marked manylinux_2_28 just by looking at the glibc version and the whitelisted library, we have to take into account symbols used in libstdc++.so.6 (and symbol versioning makes that easy for gcc libraries)

In other words, every popular glibc 2.28 distro is compliant for consuming manylinux_2_28 wheels but every popular glibc 2.28 distro might not produce manylinux_2_28 compliant wheels.
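
The consume/produce distinction can be sketched as a set comparison between the symbol versions a wheel requires and those a policy allows. This is only an illustration: the `violations` helper, the dict-of-sets layout and the version numbers are assumptions made for the example, not auditwheel's data model or the real contents of any distro.

```python
from typing import Dict, Set

SymbolVersions = Dict[str, Set[str]]


def violations(required: SymbolVersions, policy: SymbolVersions) -> SymbolVersions:
    """Return the symbol versions the wheel needs but the policy does not allow."""
    extra: SymbolVersions = {}
    for prefix, versions in required.items():
        missing = versions - policy.get(prefix, set())
        if missing:
            extra[prefix] = missing
    return extra


# Made-up values: the policy only allows what every distro provides.
policy = {"GLIBCXX": {"3.4.24", "3.4.25"}}
# A wheel built with an older C++ feature set stays within the policy.
wheel_old_cxx = {"GLIBCXX": {"3.4.24"}}
# A wheel built on the distro with the newer gcc uses a symbol version
# the other distro lacks.
wheel_new_cxx = {"GLIBCXX": {"3.4.24", "3.4.26"}}
```

With this toy data, `violations(wheel_old_cxx, policy)` is empty, while `violations(wheel_new_cxx, policy)` reports the one version that keeps the wheel from being tagged with that policy.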

This means the policy is inherently defined by distros, and it seems circular to me if distros can be categorised by that policy.

Well, I think you had a look at the https://github.com/mayeut/pep600_compliance and that the wording there is a bit misleading.
The "distro compatibility" section is just a mapping of the glibc version i.e. policy column manylinux_{glibc_major}_{glibc_minor} and distro column, just the list of popular distros with this glibc version.
It's mostly a centralized way for me to find the glibc version of a given popular distro.
It's thus not circular, it's a strict equivalence as defined in PEP 600.

The 2 other sections are a bit more interesting:
"Known compatibility issues":
The automation uses the policy file from auditwheel to detect if already defined policies are breaking some rule.
Again, the wording is probably bad, it's not the distro that's not compatible, it's the policy that's buggy. In any case, that's a compatibility issue.

The "Acceptable distros to build wheels" section is a bit more interesting here.
It reports distros whose library/symbol versions exactly match the policy.

The way the policies are built is even more interesting for the purpose of this thread and it exactly matches your comment "policy is inherently defined by distros":

policies = {}
next_version = None
for glibc_version in sorted(versions, reverse=True):
    # Keep only the symbol versions provided by every distro with this glibc version.
    policies[glibc_version] = set.intersection(
        *(symbol_versions(distro, whitelist_libraries) for distro in get_distros(glibc_version))
    )
    if next_version:
        # A policy defined for glibc (x1, y1) shall be a subset of the policy defined for glibc (x2, y2) where (x1, y1) < (x2, y2)
        policies[glibc_version] &= policies[next_version]
    next_version = glibc_version
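
For readers who want to run the idea, here is a self-contained sketch of the same loop on toy data. The `make_policies` function below is written for this example and is not the one in pep600_compliance; the input layout (a dict mapping a glibc version to the symbol-version sets seen on each distro) and the placeholder names `V1` to `V3` are assumptions.

```python
from typing import Dict, List, Set, Tuple

Version = Tuple[int, int]


def make_policies(observed: Dict[Version, List[Set[str]]]) -> Dict[Version, Set[str]]:
    """Build one policy (a set of allowed symbol versions) per glibc version.

    `observed` maps a glibc version to the symbol-version sets seen on each
    distro shipping that glibc. Each version needs at least one distro.
    """
    policies: Dict[Version, Set[str]] = {}
    next_version = None
    # Walk from the newest glibc down to the oldest.
    for glibc_version in sorted(observed, reverse=True):
        # Keep only what every distro with this glibc provides.
        policy = set.intersection(*observed[glibc_version])
        if next_version is not None:
            # An older policy must be a subset of the next newer one.
            policy &= policies[next_version]
        policies[glibc_version] = policy
        next_version = glibc_version
    return policies


# Placeholder symbol versions, for illustration only.
observed = {
    (2, 28): [{"V1", "V2", "V3"}, {"V1", "V2"}],
    (2, 17): [{"V1", "V2", "V3"}],
}
policies = make_policies(observed)
```

Here the glibc 2.28 policy ends up as `{"V1", "V2"}` because one of the two distros lacks `V3`, and the glibc 2.17 policy is trimmed to the same set by the subset rule even though its only distro provides `V3`.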

@uranusjr
Member Author

Thanks a lot for the detailed explanation, indeed I was misunderstanding what pep600_compliance is showing 🙂

Since musl does not cover C++ in the first place and I believe distros just use libstdc++ built against musl, can the above PEP 600 implementation be used for PEP 656? Where can I find it? I quickly searched but couldn’t find the above code snippet in either pep600_compliance or auditwheel.

@mayeut
Member

mayeut commented May 17, 2021

can the above PEP 600 implementation be used for PEP 656?

If gcc libraries do use symbol versioning (as opposed to musl), sure you can use it. If not....

I quickly searched but couldn’t find the above code snippet in either pep600_compliance or auditwheel.

I wrote this as pseudo-code for it to be a bit clearer.

The actual way it works in the pep600_compliance repo is:

  1. dump symbol versions for every distro in cache/{machine}/{distro_name}-{distro_version}.json
    It does a bit more than just dump symbol versions, in light of #152 ("We should probably allow libz linkage, and maybe libexpat").
    The content of the json is built by https://github.com/mayeut/pep600_compliance/blob/83b1afd0b44610f81b484021d3fa5ff0ec73b70b/pep600_compliance/images/base.py#L268-L271
    One function just calls the "symbol finding script"
    The other finds which "not whitelisted" libraries are dependencies of the python interpreter. The whitelist is not read from the policy file yet. For a new standard such as PEP 656 (and maybe even for manylinux), I think it would make sense to blacklist anything that appears in "extra"; see #152 ("We should probably allow libz linkage, and maybe libexpat") for the rationale.
  2. Analyse all the cache/*/*.json files to update README.md (and some other markdown files)
    The policies are never written to disk in the repo. The code is there though; you'd be interested in https://github.com/mayeut/pep600_compliance/blob/83b1afd0b44610f81b484021d3fa5ff0ec73b70b/pep600_compliance/make_policies.py#L182-L190. It calls make_policies, which is the closest thing to the pseudo-code from the previous comment.
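
As a rough sketch of the second step, the cache layout `cache/{machine}/{distro_name}-{distro_version}.json` described above can be walked like this. The `CacheEntry` type and `iter_cache` function are hypothetical names, the split on the last hyphen is an assumption about how distro names and versions are joined, and the JSON contents are deliberately left unparsed because their schema is not described in this thread.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class CacheEntry(NamedTuple):
    machine: str
    distro_name: str
    distro_version: str
    path: Path


def iter_cache(root: Path) -> Iterator[CacheEntry]:
    """Yield one entry per {machine}/{distro_name}-{distro_version}.json under root."""
    for path in sorted(root.glob("*/*.json")):
        # Assumes the version is everything after the last hyphen in the file stem.
        name, _, version = path.stem.rpartition("-")
        yield CacheEntry(path.parent.name, name, version, path)
```

Each yielded entry could then be loaded with `json.load` and grouped by libc version before computing the policies.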

PEP 656 starts from scratch, so there are no pre-existing policies, which should simplify the code a bit (look for "official"), and policies can be built cleanly.
For PEP 600, we need to know if existing policies are breaking any rules and that's why those policies are immutable and incompatibilities can be reported.

@uranusjr
Member Author

I know Alpine’s toolchain does support symbol versioning, and at least some libs are compiled with it (not sure if all of them are), so that should be a good starting point (again, I expect PEP 656 users to mainly use containers for compilation, so having one working distro should be enough). I’ll find some time to dig into the compliance logic and see what I can come up with.

(But thinking this some more, Alpine doesn’t really come with many libraries in the first place, maybe this whole discussion would be pretty moot and we end up not including any system libraries at all… I’ll need to check that.)

@mayeut
Member

mayeut commented May 18, 2021

maybe this whole discussion would be pretty moot and we end up not including any system libraries at all… I’ll need to check that.

I need to find references for that but libstdc++.so.6 can't be grafted so if it's not whitelisted, it shall be blacklisted. With pybind11 wrappers for example, this would require some specific link options. There's a rather long discussion about those options in the manylinux repo. edited per #305 (comment)

mayeut added a commit to mayeut/auditwheel that referenced this issue May 27, 2021
This prepares for other policies to be added with meaningful names.
Contributes to pypa#305 resolution where `musllinux-policy.json` would be added.
mayeut added a commit that referenced this issue May 27, 2021
This prepares for other policies to be added with meaningful names.
Contributes to #305 resolution where `musllinux-policy.json` would be added.
@lkollar
Contributor

lkollar commented Jun 28, 2021

So I suppose we should go through the current white list in the manylinux policy and decide what needs to be done for each in musllinux. At least some of the GNU libraries seem to contain symbol versions, which (since they are linked against musl) will not contain GLIBC and GLIBCXX symbol versions. That leaves the CXXABI and GCC symbols.

Looking at libstdc++ on Alpine 3.14, the only versions in the NEEDED section are GCC_3.0, GCC_3.3, GCC_3.4 and GCC_4.2.0. This matches the output from libstdc++ on Ubuntu 20.04 (except of course that that version also lists GLIBC symbols in the needed section and exports CXXABI and GLIBCXX versioned symbols). It also appears that the 6.0.28 version of libstdc++ is common across popular Linux glibc and musl distros as well (I could only see 6.0.29 in one rolling release distro). Also, Alpine does not come with libstdc++ preinstalled, and probably many other white listed libraries are missing from the base install as well. It might be possible to graft these for musllinux but if this doesn't work, users will have to know they need to install them. This is not a great user experience, but it would simplify the policy.

From this it seems that although some of the GNU symbol versions are kept on Alpine, these libraries are linked against musl and there is no symbol versioning in effect for the symbols in libc. Which raises the question, how can we tell which library version is compatible with each musl libc version? An alternative approach could be to base the policy around library versions. But for this we somehow need to determine these versions and validate them across musllinux distros.

I think it would be also worthwhile to come up with a list of popular musl distros which we can include in the initial research and testing. There are quite a few listed on https://wiki.musl-libc.org/projects-using-musl.html but I'm sure many of these are niche enough that we shouldn't include them. Some of the major ones I've used or heard about are Alpine, OpenWRT and the musl variant of Void. Any others?

@lkollar
Contributor

lkollar commented Jun 29, 2021

After reading through the PEP discussions on Discourse, particularly Nathaniel's post, it seems that auditwheel should just graft all library dependencies except for libc and libz.

So in theory, the alpine:3.14 Docker image could be used to generate musllinux_1_2 wheels where auditwheel would simply skip libc and libz but graft every other dependency from the image. Does this sound reasonable @mayeut @uranusjr?

I expect quite a few changes to auditwheel as the current policy format is centered around symbol versions and we probably have to change that for musllinux to use the version of musl instead to determine the platform tag. We should also create a musllinux image similar to manylinux, which contains the necessary tools. We can probably base this on Alpine as suggested in the Discourse thread.
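
The "graft everything except libc and libz" rule could look roughly like the sketch below. `DEFAULT_SKIP` and `libs_to_graft` are hypothetical names, the soname prefixes are assumptions about how these libraries are named on a musl distro, and the skip list is a parameter because which libraries to leave ungrafted is still being discussed here.

```python
from typing import Iterable, List, Tuple

# Hypothetical soname prefixes left to the system rather than grafted.
DEFAULT_SKIP: Tuple[str, ...] = ("libc.", "libz.")


def libs_to_graft(
    needed: Iterable[str], skip_prefixes: Iterable[str] = DEFAULT_SKIP
) -> List[str]:
    """Return the sonames that should be copied into the wheel."""
    prefixes = tuple(skip_prefixes)
    return [soname for soname in needed if not soname.startswith(prefixes)]


# Example dependency list, for illustration only.
needed = ["libc.musl-x86_64.so.1", "libz.so.1", "libstdc++.so.6", "libcrypto.so.1.1"]
```

The trailing dot in each prefix keeps `libc.` from also matching unrelated libraries such as `libcrypto`.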

@uranusjr
Member Author

uranusjr commented Jun 30, 2021

Yes I think that’s totally reasonable. I intentionally left out what exactly should be grafted because the default Alpine installation is so bare. So we should start with grafting mostly everything and see how the community reacts. If there’s strong demand for reducing wheel sizes (by sharing some libraries between them), we can learn from the actual users what they want to share and create PEPs for specific musllinux_X_Y tags to cater to the need.

@lkollar
Contributor

lkollar commented Jul 1, 2021

I started prototyping this in auditwheel and threw some code together to detect musl and get the version. This is not currently exposed in packaging, so I'm shelling out to the musl dynamic loader and parsing the version string. I can open an issue in packaging for exposing this functionality if you think this makes sense @uranusjr.
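
A minimal sketch of the "shell out to the dynamic loader and parse the version string" approach. It assumes that the musl loader, run without arguments, prints a banner on stderr whose first line starts with `musl` and whose second line reads `Version X.Y.Z`. The function names are hypothetical and the loader path (for example `/lib/ld-musl-x86_64.so.1`) must be supplied by the caller; this is not the code from the prototype mentioned above.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_musl_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the loader banner, or None if it is not musl."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith("musl"):
        return None
    match = re.match(r"Version (\d+)\.(\d+)", lines[1])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def musl_version(loader: str) -> Optional[Tuple[int, int]]:
    """Run the loader with no arguments and parse the banner it prints on stderr."""
    # The loader is expected to exit with a non-zero status, so no check=True.
    proc = subprocess.run([loader], stderr=subprocess.PIPE, text=True)
    return parse_musl_version(proc.stderr)
```

Keeping the parsing separate from the subprocess call makes it possible to test the parser on a captured banner without a musl system at hand.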

A major roadblock I've hit is that I have no idea how to determine which version of musl a native extension was linked against. Since there is no symbol version information available, and musl doesn't seem to have any visible version symbols, I'm not sure this is possible at all.

If we can't determine the version reliably, we could take it from the user with --plat. Of course, since one of the benefits of using auditwheel is that it can automatically tag the wheel, this would be a pretty big limitation. However, with a musllinux Docker image and something like cibuildwheel we could at least provide an easy way to build wheels.

@lkollar
Contributor

lkollar commented Jul 1, 2021

For anyone following this, I posted the same question about the musl version on Discourse.

@uranusjr
Member Author

uranusjr commented Jul 2, 2021

This is not currently exposed in packaging, so I'm shelling out to the musl dynamic loader and parsing the version string. I can open an issue in packaging for exposing this functionality if you think this makes sense @uranusjr.

You mean exposing the interface in packaging._musllinux that reads musl’s version? I’m not a maintainer of packaging but that doesn’t sound like something packaging should expose. It’s probably a good idea if both projects can maintain a common implementation of this logic though (and either copy-paste or vendor it into each project). Not sure how that’d work in practice but an issue on that topic would definitely be a good start.

I’ll reply to the rest on Discourse to keep the conversation in one place.

@mayeut
Member

mayeut commented Jul 3, 2021

Looking at libstdc++ on Alpine 3.14, ....
It might be possible to graft these for musllinux but if this doesn't work, users will have to know they need to install them. This is not a great user experience, but it would simplify the policy.

@lkollar,

libstdc++ shall never be grafted. This would lead to various issues (c.f. #305 (comment), still need to find some references for this. Maybe @njsmith has some) edited per #305 (comment)

So in theory, the alpine:3.14 Docker image could be used to generate musllinux_1_2 wheels where auditwheel would simply skip libc and libz but graft every other dependency from the image. Does this sound reasonable?

I wouldn't graft libz until #152 is closed (keep as much identical between manylinux & musllinux). libexpat can be safely whitelisted so I'd say libc & libexpat at first.

@njsmith
Member

njsmith commented Jul 3, 2021

I'm actually not aware of problems with grafting libstdc++, so long as wheels don't try to somehow reach into each other and share C++ interfaces. That doesn't mean there aren't any, because we haven't really tried it.

@mayeut
Member

mayeut commented Jul 3, 2021

Thanks for the feedback @njsmith.
Thus my comment on libstdc++ can be ignored (I might have mixed things up with "so long as wheels don't try to somehow reach into each other and share C++ interfaces"). If it proves problematic, it can be either whitelisted or blacklisted at a later stage.

@lkollar
Contributor

lkollar commented Jul 5, 2021

I've uploaded my work in progress in #313. This will require a significant refactor as auditwheel assumes that there is only one platform policy in existence and a lot of assumptions have been made around this.

We will also need to build musllinux Docker images to provide the base build environment for users. I've opened an issue in the manylinux repo to track the work.

@mayeut
Member

mayeut commented Jul 31, 2021

PR #313 has been superseded by PR #315 per #315 (comment)

@liath

liath commented Aug 27, 2021

#315 is merged now. So this is closable I think?

@mayeut
Member

mayeut commented Sep 18, 2021

5.0.0 with musllinux support is available on PyPI

5 participants