Campaign to get people Publishing Wheels #25
Comments
http://pythonwheels.com/ is an attempt at this, now that pip 1.5 installs them by default this should be easier. I think one part of this would be to make the |
I was thinking surely it makes sense for a simple "build" command that made wheels, eggs and an sdist by default? Rather than having to specify each one separately? Am I wrong in thinking that you still need to install another package just to create wheels? |
Yes, you need to |
As of 2015 Christoph Gohlke publishes wheels rather than msi installers http://www.lfd.uci.edu/~gohlke/pythonlibs/ |
@scopatz is this something you could comment on? |
Thanks for roping me into this issue @brainwane. I am speaking on behalf of conda-forge here. But basically, we'd love it if conda-forge could be used to build & publish wheels. To that end, it might be more useful to think of conda-forge as just "The Forge." We have the infrastructure for building binary packages across Linux, OS X, Windows, ARM, and Power8 already. We have a tool called conda-smithy that we develop and maintain that helps us keep all of the packages / recipes / CIs configured and up-to-date.
I see two major hurdles to building and deploying wheels from conda-forge. These could be worked on in parallel.
Building: conda-smithy would need to be updated so that packages that are configured to do so would generate the appropriate CI scripts (from Jinja templates) to build wheels. This would be CI-provider and architecture specific. Probably the easiest place to start is building from manylinux on Azure. We would probably need at least one configuration variable to live in The challenge with building is that most of the conda-forge people are not used to building wheels. I am happy to help work on the conda-forge infrastructure side, but I think we need someone who is an expert on the wheels side who is also willing to jump in and help scale this out with me.
Deploying: Once we can build wheels, we need a place to put them. Nominally, this would be PyPI. But we need to be able to do this from a CI service. We are happy to have an authentication token that we use. There isn't much that I see that conda-forge can really do about this (which has prevented us from working on this issue previously). However, I think that PyPI is working on this.
I am super excited about this; the fundamental premise of conda-forge is to be open source, cross platform, community build infrastructure. If there are other folks out there who are enthusiastic about getting this working, please reach out to me or put me in touch! |
Thanks @scopatz! @waveform80 and @bennuttall would you like to speak from the piwheels perspective? And @jwodder, from what you have learned via Wheelodex? (Found out about you via this thread.) |
Perhaps the work that @matthew-brett did at MacPython to build wheels of key packages of the Scientific Python stack will be helpful as well. Also, I discovered cibuildwheel by @joerick recently. (Edit: wrong Matthew Brett) |
For the piwheels project we build ARM platform wheels for the Raspberry Pi, built natively on Raspberry Pi hardware. On piwheels.org we don't try to bundle dependencies à la manylinux2010; instead we target what's stable in the distro (Raspbian) and make no promises elsewhere. The project source itself is open, so others could run their own repos targeting other platforms. I don't recommend that maintainers upload ARM wheels; instead, let us build them, knowing they work on the Pi. We also attempt to show library dependencies on our project pages e.g. https://www.piwheels.org/project/numpy/ rather than let people work them out e.g. https://blog.piwheels.org/how-to-work-out-the-missing-dependencies-for-a-python-package/ |
Hi @scopatz, what do you propose to do about shared libraries that have no natural place in a wheel (to me, most shared libraries have no natural place in a wheel)? We cannot stick our heads in the sand on that. That we use shared libraries heavily in conda is one of our most compelling advantages, and because we use the same ones across languages, putting those shared libraries in a wheel would be a bad thing to do. I'm not coming with a solution here. I wish I were, I really do. |
It would probably be neater not to ship the external libraries in the wheels, but it has in practice been working, at least on Linux and macOS. I can see the problem is more urgent for Conda, because y'all are building a multi-language software distribution. A few years ago, @njsmith wrote a spec for pip-installable libraries: pypa/wheel-builders#2 It isn't merged, and it looks like 'the current setup works for me' has meant that no-one thus far has had the time or energy to work further on that. I suspect something along those lines is the proper solution, if we could muster the time. |
By the way - @scopatz - I'm happy to help integrating the wheel builds into conda forge - but I'm crazy busy these days, so I won't have much time for heavy lifting. |
Well, the software needs to work of course and I'm not being facetious! We end up discussing where the line is between the thing itself and the system libraries that support it, and that's not clear cut. Take xgboost as an example. It has a C/C++ library and bindings for Python and R. Now xgboost itself builds static libs for each, so they sidestepped that issue, while we're much more efficient (in many dimensions). Now libxgboost is clearly a part of the xgboost stack, but what about ncurses? Is it system or not? In conda-forge, we provide it, and in all honesty that line is organic and something we move as and when we find we need to. |
@brainwane @scopatz if there's a better title for this issue today, could you change it/comment so that someone else who can make the change, changes it? |
I can offer mild packaging familiarity, reasonable python / CI / cloud experience and say 10-20 hours a week for the next month if it would be helpful. I think I would be a good fit if there's a rough consensus on direction and pypa/conda experts available for consulting but bottlenecked on elbow grease |
@matthew-brett I thought Carl Kleffner did something similar to a pip installed tool chain with openBLAS for NumPy though my memory might be foggy |
@mikofski - right - Carl was working on Mingwpy, which was (still is) a pip-installable gcc compiler chain to build Python extensions that link against the Python.org Microsoft Visual C++ runtime library. Work has stalled on that, for a variety of reasons, although I still think it would be enormously useful. I can go into more details - or - @carlkl - do you want to give an update here? @mattip - because we were discussing this a couple of weeks ago. |
I don't know if we have a clear answer that pip should be used as a general-purpose packaging solution. My view, which seems to be shared by several others from the recent discourse discussion about it, is that it should not try to "reinvent the wheel" or replace general purpose packaging solutions (like conda, yum, apt-get, nix, brew, spack, etc...). pip has clear use as a packaging tool for developers and "self-integrators". For that use case, statically linking dependencies into a wheel (vendoring native dependencies) can be a stop-gap measure but becomes very difficult for distributors as evidenced by pytorch, rapids, arrow, and other communities. It is definitely not ideal and in fact a growing problem with promoting the use of wheels for all Python users.
Using pip to package native libraries is conceivably possible, but a bigger challenge than it seems at first. It is hard to understand the motivation for this considerable work when this problem is already solved by several other open-source and more general-purpose packaging systems.
A better approach in my view is to enable native-library requirements to be satisfied by external packaging systems. In this way, pip can allow other package managers to install native requirements and only install wheels with native requirements if they are already present.
Non-developer, end-users who use Python integrated with many other libraries (such as the PyData and SciPy users) should also be encouraged to use their distribution package manager to get their software. These distributions (such as conda-forge) already satisfy robustly the need for one-command installation. This is a better user-experience than encouraging these particular users to "pip install".
In sum: conda-forge infrastructure producing wheels is a good idea, conda-build recipes producing wheels that allow for conda-packages to satisfy native-library dependencies is an even better idea. |
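(Editorial sketch.) The thread does not describe a concrete mechanism for letting an external package manager satisfy native-library requirements. As a rough illustration of the "only install if the native requirement is already present" idea, the following uses the standard library's `ctypes.util.find_library` to report which libraries cannot be located on the current system. The function name `missing_native_libraries` and the library names are hypothetical, and this is a presence check only, not anything pip does today.

```python
from ctypes.util import find_library
from typing import Iterable, List


def missing_native_libraries(names: Iterable[str]) -> List[str]:
    """Return the library names that the system linker search cannot locate.

    `names` are bare library names without the "lib" prefix or any suffix,
    e.g. "ssl" rather than "libssl.so". This helper is hypothetical and only
    illustrates the idea discussed in the comment above.
    """
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    # The names below are placeholders; substitute whatever a package needs.
    wanted = ["ssl", "surely_not_a_real_library_xyz123"]
    print("missing:", missing_native_libraries(wanted))
```

A check like this says nothing about versions or ABI compatibility, which is where most of the difficulty discussed in this thread lies.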
@teoliphant While theoretically a reasonable idea, this ignores the fact that a significant number of users are asking for pip-installable versions of these packages. Ignoring those users, or suggesting that they should "just" switch to another packaging solution, is dismissing a genuine use case without sufficient investigation. I know from personal experience that there are people who do need such packages but who can't or won't switch to Conda (for example). And on Windows there is no OS-level distribution package manager. How do we serve such users? |
From talks at SciPy, it seemed like a good answer for those users is to
provide "fat" wheels that ship all needed shared libraries with the wheel.
These could be created using conda packages to minimize build time and
consolidate build procedures. There was some experimentation with that
using numpy and scikit-image as tests. The packages were significantly
larger - probably too large. Static linking is much more efficient, but
bifurcates the build process. I'm hopeful that we can explore ways to trim
down the shared library size such that this approach may be viable. Having
any sort of scheme to actually share native libraries via wheels
(pynativelib) would help, but I think a strong dependency solver is a hard
requirement for implementation of that.
|
What about SONAME though? Or are you proposing to rewrite them and rename
the DSOs? If so are we worried about passing objects between different
versions of the same library? The glibc folks warned manylinux about that.
|
> pip has clear use as a packaging tool for developers and "self-integrators".
I guess it is used by those people, but it's used by a lot of other people too.
> For that use case, statically linking dependencies into a wheel (vendoring native dependencies) can be a stop-gap measure but becomes very difficult for distributors as evidenced by pytorch, rapids, arrow, and other communities. It is definitely not ideal and in fact a growing problem with promoting the use of wheels for all Python users.
I guess the problem is growing, but only in the sense that there are an increasing number of packages that ship wheels now. There are some difficult packages - I know that the GUI packages can have trouble. What difficulties are pytorch, rapids, arrow having? I'm happy to advise.
> Using pip to package native libraries is conceivably possible, but a bigger challenge than it seems at first. It is hard to understand the motivation for this considerable work when this problem is already solved by several other open-source and more general-purpose packaging systems.
> A better approach in my view is to enable native-library requirements to be satisfied by external packaging systems. In this way, pip can allow other package managers to install native requirements and only install wheels with native requirements if they are already present.
I think that's exactly the problem - it's not practical for a Python package to try and work with the huge numbers of package variants that it could encounter.
> Non-developer, end-users who use Python integrated with many other libraries (such as the PyData and SciPy users) should also be encouraged to use their distribution package manager to get their software. These distributions (such as conda-forge) already satisfy robustly the need for one-command installation. This is a better user-experience than encouraging these particular users to "pip install".
I don't think Scipy or PyData users will have any trouble - were you thinking of any package in particular? Numpy / Scipy / Matplotlib / Pandas are all well packaged, and have been for a long time.
> In sum: conda-forge infrastructure producing wheels is a good idea, conda-build recipes producing wheels that allow for conda-packages to satisfy native-library dependencies is an even better idea.
I don't think there's much appetite for making pip installs depend on prior conda installs - wouldn't that just increase the confusion?
|
For arrow, I think it's best summarized here: https://twitter.com/wesmckinn/status/1149319821273784323
|
@wesm - I'm happy to help with this - let me know if I can. Did you already contact the scikit-build folks? I have the impression they are best for C++ chains. (Sorry, I can't reply on Twitter, have no account). |
I believe we have one of the most complex package builds in the whole Python ecosystem. I think TensorFlow or PyTorch might have us beat, but it's close (it's obviously not a competition =D). I haven't contacted the scikit-build folks yet, if that could help us simplify our Python build I'm quite interested. I'm personally all out of budget for this after I lost a third or more of my June to build and package-related issues so maybe someone else can look into it |
Thanks - that sounds very tiring. I bet we can use this as a stimulus to improve the tooling. Would you mind making an issue in some sensible place in the Arrow repositories for us to continue the discussion? |
I'll echo what @wesm said here. I spent a lot of time as well trying to cope with wheel packaging issues on PyArrow. I'd be much happier if people accepted to settle on (disclaimer: I used to work for Anaconda but don't anymore. Also I own a very small amount of company shares) |
@pitrou - I hear the hope, but I really doubt that's going to happen in the short term. So I still think the best way, for now, is for those of us with some interest and time, to try and improve the wheel building machinery to the point where they are a minimal drain on your development resources. |
Just to drop some statistics to indicate the seriousness of this problem, our download numbers are growing to the same magnitude as NumPy and pandas
One of the reasons for our complex build environment is that we're solving problems that are very difficult or impossible to solve without a deep dependency stack. So there is no end in sight to our suffering with the current state of wheels |
@isuruf - a tip of the hat for your enterprise sir! How easy would this be to generalize? |
I have released the first version of conda-press (v0.0.1) https://github.com/regro/conda-press. It would be awesome if you all could play around with it & help improve it! Next steps are to:
|
Excellent news! Are the resulting wheels compatible with "normal" wheels from PyPI? (Specifically on Windows - I'm assuming that the Linux ones won't be manylinux-compatible...) To put it another way, the README talks about |
@pfmoore - Great questions! These are intended as general purpose wheels. I'll update the README to make that more clear. The Windows wheels should be compatible with non-conda installations because conda-forge goes to great lengths to be compatible with externally built wheels. Also, on the Linux side, they still should be compatible, even though they don't adhere to the manylinux specs (yet). This is because the wheels built here should still be ABI compatible. If there are compatibility issues, we should look into fixing them. |
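(Editorial sketch.) Whether a wheel claims manylinux compatibility is visible in its file name, which follows the `{distribution}-{version}(-{build})-{python tag}-{abi tag}-{platform tag}.whl` convention from the wheel specification. A minimal parser, assuming well-formed names, shows how to read those tags. The helper names and the example file names are made up for illustration.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


def claims_manylinux(name: WheelName) -> bool:
    """True if any of the (dot-separated) platform tags is a manylinux tag."""
    return any(tag.startswith("manylinux") for tag in name.platform_tag.split("."))


if __name__ == "__main__":
    for fname in (
        "example_pkg-1.0-cp37-cp37m-manylinux1_x86_64.whl",
        "example_pkg-1.0-cp37-cp37m-linux_x86_64.whl",
        "example_pkg-1.0-cp37-cp37m-win_amd64.whl",
    ):
        parsed = parse_wheel_filename(fname)
        print(fname, "->", parsed.platform_tag, claims_manylinux(parsed))
```

The tag is only a claim made by whoever built the wheel; checking that the binaries inside actually satisfy the corresponding policy is a separate step.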
Cool - that's really great to know. |
@scopatz, how are the shared libraries handled? |
@isuruf - Depends what you mean by handled and where they live. Normal ones live in |
What happens if Note that pip dependency resolution is different than conda's and doesn't ensure a consistent environment. |
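It depends on how you build the wheel with conda-press. If you are building "fat" wheels (with all non-python deps included in a single wheel), then the second package installed would clobber the first. If "thin" wheels were built, then libfoo would get its own wheel and be managed by pip and the version pins would kick in saying packages are incompatible. I am very open to having an option where the shared lib/ goes into a subdirectory with the package name. I think that this could work on platforms that support multiple RPATHs. PRs welcome! |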
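(Editorial sketch.) The "second package clobbers the first" case can be detected before installation by comparing the file lists of the wheels involved, since a wheel is a zip archive. The sketch below assumes every archive entry unpacks relative to the same root and ignores the `.dist-info` metadata; real installers map some entries (for example those under a `.data` directory) to other locations, which this does not model. The function names and paths such as `lib/libfoo.so` are hypothetical and are not taken from conda-press.

```python
import io
import zipfile
from typing import Dict, List


def colliding_files(wheels: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each archive path shipped by more than one wheel to its owners."""
    owners: Dict[str, List[str]] = {}
    for wheel_name, data in wheels.items():
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for path in archive.namelist():
                # Skip directory entries and per-wheel metadata.
                if path.endswith("/") or ".dist-info/" in path:
                    continue
                owners.setdefault(path, []).append(wheel_name)
    return {path: names for path, names in owners.items() if len(names) > 1}


def make_fake_wheel(files: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip standing in for a wheel (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


if __name__ == "__main__":
    fat_a = make_fake_wheel({"pkg_a/__init__.py": b"", "lib/libfoo.so": b"build 1"})
    fat_b = make_fake_wheel({"pkg_b/__init__.py": b"", "lib/libfoo.so": b"build 2"})
    print(colliding_files({"pkg_a": fat_a, "pkg_b": fat_b}))
```

Detecting the overlap does not resolve it; it only makes the "safe in isolation" caveat checkable.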
Why not put needed shared libraries alongside their extension modules and
use auditwheel to avoid collisions at load time? It'll be less efficient
but much safer.
On Fri, Aug 9, 2019, 10:19 Anthony Scopatz wrote:
> It depends on how you build the wheel with conda-press. If you are building "fat" wheels (with all non-python deps included in a single wheel), then the second package installed would clobber the first.
> If "thin" wheels were built, then libfoo would get its own wheel and be managed by pip and the version pins would kick in saying packages are incompatible.
> I am very open to having an option where the shared lib/ goes into a subdirectory with the package name. I think that this could work on platforms that support multiple RPATHs. PRs welcome!
|
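(Editorial sketch.) One way to keep bundled copies of the same library from colliding is to give each copy a name derived from its contents, which is broadly the approach auditwheel takes when it repairs a wheel. The sketch below only computes such a name; the function `mangled_library_name` is hypothetical, and a real tool must also rewrite the SONAME and the dependent modules' DT_NEEDED entries inside the ELF files, which is not attempted here.

```python
import hashlib


def mangled_library_name(filename: str, content: bytes) -> str:
    """Insert a short content hash before the first dot of a library name.

    For example "libfoo.so.1" becomes "libfoo-<8 hex digits>.so.1", so two
    different builds of libfoo end up with different file names.
    """
    short_hash = hashlib.sha256(content).hexdigest()[:8]
    base, dot, extension = filename.partition(".")
    if not dot:
        # No extension at all: just append the hash.
        return f"{filename}-{short_hash}"
    return f"{base}-{short_hash}.{extension}"


if __name__ == "__main__":
    print(mangled_library_name("libfoo.so.1", b"contents of build A"))
    print(mangled_library_name("libfoo.so.1", b"contents of build B"))
```

Renaming avoids load-time name clashes, but it does not address the earlier concern in this thread about passing objects between two different copies of the same library loaded into one process.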
This is a pip bug worth fixing pypa/pip#4625. sigh
s/would/should/ because... pypa/pip#988. sigh |
Even if you fix the bug, this is still a situation where the only truly correct behavior is at most to offer a user override, but more likely to error out. I really think the better answer for these fat wheels is to dodge the issue by avoiding conflicts, especially since auditwheel has that mostly figured out (except for Windows, as I understand it - but machomachomangler was promising?)
And remember, even with a solver, the quality of the job it can do is limited by the quality of the metadata available to it. The solver in pip will be a big step forward, but it will take a long time to work out the kinks in the metadata. It's an ongoing exercise for conda and our package ecosystem, and it always will be, because our understanding of compatibility is always improving, and new breaks in compatibility are always appearing. |
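(Editorial sketch.) The point that an installer without a solver can leave an environment inconsistent can be shown with a toy check over exact version pins. This is not how pip or conda work internally: real resolvers handle version ranges, environment markers and much more, and this only reports conflicts after the fact. The function name and the package data are invented for illustration.

```python
from typing import Dict, List, Tuple

# (requiring package, dependency, pinned version, version actually installed)
Conflict = Tuple[str, str, str, str]


def find_pin_conflicts(
    installed: Dict[str, str],
    pins: Dict[str, Dict[str, str]],
) -> List[Conflict]:
    """Report every exact pin that the installed versions do not satisfy."""
    conflicts: List[Conflict] = []
    for package, requirements in pins.items():
        for dependency, wanted in requirements.items():
            found = installed.get(dependency)
            if found != wanted:
                conflicts.append((package, dependency, wanted, found or "missing"))
    return conflicts


if __name__ == "__main__":
    # pkg_b was installed last and pulled in libfoo 2.0, breaking pkg_a's pin.
    installed = {"pkg_a": "1.0", "pkg_b": "1.0", "libfoo": "2.0"}
    pins = {"pkg_a": {"libfoo": "1.0"}, "pkg_b": {"libfoo": "2.0"}}
    for package, dependency, wanted, found in find_pin_conflicts(installed, pins):
        print(f"{package} needs {dependency}=={wanted}, found {found}")
```

As the comment above notes, any such check is only as good as the metadata it is given: a missing or wrong pin makes a broken environment look consistent.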
Yea, I agree. For now though, doing a better job with the metadata we do have is, as you said, a big step forward. :) |
I hope you're bracing for lots of people being upset about the solver. We hear a lot of people complain about conda that it is too rigid - it won't let them install things that "should work." They like that pip will just do what they want, presumably because any damage that results is something they either don't see, or don't correlate with pip ignoring some other constraint. Being technically correct isn't always a comfortable place to be, unfortunately. Before you put the solver into place as the default mechanism, it would probably be a good idea to make a blog post or docs page explaining what the solver does and why it is really a good thing, even though it may present some new issues that people aren't used to thinking about. |
@msarahan Thanks for bringing this up! Mind putting that comment in the pip resolver rollout planning issue pypa/pip#6536 ? Thanks. |
@msarahan - PRs very welcome! |
Yep yep. @brainwane helpfully linked to the issue where, I am hoping, we can figure out what's the best way to deal with potential breakage for users that doing proper resolution can cause and how to handle the communication around it. |
@scopatz, so the answer to
is "that would be better, yes, but I can't or don't want to spend the time to do that?" or what? I didn't say you should do anything. I asked about a glaring technical flaw in your implementation, and wondered why you wouldn't try to address that. Saying PR's welcome here is dodging the question and trying to pawn off responsibility for the issue that your tool creates. You should label your wheels as "great in isolation, but exercise caution when using more than one." And by caution, I mean judicious use of backups before every operation, since pypa/pip#4625 indicates that there's currently no real way to know when damage is going to happen until it happens. |
Sorry @msarahan - not trying to dodge the question, per se. I am on vacation and was answering from my phone. I also don't want to get sucked into a big this-vs-that debate. We all have our opinions about the technical issues here. (And I agree with you on the technical merits.) The problem conda-press is solving is that it is difficult to easily create wheels on a variety of platforms. What is out & implemented right now is a first step (v0.0.1). I agree that there are a lot of ways that the wheels it builds could be improved. But I (conda-press?) am not necessarily going to take a lot of sides in the various discussions, e.g. thin vs fat wheels, passing auditwheel vs not. I believe that there are different use cases out there and conda-press is a place where we can come together to build some tooling around these. I think that there are a good number of people out there who just want any wheel, even if that comes with the "only safe in isolation" caveat. People use a lot of virtual environments these days, so isolation & known-work-well-together packages are reasonable. It is probably a good idea to have a tracking issue for use cases. My hope with conda-press is that we can together build a tool that helps us publish better wheels that cover more use cases over time. I know that this is going to require more eyes than just my own, and want to make it clear that the project is a kind and welcoming place to contribute. |
I also totally want to have these kinds of conversations about what should conda-press do, but over at conda-press. Maybe I should have said "please open an issue" instead of a PR above. Sorry about any confusion. |
For folks following this issue, there is some related discussion happening over here as well: psf/fundable-packaging-improvements#19 |
How can we get more people to publish Wheels, especially for Windows? Christoph Gohlke published Windows installers but that won't work for Wheel because he won't have rights to upload them.
Perhaps the Build farm I've wanted to do can be used here?