Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent page-size on arm64 #735

Closed
ianw opened this issue Aug 19, 2020 · 19 comments
Closed

Inconsistent page-size on arm64 #735

ianw opened this issue Aug 19, 2020 · 19 comments

Comments

@ianw
Copy link

ianw commented Aug 19, 2020

Hello,

tl;dr Debuntu has a 4k page-size and CentOS 7/8 has a 64k page size, so aarch64 manylinux wheels built on the former fail on the latter with alignment issues.

There seem to have been a number of places that have noticed this

and a few other issues pop up too.

Now I understand that to the letter of PEP599 that the build system should be CentOS 7; but in this case the page-size relates to the host system, so running the centos7 manylinux2014_aarch64 container is going to give you a different page-size depending on if you run it on Debuntu or CentOS (or of course, some other distro).

This is compounded by (AFAIK) travisCI only providing Ubuntu images (https://docs.travis-ci.com/user/reference/overview/ ?) making it difficult for people to get a 64k environment.

Individual builds can work around this with a linker flag to set 64k-alignment (e.g.pyca/bcrypt@8d35d8a) . But that has to be ported into each build script and most people will never realise.

I wonder what solutions people might have for this? Some I can think of

  • raise a bug against devtoolset-9 to update the default alignment to 64k
  • rebuilding devtoolset-9 with such a change and installing in the builder container
  • some sort of hacky wrapper script around cc/ld to pass flags/linker-script/??
  • some way to fail with a sensible error message; setuptools hack or something?
  • redefine wheels to also publish the page-size they are built for (sounds painful, not backwards compatible and afaik, the only benefit is slightly smaller binaries in the 4k case ... if this could even be done with the plumbing required in pip etc).

It would be good to come up with something by default, because doing the obvious thing of grabbing the manylinux2014_aarch64 container and building wheels on debuntu (i.e. travis practically) creates wheels that don't actually work on the reference platform; so that seems suboptimal.

@AGSaidi
Copy link

AGSaidi commented Aug 19, 2020

At least for gold this was fixed in the linker to default to 64KB and it looks likes ld has defaulted to max-page-size=64KB from when the code was originally added.

Furthermore other bugs similar to this and this reference the gold bug but otherwise don't seem to have a problem with 64kB pages.

Feels like we're missing something here

@ianw
Copy link
Author

ianw commented Aug 19, 2020

@AGSaidi yes I tend to agree having played a bit more now.

I have a native Ubuntu Bionic arm64 host here, and it seems to create binaries 64k aligned both natively and also when building inside a manylinux2014_aarch container. I'm basing this on

$ gcc -o test test.c
$ readelf --segments ./test

Elf file type is DYN (Shared object file)
Entry point 0x620
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000007f4 0x00000000000007f4  R E    0x10000  <<<<<< THIS
  LOAD           0x0000000000000d80 0x0000000000010d80 0x0000000000010d80
                 0x0000000000000290 0x0000000000000298  RW     0x10000
  ...

However, clearly people are managing to get incorrectly built binaries out, somehow.

It looks like bad ffi wheels (https://foss.heptapod.net/pypy/cffi/-/issues/463) might have been generated in https://github.com/python-cffi/cffi-travis-wheel via https://github.com/matthew-brett/multibuild ? The bcrypt build that also works around it https://github.com/pyca/bcrypt/blob/master/.github/workflows/wheel-builder.yml#L101 appears to run the container pretty plainly under qemu.

@geoffreyblake
Copy link
Contributor

@ianw, @AGSaidi

I'm looking into this and a smallish project that exposes the issue if the cffi project. If we just do the following:

hg clone https://foss.heptapod.net/pypy/cffi
cd cffi
sudo docker run -i -t -e PLAT=aarch64 -v `pwd`:/io quay.io/pypa/manylinux2014_aarch64

#In docker
cd /io/
/opt/python/cp39-cp39/bin/python3.9 setup.py bdist_wheel
for whl in /io/dist/*_aarch64.whl; do   auditwheel repair $whl --plat "manylinux2014_aarch64" -w /io/dist/; done
cd dist
unzip cffi-1.14.2-cp39-cp39-manylinux2014_aarch64.whl
readelf -l _cffi_backend.cpython-39-aarch64-linux-gnu.so

Elf file type is DYN (Shared object file)
Entry point 0x7c00
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000301b0 0x00000000000301b0  R E    0x10000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  NOTE           0x0000000000000200 0x0000000000000200 0x0000000000000200
                 0x0000000000000024 0x0000000000000024  R      0x4
  GNU_EH_FRAME   0x000000000002af00 0x000000000002af00 0x000000000002af00
                 0x0000000000000864 0x0000000000000864  R      0x4
  LOAD           0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000003928 0x0000000000006110  RW     0x10000
  TLS            0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000000000 0x0000000000000004  R      0x4
  GNU_RELRO      0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000000b40 0x0000000000000b40  R      0x1
  LOAD           0x000000000010c000 0x0000000000056000 0x0000000000056000
                 0x0000000000000fc8 0x0000000000000fc8  RW     0x1000 <--- 4kB!
  DYNAMIC        0x000000000010d000 0x0000000000057000 0x0000000000057000
                 0x0000000000000210 0x0000000000000210  RW     0x8
  LOAD           0x000000000010d000 0x0000000000057000 0x0000000000057000
                 0x0000000000001178 0x0000000000001178  RW     0x1000 <--- 4kB!

 Section to Segment mapping:
  Segment Sections...
   00     .dynsym .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   01
   02
   03     .eh_frame_hdr
   04     .init_array .fini_array .data.rel.ro .got .got.plt .data .bss
   05     .tbss
   06     .init_array .fini_array .data.rel.ro .got
   07     .gnu.hash .note.gnu.build-id
   08     .dynamic
   09     .dynamic .dynstr

We see the last 2 LOAD sections carry the 4kB alignment, all they contain is the .gnu.hash and .dynstr structures. But interestingly enough if we pull out what the bdist_wheel is doing to compile the c extension we get something different:

gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DUSE__THREAD -DHAVE_SYNC_SYNCHRONIZE -I/opt/python/cp39-cp39/include/python3.9 -c _cffi_backend.c -o _cffi_backend.o
gcc -pthread -shared _cffi_backend.o -lffi -o _cffi_backend.cpython-39-aarch64-linux-gnu.so

readelf -l _cffi_backend.cpython-39-aarch64-linux-gnu.so

Elf file type is DYN (Shared object file)
Entry point 0x7c00
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000301b0 0x00000000000301b0  R E    0x10000
  LOAD           0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000003928 0x0000000000006110  RW     0x10000
  DYNAMIC        0x000000000003fc50 0x000000000004fc50 0x000000000004fc50
                 0x0000000000000200 0x0000000000000200  RW     0x8
  NOTE           0x0000000000000200 0x0000000000000200 0x0000000000000200
                 0x0000000000000024 0x0000000000000024  R      0x4
  TLS            0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000000000 0x0000000000000004  R      0x4
  GNU_EH_FRAME   0x000000000002af00 0x000000000002af00 0x000000000002af00
                 0x0000000000000864 0x0000000000000864  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000000b40 0x0000000000000b40  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   01     .init_array .fini_array .data.rel.ro .dynamic .got .got.plt .data .bss
   02     .dynamic
   03     .note.gnu.build-id
   04     .tbss
   05     .eh_frame_hdr
   06
   07     .init_array .fini_array .data.rel.ro .dynamic .got

.gnu.hash and .dynstr are in the first LOAD entry, at 64kB alignment. Something else is happening to relocate these entries, maybe this error is in setuptools?

@geoffreyblake
Copy link
Contributor

Hmmm, no not setuptools, it appears the auditwheel step is where the magic happens on manylinux2014. That's the next place to look.

@geoffreyblake
Copy link
Contributor

It looks like the version of patchelf used in the current version of manylinux2014_aarch64 is slightly too old, it is the culprit doing the rewriting on Arm64 and misaligning the LOAD sections in the ELF library. If I hacked auditwheel/patcher.py to force the --page-size on the patchelf calls to 65536 then I get this binary layout:


Elf file type is DYN (Shared object file)
Entry point 0x7c00
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000301b0 0x00000000000301b0  R E    0x10000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  NOTE           0x0000000000000200 0x0000000000000200 0x0000000000000200
                 0x0000000000000024 0x0000000000000024  R      0x4
  GNU_EH_FRAME   0x000000000002af00 0x000000000002af00 0x000000000002af00
                 0x0000000000000864 0x0000000000000864  R      0x4
  LOAD           0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000003928 0x0000000000006110  RW     0x10000
  TLS            0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000000000 0x0000000000000004  R      0x4
  GNU_RELRO      0x000000000003f4c0 0x000000000004f4c0 0x000000000004f4c0
                 0x0000000000000b40 0x0000000000000b40  R      0x1
  LOAD           0x0000000000110000 0x0000000000060000 0x0000000000060000
                 0x0000000000000fc8 0x0000000000000fc8  RW     0x10000 <---- yay, 64kB not 4kB!
  DYNAMIC        0x0000000000120000 0x0000000000070000 0x0000000000070000
                 0x0000000000000210 0x0000000000000210  RW     0x8
  LOAD           0x0000000000120000 0x0000000000070000 0x0000000000070000
                 0x0000000000001178 0x0000000000001178  RW     0x10000

 Section to Segment mapping:
  Segment Sections...
   00     .dynsym .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   01
   02
   03     .eh_frame_hdr
   04     .init_array .fini_array .data.rel.ro .got .got.plt .data .bss
   05     .tbss
   06     .init_array .fini_array .data.rel.ro .got
   07     .gnu.hash .note.gnu.build-id
   08     .dynamic
   09     .dynamic .dynstr

@geoffreyblake
Copy link
Contributor

So the issue is coming from Manylinux2014_aarch64 and its version of patchelf being slightly too old.

NixOS/patchelf@0470d69
This patch was submitted June 20th, and fixes the issues with AArch64. ManyLinux2014 is pulling the 0.11 tag release which is 3 weeks older. Updating ManyLinux2014 should fix this.

@mattip
Copy link
Contributor

mattip commented Aug 19, 2020

This is a duplicate of issue pypa/auditwheel#251. From the original discussion on the numpy issue, there is a claim that using patchelf to solve this is the wrong tool, and a better idea would be to build python itself with an appropriate flag, those flags would then be passed on to any c-extension module build with distutils.

@mattip
Copy link
Contributor

mattip commented Aug 19, 2020

Ahh, I misunderstood. the compile is correct, but then auditwheel calls patchelf which breaks the shared objects?

@AGSaidi
Copy link

AGSaidi commented Aug 19, 2020

Ahh, I misunderstood. the compile is correct, but then auditwheel calls patchelf which breaks the shared objects?

Correct. The compile is fine.

@mattip
Copy link
Contributor

mattip commented Aug 19, 2020

PR #628 updated patchelf to 0.11. I don't see a subsequent release on the repo

@AGSaidi
Copy link

AGSaidi commented Aug 19, 2020

The fix @geoffreyblake referenced above hasn't made it into a release yet. One option would be to go back to carrying a patch for this

@mattip
Copy link
Contributor

mattip commented Aug 19, 2020

I pinged upstream, but yes, a PR to add the patch might fix the problem until they release.

@njsmith
Copy link
Member

njsmith commented Aug 19, 2020

@geoffreyblake great analysis!

@geoffreyblake
Copy link
Contributor

@mattip carrying a patch, or just checking out a known good hash should solve this unless they can cut a 0.12 release quickly. Building a new container to try out and see if that fixes things.

@geoffreyblake
Copy link
Contributor

Pushed a PR for review that solves this issue by carrying a patch for patchelf: #739

@mattip
Copy link
Contributor

mattip commented Aug 28, 2020

patchelf released 0.12 a little while ago

@radarhere
Copy link
Contributor

I've created #740 to upgrade patchelf.

@mattip
Copy link
Contributor

mattip commented Aug 29, 2020

I think #741 closes this. Thanks to all who put the effort into pinpointing the problem and solving it.

@mayeut
Copy link
Member

mayeut commented Sep 26, 2020

Closing per last comments.
Thanks to all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants