Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macOS: ocrd-olena-binarize fails (old bash version) #30

Closed
stweil opened this issue Feb 14, 2020 · 16 comments · Fixed by #31 or OCR-D/ocrd_im6convert#13
Closed

macOS: ocrd-olena-binarize fails (old bash version) #30

stweil opened this issue Feb 14, 2020 · 16 comments · Fixed by #31 or OCR-D/ocrd_im6convert#13

Comments

@stweil
Copy link
Collaborator

stweil commented Feb 14, 2020

macOS provides an old version of bash which obviously does not support declare -A:

$ ocrd-olena-binarize --help
/Users/venv-20200202/bin/ocrd-olena-binarize: line 34: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]

$ /bin/bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)
Copyright (C) 2007 Free Software Foundation, Inc.
@stweil stweil changed the title ocrd-olena-binarize fails on macOS because of old bash version macOS: ocrd-olena-binarize fails (old bash version) Feb 14, 2020
@kba
Copy link
Member

kba commented Feb 17, 2020

MacOS ships with a 13 year old bash? Tricky. Can't you upgrade bash?

IIRC macOS ships with zsh. Do you know what version? We could add a switch to change from bash to zsh on macOS (with a few adaptions to the code).

But I would really strongly recommend using a modern bash with all the features and security fixes etc.

@stweil
Copy link
Collaborator Author

stweil commented Feb 20, 2020

Yes, meanwhile the default shell for macOS switched to zsh:

% zsh --version
zsh 5.7.1 (x86_64-apple-darwin19.0)

Homebrew and other extensions for macOS offer recent versions of bash in /usr/local/bin/bash, but I think that OCR-D tools should not require such additional software, but work with the default which is available on macOS. I rarely notice problems caused by the old bash version.

@bertsky
Copy link
Collaborator

bertsky commented Feb 20, 2020

Homebrew and other extensions for macOS offer recent versions of bash in /usr/local/bin/bash, but I think that OCR-D tools should not require such additional software,

Why not? This could become part of a native OS installation adaptor. Like deps-ubuntu for Debian/Ubuntu, only more general (perhaps borrowing some OS detection and package management magic from autotools). And there's always a Docker version...

but work with the default which is available on macOS. I rarely notice problems caused by the old bash version.

It would be really hard to work without associative arrays (declare -A) in bash. And I also think the security implications of relying on such an old bash are unwieldy.

It would also probably be hard making shell code portable between bash and zsh (0-based vs 1-based array indexing, different array indices syntax, CWD-relative source, subtle parameter expansion differences etc). Plus we'd have to replace the shebang from #!/bin/bash to #!/bin/zsh everywhere. And then we would create an extra zsh dependency for Ubuntu, our default target platform.

This affects scripts in:

  • bashlib
  • bashlib-based processors like ocrd-olena-binarize or ocrd-im6convert
  • shell recipes in makefiles

@stweil
Copy link
Collaborator Author

stweil commented Feb 20, 2020

Why not?

There exist competing third party package managers for macOS (Fink, Homebrew, MacPorts, maybe more). In theory I can install more than one of them, but in real life that can cause conflicts, and it always costs a lot of disk space. None of those package managers will install /bin/bash, so ocrd-olena-binarize would still use the original old bash and fail. That's exactly my scenario: I run a recent bash from Homebrew, but that one is not used.

Replacing shell code by Python 3 would be the safe choice.

@stweil
Copy link
Collaborator Author

stweil commented Feb 20, 2020

Plus we'd have to replace the shebang from #!/bin/bash to #!/bin/zsh everywhere. And then we would create an extra zsh dependency for Ubuntu, our default target platform.

None of my Debian hosts has zsh installed. I don't think that zsh is the solution.

@stweil
Copy link
Collaborator Author

stweil commented Feb 20, 2020

I still think that the best solution would be using Python for everything which cannot be done in a POSIX /bin/sh, but there might be a relatively simple solution with OCR-D/ocrd_all:

We could check whether /bin/bash is available and sufficiently new. If it is not, we could check whether there is a sufficiently new bash in the default PATH. If there is one, that bash could be linked to $VIRTUAL_ENV/bin, and the shebang #!/bin/bash would have to be patched to use the linked bash.

@kba
Copy link
Member

kba commented Feb 20, 2020

How about replacing

#!/bin/bash

with

#!/usr/bin/env bash

for MacOS and doing a version check that fails early, loud and explicitly? That would shift the burden of having the right bash in PATH to the user.

OTOH I'm not as pessimistic about zsh/bash interop as @bertsky.

I have to say I'm not keen on reimplementing bashlib in Python especially if it is just about MacOS shipping broken software.

@stweil
Copy link
Collaborator Author

stweil commented Feb 20, 2020

That's a good suggestion. Only a small number of files seems to use associative arrays with declare -A, so only those would have to use #!/usr/bin/env bash (or fix SHELL for the Makefile):

bin/ocrd-import
bin/ocrd-im6convert
bin/ocrd-olena-binarize
lib/python3.7/site-packages/ocrd/lib.bash
lib64/python3.7/site-packages/ocrd/lib.bash
share/workflow-configuration/Makefile

(list produced by searching the whole generated VIRTUAL_ENV directory)

@bertsky
Copy link
Collaborator

bertsky commented Feb 20, 2020

There exist competing third party package managers for macOS (Fink, Homebrew, MacPorts, maybe more). In theory I can install more than one of them, but in real life that can cause conflicts

Then I guess the host/OS adaptor will have to provide multiple profiles (macos-fink, macos-homebrew, macos-macports etc).

and it always costs a lot of disk space

Well if that's more than the overhead for Docker, then macOS is a poor choice for a native installation.

None of those package managers will install /bin/bash

Here I agree with @kba: The traditional way to deal with this is /usr/bin/env. And if macOS does not even support that, then the host adaptor will have to go to further contortions like fakechroot etc.

Replacing shell code by Python 3 would be the safe choice.

Again I side with @kba: bashlib was meant as a last resort to easily wrap non-Python tools without knowledge of Python. Any reasonable platform we target natively should support at least a (fairly recent) bash. Especially now that we have invested a lot of work in that direction.

None of my Debian hosts has zsh installed. I don't think that zsh is the solution.

Neither do I. It serves its purpose for interactive work (and arguably well), but this is about the default scripting language.

everything which cannot be done in a POSIX /bin/sh

...this already excludes (associative) arrays and most of variable expansion we use in every other line. I don't think POSIX is a good reference point either. (Except perhaps for shell recipes in makefiles, where it is also the default.)

#!/usr/bin/env bash

and doing a version check that fails early, loud and explicitly? That would shift the burden of having the right bash in PATH to the user.

I'm all in for that! Replacing shebangs and hard-coded bash exec invocations throughout should not be difficult.

@stweil
Copy link
Collaborator Author

stweil commented Feb 20, 2020

I'd only replace /bin/bash where needed, that means for those files:

  • bin/ocrd-import
  • bin/ocrd-im6convert
  • bin/ocrd-olena-binarize
  • share/workflow-configuration/Makefile

@bertsky
Copy link
Collaborator

bertsky commented Feb 20, 2020

and doing a version check that fails early, loud and explicitly

We could compare ${BASH_VERSINFO[0]} and ${BASH_VERSINFO[1]} with our frame of reference. Perhaps even in bashlib (for those scripts that source it). Which raises the question: which version do we minimally require?

Associative arrays are around since 4.0 (but have been fixed/improved since IIRC).

4.4 introduced (among others):

i. Bash does not allow function names containing /' and =' to be exported.

(which of course has security implications)

o. There is a new ${parameter@spec} family of operators to transform the
value of `parameter'.

(I sometimes use this to escape quotations in the input.)

z. Bash now treats SIGINT received when running a non-builtin command in a
loop the way it has traditionally treated running a builtin command:
running any trap handler and breaking out of the loop.

(sounds significant to our trap handling scripts)

So, how about:

((BASH_VERSINFO[0]<4 || BASH_VERSINFO[1]<4)) && echo >&2 "bash $BASH_VERSION is too old" && exit 1

@kba
Copy link
Member

kba commented Feb 20, 2020

4.4 was released 2016-09-15. I think that is a reasonable expectation.

((BASH_VERSINFO[0]<4 || BASH_VERSINFO[1]<4)) && echo >&2 "bash $BASH_VERSION is too old" && exit 1

👍

@bertsky
Copy link
Collaborator

bertsky commented Feb 20, 2020

I'd only replace /bin/bash where needed, that means for those files:

* bin/ocrd-import

* bin/ocrd-im6convert

* bin/ocrd-olena-binarize

* share/workflow-configuration/Makefile

Done: PRs above and bertsky/workflow-configuration@8c86e00)

@bertsky
Copy link
Collaborator

bertsky commented Feb 20, 2020

lib/python3.7/site-packages/ocrd/lib.bash

Perhaps we should do something in core as well, but I don't know what exactly. As @stweil pointed out, we don't have any shebangs in the library itself. There is, however:

  • a shebang in tests/assets/column-samples/build.sh
  • SHELL = /bin/bash in Makefile

Apart from that, for lib.bash itself, only the version check comes to mind.

@stweil
Copy link
Collaborator Author

stweil commented Feb 20, 2020

@kba, that's not a good check. It fails with a very recent bash 5.0. Better:

((BASH_VERSINFO<4 || BASH_VERSINFO<5 && BASH_VERSINFO[1]<4)) && echo >&2 "bash $BASH_VERSION is too old" && exit 1
# or
((BASH_VERSINFO<4 || BASH_VERSINFO==4 && BASH_VERSINFO[1]<4)) && echo >&2 "bash $BASH_VERSION is too old" && exit 1

@kba
Copy link
Member

kba commented Feb 21, 2020

Apart from that, for lib.bash itself, only the version check comes to mind.

OCR-D/core#445

kba added a commit to OCR-D/core that referenced this issue Jun 7, 2020
kba added a commit to OCR-D/core that referenced this issue Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants