-
Notifications
You must be signed in to change notification settings - Fork 948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Makefile instead of the build_dependencies.sh script #210
Conversation
ws233
commented
Aug 11, 2015
- libtiff update
- No more image libraries in lib folder. Instead those files are taken directly from the libtiff-ios folder, where they are built.
- No more leptonica & tesseract libraries in the lib folder of the repository. Instead those binaries are built automatically directly when run button in xCode is pressed
- No more build_dependencies script. Makefile instead to build Tesseract and all dependent libraries.
- An extra build phase has been added to the Tesseract OCR iOS framework. This phase simply runs the Makefile and build all the dependencies if necessary.
- Readme files have been updated to mention above changes.
@kevincon, I've removed build script with the Makefile.
Pls, try this and let me know what do you think about the whole idea. |
@kevincon , the build has failed cause liftiff-ios is configured to be built with iOS SDK 8.4. But in our tests we use 7.1 and 8.1 versions. I'll fork libtiff-ios and modify it to support any SDK version very soon, but so far you may try the PR with SDK 8.4. |
You may also try common clean targets as follows:
|
291f572
to
f7dd0cb
Compare
I like the idea of converting the build script to a Makefile for the reasons you mentioned, but I think you need to update the podspec as well since the libs wouldn't be part of the repo anymore. But before you do that, wouldn't making the Makefile run as part of the TesseractOCR target cause users of the CocoaPod to have to build the libs every time they install the pod in a new project (which takes a long time, as you mention in the README)? If that's the case, I don't think we should make this change because I think that's unreasonable for every user to have to wait that long when they add the Pod to a project. Instead, would it be possible to add a brand new target to the project that represents building the libs using the Makefile so that it's easier for users to do that if they want to (and we still get the other Makefile benefits), but then also still have the libs included in the repo so that CocoaPod users don't have to build them when they add the Pod to a project? And maybe the existing TesseractOCR target would be smart enough to know that, since the libs are included in the repo, they don't have to be rebuilt if the user doesn't need them to be rebuilt? |
What update do you mean? Sorry, I don't understand what is necessary here.
Yes, the lib is completely compiled on every installing the pod in a new project. But how often do you install new projects? That's a tradeoff between first compile time and repo size. So far the repo size is already about 350Mb and it will significantly grow further, since in the xcode 7 bitcode has appeared. It significantly increases the size of every architecture slice, almost doubles, if I'm not mistaking. Anyway, In my humble opinion 350Mb for the wrapper is quiet big, we do not have so much code in our repo ^.^ Is it possible to make a survey in hithub to ask our users about this tradeoff? What do they choose? Do they think the biuld time is preferrable or the repo size?
I thought about it. I may include the precompiled libs binaries back to the repo. In such case we even do not need the separate target. |
@kevincon, just an example. In the bitcode branch I've uploaded 58.61 MB of just a
But besides only |
@kevincon, how do you like an idea of building the dependencies right in |
The podspec file (TesseractOCRiOS.podspec in the root of the repo) is what CocoaPods uses to determine how to install the library in a user's project when they run
So if we were to merge this PR right now,
I don't think the important point is how often users install new projects; I think the important point is the user's perception about the library based on their experience installing it.
I timed how long it takes to run
I also timed how long it takes to run
1808.37 seconds (30 minutes and 8 seconds) is ridiculously long for a user to wait to start using this library, regardless of whether that time is part of To be clear, I will not merge this pull request if it results in imposing that kind of time on users when they install this library.
It's an easy choice from my point of view to choose having a larger repo over the long compilation time that this PR introduces for users. As yet another data point, I timed how long it takes to clone this repo (from HEAD in master):
I'm personally happy as long as that time stays under a minute, and I don't think users will complain that they're about to run out of hard drive space as part of using this library.
I can't think of a great way to do this given our resources right now. We could start a mailing list for users interested in providing their opinions on these kinds of questions, but otherwise I think the best thing we can do is to create a new issue on the repo to ask users what they would prefer.
I think the best thing to do is to put the pre-compiled lib binaries back in the repo so that users don't have to build them when they install the library (or build it for the first time).
You're right that GitHub recommends that individual files be less than 50 MB, but this is an arbitrary recommendation on their part, so I'm not concerned about it. In reality, the only true limit we face is the 1 GB limit that GitHub imposes on public repos. I agree that we should address the issue of a growing repo size, but not at the expense of our users' experience installing the library.
I propose that we simply purge the repo of older versions of the libs anytime we update them. One trade-off I can think of is that if users check out a specific past commit of the repo (say commit X) then the libs they see at that commit will be the libs from HEAD as opposed to the actual libs that were part of the repo at the time of commit X. However, I think that this issue is somewhat mitigated by the fact that users can download a zip file from the Releases page that has a preserved copy of the libs from the time of each release. There is a tool called BFG Repo-Cleaner that we can use to do this: https://rtyley.github.io/bfg-repo-cleaner. What do you think? |
ok. That makes sence. I'll add libs back. |
2. No more image libraries in lib folder. Instead those files are taken directly from the libtiff-ios folder, where they are built. 3. No more leptonica & tesseract libraries in the lib folder of the repository. Instead those binaries are built automatically directly when run button in xCode is pressed 4. No more build_dependencies script. Makefile instead to build Tesseract and all dependent libraries. 5. An extra build phase has been added to the Tesseract OCR iOS framework. This phase simply runs the Makefile and build all the dependencies if necessary. 6. Readme files have been updated to mention above changes.
2. Removed internal tesseract and leptonica headers from the project.
ok. I've added back the libraries binaries. |
2. Image fat libs are copied in ./libs and their headers in ./include
I've removed prebuilt |
Thanks for putting the libs back, but it looks like you put back a folder for leptonica-1.71 while the Makefile is hard-coded to look for leptonica-1.72. This results in a bug with the Makefile; if you try to run
Can you also fix the two warnings on the TesseractOCR target? The way we resolved those before was by manually editing the affected Tesseract source files to add (int) casts (see #120 when we talked about it before). Also I think it's okay for the libs to be built as part of the Travis build. |
…ependent libs have been downloaded yet.
Ok. I've fixed the issue. |
No, please fix those files in include/ now as part of this PR. Upstream tesseract has 14 open PRs, so I doubt they review them often (if at all?). In the meantime, we shouldn't expose our users to any compilation warnings. Related, I think we should specify specific tags for the submodules so that we have some indication that they're stable (while preserving the ability to easily upgrade them in the future). We can do this by modifying the .gitmodules file to point to the tag of a stable release. I created a new issue to do this (#214) and to update the libs accordingly (e.g. I noticed that upstream Tesseract finally released version 3.04 last month, so we should make the tesseract-ocr submodule point to the 3.04 release tag). Let me know if you'd like to take care of this issue; otherwise I'll do it when I have some time.
Thanks for fixing it, it works for me now. Once you fix the compilation warnings, I'll merge this PR. |
Yes, they do review PRs. But our one is low priority, I guess.
I was going to do this as soon as we commit this PR.
I would appreciate if you help me with this and take care about the upgrade. But be aware that Tesseract 3.04 has significantly changed the function to produce PDF output. So let me know if you need some help with upgrading our wrapper for this function or rewritting tests for it. Thx! |
Also, |
d2ab8eb
to
a800676
Compare
That's ready to be merged as well. I've also modified Travis CI build matrix, so now every target is build with rebuilding dependent libraries and without it. That helps us ensure that 1. the binaries uploaded into the repo are correct and 2. the Makefile to build dependent libraries is not broken. |
…S environment variable only. 2. Makefile updated to support ARCHS environment variable only for building specified architectures only. 3. README_howto_compile_libraries.md updated to explain above changes. 4. Prebuilt script changed to build only active architectures of dependent libraries in Travis CI build environment.
Add Makefile instead of the build_dependencies.sh script