ERROR: Failed to continue from: data/eng/eng.lstm #18

Open
WissamAntoun opened this issue Jun 9, 2022 · 0 comments


Hey @Shreeshrii, I'm using your Makefile in a Docker container to train Tesseract 5 on an English font, just to check that my setup works.

I've been encountering this issue for a while now:

Loaded file data/eng/eng.lstm, unpacking...
Failed to continue from: data/eng/eng.lstm

I have tried the traineddata from both tessdata_best and tessdata, and I get exactly the same error.

This is the output of combine_tessdata -e data/eng.traineddata data/eng/eng.lstm with the tessdata_best model:

Extracting tessdata components from data/eng.traineddata
Wrote data/eng/eng.lstm
Version:4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
17:lstm:size=11689099, offset=192
18:lstm-punc-dawg:size=4322, offset=11689291
19:lstm-word-dawg:size=3694794, offset=11693613
20:lstm-number-dawg:size=4738, offset=15388407
21:lstm-unicharset:size=6360, offset=15393145
22:lstm-recoder:size=1012, offset=15399505
23:version:size=80, offset=15400517
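
As a quick sanity check on the listing above, here is a minimal sketch that confirms a non-empty lstm component is present (entry 17 in my output). The function name has_lstm_component is hypothetical and not part of Tesseract; it only pattern-matches the listing text fed to it on stdin.

```sh
# Hypothetical helper (not part of Tesseract): reads a combine_tessdata
# component listing on stdin and succeeds only if it contains an "lstm"
# entry whose size is greater than zero.
has_lstm_component() {
  # Matches lines such as "17:lstm:size=11689099, offset=192".
  # Entries like "18:lstm-punc-dawg:..." do not match, because the
  # pattern requires ":size=" directly after "lstm".
  grep -Eq '^[0-9]+:lstm:size=[1-9][0-9]*,'
}
```

It could be used as combine_tessdata -d data/eng.traineddata | has_lstm_component, assuming the listing is written to stdout (add 2>&1 before the pipe if it goes to stderr instead). In my case the check passes, so a missing LSTM component does not seem to be the problem.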

The command that fails is the following:

lstmtraining \
  --continue_from data/eng/eng.lstm --old_traineddata data//eng.traineddata \
  --traineddata data/engDejavu/engDejavu-proto.traineddata \
  --train_listfile data/engDejavu/list.train \
  --eval_listfile data/engDejavu/list.eval \
  --max_iterations 100 \
  --debug_interval -1 \
  --learning_rate 0.0001 \
  --target_error_rate 0.01 \
  --model_output data/engDejavu/checkpoints/engDejavu
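
To rule out a simple path problem before running the command above, a pre-flight check along these lines can be used. This is only a sketch: check_inputs is a hypothetical helper, not something provided by Tesseract or the Makefile, and it only tests that each file exists and is non-empty.

```sh
# Hypothetical pre-flight helper (not part of Tesseract or the Makefile):
# succeeds only if every path passed as an argument exists and is non-empty.
check_inputs() {
  rc=0
  for f in "$@"; do
    # -s is true only for an existing file with size greater than zero.
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For the command above, that would be check_inputs data/eng/eng.lstm data//eng.traineddata data/engDejavu/engDejavu-proto.traineddata data/engDejavu/list.train data/engDejavu/list.eval. It reports every missing or empty file on stderr instead of stopping at the first one.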

Dockerfile:

# Set docker image
FROM ubuntu:18.04

# Skip the configuration part
ENV DEBIAN_FRONTEND noninteractive

# Update and install dependencies
RUN apt-get update && \
    apt-get install -y wget unzip bc vim python3-pip libleptonica-dev git htop

# Packages to compile Tesseract
RUN apt-get install -y --reinstall make && \
    apt-get install -y g++ autoconf automake libtool pkg-config libpng-dev libjpeg8-dev libtiff5-dev libicu-dev \
    libpango1.0-dev libcairo2-dev autoconf-archive rename ttf-mscorefonts-installer && fc-cache -f

# Set working directory
WORKDIR /app

RUN mkdir /app/src && cd /app/src

# # Set the locale
RUN apt-get install -y locales && locale-gen en_GB.UTF-8
ENV LC_ALL=en_GB.UTF-8
ENV LANG=en_GB.UTF-8
ENV LANGUAGE=en_GB.UTF-8

# # Copy requirements into the container at /app
COPY requirements.txt ./

RUN pip3 install -r requirements.txt

# Compile Tesseract with training options (also feel free to update Tesseract versions and such!)
# mkdir -p is used because /app/src already exists from the earlier RUN step
RUN mkdir -p /app/src && cd /app/src && \
    git clone https://github.com/tesseract-ocr/tesseract.git && \
    cd /app/src/tesseract && \
    ./autogen.sh && ./configure --disable-graphics && make && make install && ldconfig && \
    make training && make training-install

Any help or guidance is appreciated. Thanks!
