T'aan

T'aan aims to recognize the characters in old writings and automatically translate the recognized text into other languages.

Usage

python img2txt.py <input_file> <output_file>

Requirements

T'aan uses Tesseract by Google to perform the OCR. Download it or clone it from its GitHub repository, then follow the installation instructions for your OS. The latest version already includes models for the most common writing systems and languages.
The pdf2image module, a wrapper around poppler, is required to convert a given PDF into images that Tesseract can then process.
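
The PDF-to-image-to-text flow described above can be sketched as follows. This is a minimal illustration, not the code in T'aan's own scripts: the function names `build_tesseract_command` and `pdf_to_text`, the `page001`-style file naming, the default language code `eng`, and the 300 dpi setting are all assumptions chosen for the example. It assumes the `tesseract` executable is on your PATH and that pdf2image and poppler are installed.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for one Tesseract command-line call.

    Tesseract appends ".txt" to output_base by itself.
    (Hypothetical helper, named for this example only.)
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def pdf_to_text(pdf_path, out_dir, lang="eng", dpi=300):
    """Render each PDF page to a PNG, run OCR on it, and return the .txt paths.

    (Hypothetical helper; dpi=300 is an assumed, commonly used resolution.)
    """
    # Imported here so the helper above still works without pdf2image installed.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    text_files = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for number, page in enumerate(pages, start=1):
        base = out_dir / f"page{number:03d}"
        image_path = base.with_suffix(".png")
        page.save(image_path, "PNG")
        # check=True raises CalledProcessError if Tesseract exits with an error.
        subprocess.run(build_tesseract_command(image_path, base, lang), check=True)
        text_files.append(base.with_suffix(".txt"))
    return text_files
```

For a Spanish document you would pass `lang="spa"`, provided the corresponding Tesseract language data is installed. Building the command in a separate function keeps the OCR call easy to inspect without running Tesseract.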

References

Online

OCRchie: Modular Optical Character Recognition Software. An OCR software package created in 1996 by Katherine Marsden, then a Computer Science graduate at the University of California, Berkeley.
Character Recognition by Feature Point Extraction. An old but entertaining approach to detecting characters based on feature points, created by Eric W. Brown of Northeastern University in 1992.
Optical Character Recognition. A short description of a simple OCR by Vijay Rajan Nadar using pixel counting and relative location.
Automatically Detect and Recognize Text in Natural Images. An example using MATLAB to detect regions containing text in an image.
Neural Network OCR. Some ideas about optical character recognition using neural networks, presented in a 2005 article by Andrew Kirillov.
CNNs for handwritten and machine-printed character recognition. An example of Convolutional Neural Networks designed to recognize visual patterns directly from pixel images with minimal preprocessing.
Coding Bilinear Interpolation. A short explanation and the code for a Bilinear Interpolation by The Supercomputing Blog.
Fast C++: Bilinear Pixel Interpolation using SSE. Explanation and code for a fast Bilinear interpolation implemented in C++.
OpenCV Tutorials. Set of basic OpenCV tutorials.

Books and Articles

M. Cheriet, N. Kharma, C. Liu and C. Suen. Character Recognition Systems: A Guide for Students and Practitioners. John Wiley & Sons. 2007.
H. Bunke and P.S.P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific Pub Co Inc. 1997.
S.V. Rice, G. Nagy and T.A. Nartker. Optical Character Recognition: An Illustrated Guide to the Frontier (The Springer International Series in Engineering and Computer Science). Springer. 1999.
C. Steger, M. Ulrich and C. Wiedemann. Machine Vision: Algorithms and Applications. Wiley-VCH. 2007.
C.M. Bishop. Neural Networks for Pattern Recognition (Advanced Texts in Econometrics). Clarendon Press. 1st Edition. 1996.
D.L. Baggio, S. Emami, D.M. Escrivá, K. Ievgen, N. Mahmood, J. Saragih, R. Shilkrot. Mastering OpenCV with Practical Computer Vision Projects. Packt Publishing. Pages 176-187. 2012.
J. Gllavata, R. Ewerth and B. Freisleben. A Robust Algorithm for Text Detection in Images. Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis 2003. Volume 2. Pages 611-616.
W. Huang, Z. Lin, J. Yang and J. Wang. Text Localization in Natural Images using Stroke Feature Transform and Text Covariance Descriptors. IEEE International Conference on Computer Vision (ICCV). 2013.
L. Kang, Y. Li and D. Doermann. Orientation Robust Text Line Detection in Natural Images. CVPR 2014.
F. Mohammad, J. Anarase, M. Shingote and P. Ghanwat. Optical Character Recognition Implementation Using Pattern Matching. International Journal of Computer Science and Information Technologies. Volume 5 (2). Pages 2088-2090. 2014.
O.D. Trier, A.K. Jain and T. Taxt. Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition. Volume 29. Issue 4. Pages 641-662. April 1996.
R. Lienhart and A. Wernicke. Localizing and Segmenting Text in Images and Videos. IEEE Transactions on Circuits and Systems for Video Technology. Volume 12. Issue 4. April 2002.
Y. Amit and D. Geman. Shape Quantization and Recognition with Randomized Trees. Neural Computation. Volume 9. Pages 1545-1588. 1997.

What is T'aan?

T'aan means language in Mayan. Pre-Hispanic Mesoamerica had dozens of languages and hundreds of dialects, which hindered the rapid integration of its many nations. To ease communication, specialized translators and interpreters were established. These interpreters could read the various scripts and codices and thus connect their communities.

The Mayans of Yucatán used the word T'aan for language, conversation, reading aloud, word, or voice. In short, everything that has to do with communication in any language belongs to the realm of T'aan.
