This project mainly demonstrates how I approach running OCR on an HKID (Hong Kong Identity Card).
- I am not perfect, and neither is my code.
- Use this code at your own risk. There is NO guarantee that this script can scan everything you send to it.
- In other words, it only works on reasonable input. For example, an image that shows mainly the HKID with a little extra space around the border is OK. However, if the HKID sits in a corner while 80% (or more!) of the image contains other stuff, this script will not work.
- The key to successful text recognition is a clear image. If your image is NOT clear, for example if the text is blurred or cannot be read easily, this script may not give accurate results.
- The Google key in the code has been deactivated. Please replace it with your own key.
- Yes, you can use Facebook OCR, but you are not limited to that. You can also use tesseract, which I highly recommend. In fact, if you look at my code, a pytesseract call is included (but commented out).
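The clear-image requirement above can be checked roughly before sending anything to an OCR engine. The sketch below is not part of `hkid.py`. It is a minimal pure-Python illustration of one common heuristic, the variance of a Laplacian filter response, and it assumes the image has already been loaded as a 2D list of grayscale values. The names `laplacian_variance` and `looks_sharp` and the threshold of `100.0` are hypothetical and would need tuning on real HKID photos.

```python
def laplacian_variance(gray):
    """Return the variance of a 4-neighbour Laplacian over a 2D grayscale grid.

    `gray` is assumed to be a list of equal-length rows of numbers (0-255).
    Blurred images have weak edges, so the variance tends to be low.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    # Skip the outermost pixels so every sample has four neighbours.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def looks_sharp(gray, threshold=100.0):
    """Hypothetical gate: True when the edge response is strong enough.

    The default threshold is an assumption, not a measured value.
    """
    return laplacian_variance(gray) >= threshold


# A flat grey patch has no edges; a checkerboard has many.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 if (x + y) % 2 else 0 for x in range(5)] for y in range(5)]
print(looks_sharp(flat), looks_sharp(checker))
```

An image that fails such a check is probably better retaken than passed on to the OCR step.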
Usage: `python hkid.py -i <image_path> [-d/--debug]`
Example: `python hkid.py -i hkid_sample-no-sample.jpg`
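For readers who want to see how a command line of this shape is usually wired up, here is a minimal `argparse` sketch. It is an assumption about how such options could be declared, not a copy of what `hkid.py` does. The destination name `image` and the help texts are made up for illustration, and since no long form of `-i` is documented, none is defined here.

```python
import argparse


def build_parser():
    """Build a parser matching the documented usage: -i <image_path> [-d/--debug]."""
    parser = argparse.ArgumentParser(
        prog="hkid.py",
        description="Run OCR on an HKID image.",
    )
    # Required input image; only the short flag appears in the usage line.
    parser.add_argument("-i", dest="image", required=True,
                        metavar="image_path", help="path to the HKID image")
    # Optional switch; stores True when present, False otherwise.
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable debug output")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-i", "hkid_sample-no-sample.jpg", "--debug"])
print(args.image, args.debug)
```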
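It will return a JSON-like string, like below. Note that the sample uses single quotes, so it is a Python dict literal rather than strict JSON.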
{'result': ['李智能', 'LEE, Chi Nan', '2621 2535 5174', '出生日期Date of Birth', '女F', '01-01-1968', 'k AZ', '簽發日期Date of Issue', '(01-79)']}
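Because the sample output uses single quotes, `json.loads` would reject it. A minimal sketch of one way to consume it, assuming the script's output always has this dict-literal shape, is to parse it with `ast.literal_eval` and then search the recognized lines by pattern. The helper names `parse_result` and `find_full_date` are hypothetical, and the order of entries in `result` is not guaranteed, which is why the sketch matches on format instead of position.

```python
import ast
import re

# Sample output copied from the README.
raw_output = (
    "{'result': ['李智能', 'LEE, Chi Nan', '2621 2535 5174', "
    "'出生日期Date of Birth', '女F', '01-01-1968', 'k AZ', "
    "'簽發日期Date of Issue', '(01-79)']}"
)


def parse_result(raw):
    """Safely evaluate the dict literal and return the list of recognized lines."""
    data = ast.literal_eval(raw)
    return data["result"]


def find_full_date(lines):
    """Return the first entry shaped like DD-MM-YYYY, or None if there is none.

    In the sample this entry follows the 'Date of Birth' label.
    """
    for line in lines:
        if re.fullmatch(r"\d{2}-\d{2}-\d{4}", line):
            return line
    return None


lines = parse_result(raw_output)
print(len(lines))             # 9
print(find_full_date(lines))  # 01-01-1968
```

`ast.literal_eval` only accepts Python literals, so it does not execute arbitrary code the way `eval` would.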