Extract plate number #1

Closed
robmarkcole opened this issue Jan 8, 2021 · 6 comments
Labels
documentation Improvements or additions to documentation

Comments

@robmarkcole

Hi, great work on the model! The next step is to extract the plate number. I tried AWS Textract, which works well:

[image: Textract result]

I also tried Tesseract OCR, which is open source and can run locally, but it failed to read the plate. You can try using https://github.com/robmarkcole/text-insights-app

Possibly something as simple as OpenCV is worth a try too.

@odd86
Owner

odd86 commented Jan 8, 2021

Hello, let me give you some nice Python code ;)

You should also check the font of your licence plates.
Here in Norway they use Myriad Pro, so I use the Language Independent Training pack.
Once you know the font, download the right Tesseract model here: Tesseract Language Packs

Copy your language pack to C:\Program Files\Tesseract-OCR\tessdata

First I crop the image to the coordinates of the licence plate, then I pass the cropped image to Tesseract.

    from io import BytesIO
    from PIL import Image

    def _get_cropped_image(coordinates, image, image_name=""):
        # `image` holds the raw image bytes; `coordinates` is the plate's bounding box.
        image = Image.open(BytesIO(image))
        y_max = coordinates["y_max"]
        y_min = coordinates["y_min"]
        x_max = coordinates["x_max"]
        x_min = coordinates["x_min"]
        # PIL's crop box is (left, upper, right, lower).
        cropped = image.crop((x_min, y_min, x_max, y_max))
        # Save the crop to disk so the OCR step can read it back.
        cropped.save(f"{image_name}.png")

    import re

    import cv2
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe" # Your path to tesseract.exe

    def _read_licence_plate():
        img = cv2.imread("licence-plate.png", 0)
        gray = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        gray = cv2.medianBlur(gray, 3)
        ret, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
        rect_kern = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        dilation = cv2.dilate(thresh, rect_kern, iterations=1)
        contours, hierarchy = cv2.findContours(dilation, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        sorted_contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0])
        im2 = gray.copy()
        plate_num = ""
        i = 0
        for cnt in sorted_contours:
            try:
                x, y, w, h = cv2.boundingRect(cnt)
                height, width = im2.shape
                if height / float(h) > 7:
                    continue

                ratio = h / float(w)
                if ratio < 1.2 or ratio > 3.77:
                    continue

                area = h * w
                if width / float(w) > 22:
                    continue
                    
                if area < 70:
                    continue
                    
                # Pad the box by 5 px; clamp at 0 so a negative index cannot wrap around.
                roi1 = thresh[max(y - 5, 0):y + h + 5, max(x - 5, 0):x + w + 5]
                roi2 = cv2.bitwise_not(roi1)
                roi3 = cv2.medianBlur(roi2, 5)
                
                whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if i < 3 else "0123456789" # <- Make a smart whitelist
                text = pytesseract.image_to_string(
                    roi3,
                    lang="ENG6", # Insert the name of your language pack
                    config=f'-c tessedit_char_whitelist={whitelist} --psm 10 --oem 3'
                )
                plate_num += text.strip()  # drop the trailing newline and form feed from Tesseract
                i += 1
            except Exception as e:
                print(e)
        is_reg = re.findall(r'[a-zA-Z]{2}[0-9]{5}', plate_num) # <- Make the regex pick out licenceplate numbers based on your country
        if len(is_reg) > 0:
            print(is_reg[0])
            return is_reg[0]
        return False
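
The two country-specific parts of the snippet above, the per-position whitelist and the final regex, can both be derived from one pattern string. This is a minimal sketch, assuming a hypothetical template notation in which `L` stands for a letter and `D` for a digit. The names `PLATE_TEMPLATES`, `whitelist_for`, `plate_regex` and `extract_plate` are illustrative and not part of pytesseract or this repository. The Norwegian template follows the regex above; the UK one assumes the current format of two letters, two digits and three letters.

```python
import re

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"

# Hypothetical templates: "L" = letter, "D" = digit.
PLATE_TEMPLATES = {
    "NO": "LLDDDDD",  # same shape as the [a-zA-Z]{2}[0-9]{5} regex above
    "UK": "LLDDLLL",  # assumption: current UK format, e.g. AB12 CDE
}


def whitelist_for(template, position):
    """Return the characters Tesseract should consider at this position."""
    if position >= len(template):
        return LETTERS + DIGITS  # past the template: allow anything
    return LETTERS if template[position] == "L" else DIGITS


def plate_regex(template):
    """Compile a regex that matches one plate described by the template."""
    parts = ["[A-Z]" if c == "L" else "[0-9]" for c in template]
    return re.compile("".join(parts))


def extract_plate(ocr_text, template):
    """Return the first plate found in the OCR output, or None."""
    # Upper-case and drop spaces, newlines and other OCR noise first.
    cleaned = re.sub(r"[^A-Z0-9]", "", ocr_text.upper())
    match = plate_regex(template).search(cleaned)
    return match.group(0) if match else None
```

In the loop above, `whitelist_for(template, i)` could stand in for the hard-coded whitelist expression, and `extract_plate(plate_num, template)` for the final `re.findall` call.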

@odd86 odd86 added the documentation Improvements or additions to documentation label Jan 8, 2021
@robmarkcole
Author

@odd86 I spent a bit of time experimenting with OpenCV and Tesseract and concluded that this approach requires a lot of fine-tuning, e.g. of the blur parameters. I am interested in a neural-net approach that is robust and doesn't require fine-tuning. Are you aware of any?

@odd86
Owner

odd86 commented Jan 10, 2021

Yeah, I ended up training my own neural-net model for Tesseract 4.
Download the model

This model was trained for 1 million iterations over a dataset of 800 images of Norwegian licence plates.
If you make a dataset for other plates, it would be fun to include them!

I just made a crawler that fetched all images of cars for sale on our main Norwegian car sales site and checked them for plates.
After the plates were cropped out, I ran them through some of the previous Tesseract models, and finally I looked over the dataset to fix errors.
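
The review step described here ends with one corrected transcription per cropped plate. A minimal sketch of writing those out, assuming the layout that the tesstrain tooling for Tesseract 4 expects, where each image is paired with a `.gt.txt` file. The function name `write_ground_truth` and the file names are hypothetical, and this is not the script used for the model above.

```python
from pathlib import Path


def write_ground_truth(labels, out_dir):
    """Write one <image stem>.gt.txt file per labelled plate crop.

    `labels` maps an image file name (e.g. "plate_0001.png") to the
    manually verified plate text for that crop.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, text in sorted(labels.items()):
        text = text.strip().upper()
        if not text:
            continue  # skip crops that still lack a verified label
        gt_path = out_dir / (Path(image_name).stem + ".gt.txt")
        gt_path.write_text(text + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

The cropped images would need to sit in the same folder as their `.gt.txt` files before training starts.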

So now I just send the image to Tesseract and don't need to do all the tweaking.
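
With a plate-specific model, the OCR step reduces to a single call on the cropped plate. A minimal sketch, assuming the trained model was saved as `plates.traineddata` in the tessdata folder (the model name `plates` and both function names are hypothetical). `--psm 7` tells Tesseract to treat the image as a single text line and `--oem 1` selects the LSTM engine.

```python
def build_config(psm=7, oem=1):
    """Build the Tesseract options: single text line, LSTM engine."""
    return f"--psm {psm} --oem {oem}"


def read_plate(image_path, model="plates"):
    """OCR an already cropped plate image with a custom Tesseract model."""
    # Imported lazily so build_config() works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    text = pytesseract.image_to_string(
        Image.open(image_path),
        lang=model,  # hypothetical model name, see the note above
        config=build_config(),
    )
    return text.strip()
```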

@robmarkcole
Author

Very nice! Re the custom model for Tesseract 4: is there a good article I can follow to reproduce this with UK number plates? It would be good to document this somewhere so people can make dedicated models for their own country.

@Themrpie

Hello! I'm thinking about building a licence plate detector to use on DeepStack, so it was nice to find your model.
But I also need to be able to read the licence plate. Can you comment on the accuracy of Tesseract 4?
I'm not sure whether to follow that path or to train each letter as a category, which would be a lot more work, but I'm looking for a reliable solution.
Thanks

@odd86
Owner

odd86 commented Feb 1, 2021

Hello @Themrpie.
Read my answer about that here:
https://forum.deepstack.cc/t/licence-plate-reader/687/10

@odd86 odd86 closed this as completed Aug 24, 2021