This repository is the official implementation for the following paper:
Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai, ECCV 2022 (Oral)
Part of the code is inherited from open_clip.
- English
Backbone | Pre-train Data | Pre-train Model | Fine-tune Data | Fine-tune Model (PSENet) | Precision | Recall | F-score |
---|---|---|---|---|---|---|---|
ResNet-50 | SynthText | Link | Total-Text | Link | 89.9 | 81.6 | 85.5 |
ResNet-101 | SynthText | Link | Total-Text | Link | 89.9 | 82.2 | 85.9 |
ResNet-50 | Web Image | Link | Total-Text | Link | 90.1 | 83.5 | 86.7 |
- Chinese
Backbone | Pre-train Data | Pre-train Model |
---|---|---|
ResNet-50 | LSVT-Weak Annotation | Link |
conda create -n oclip python=3.7
conda activate oclip
pip install -r requirement.txt
git clone https://github.com/bytedance/oclip.git
cd oclip
export PYTHONPATH="$PYTHONPATH:$PWD/src"
Download SynthText and put it to ./data.
You may use the provided script to generate the annotation for pre-training:
python tools/convert_synthtext_csv.py --data_dir data/SynthText/ --save_dir data/SynthText/
- Note we use [space] to represent the masked characters. For customized datasets, you may modify codes in src/training/data.py and your data annotation accordingly.
Sample running code for training:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u src/training/main.py \
--save-frequency 3 \
--report-to tensorboard \
--train-data="data/SynthText/train_char.csv" \
--char-dict-pth="data/SynthText/char_dict" \
--csv-img-key filepath \
--csv-caption-key title \
--warmup 10000 \
--batch-size=64 \
--lr=1e-4\
--wd=0.1 \
--epochs=100 \
--workers=8 \
--model RN50 \
--logs='output/RN50_synthtext'
We also provide a script for visualization of attention maps in the pre-trained model.
Download the pre-trained model to ./pretrained.
python3 tools/visualize_attn.py --model_path pretrained/RN50_synthtext.pt --char_dict_path data/SynthText/char_dict --model_config_file src/training/model_configs/RN50.json --im_fn demo/sample.jpg --text_list "ST LING" "STRLIN " "A GYLL'S" " ODGINGS" --demo_path demo/
Input Image | Image Attenion Map | "ST LING" | "STRLIN " | "A GYLL'S" | " ODGINGS" |
---|---|---|---|---|---|
We provide a script for converting model parameter names, thus it could be used in the dev-1.x branch of MMOCR
# first modify the model_path and save_path in tools/convert2mmocr.py
python tools/convert2mmocr.py
@article{xue2022language,
title={Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting},
author={Xue, Chuhui and Zhang, Wenqing and Hao, Yu and Lu, Shijian and Torr, Philip and Bai, Song},
journal={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2022}
}