ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

PriNing/ODM

ODM

[Figure: method overview]

This repository is the official implementation for the following paper:

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Chen Duan, Pei Fu, Shan Guo, Qianyi Jiang, Xiaoming Wei, CVPR 2024

Part of the code is inherited from oCLIP.

Data

Download the SynthText dataset.

ODM model

Download the ODM checkpoint, then extract the RN50 weights from it.

We provide a script for converting model parameter names:

python tools/convert2mmocr.py
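
For orientation, the kind of renaming such a conversion performs can be sketched as below. This is a minimal illustration, not the contents of tools/convert2mmocr.py: the `visual.` source prefix, the `backbone.` target prefix, and the helper name `extract_backbone` are all assumptions, and the real checkpoint layout may differ.

```python
from collections import OrderedDict


def extract_backbone(state_dict, src_prefix="visual.", dst_prefix="backbone."):
    """Keep only the keys under src_prefix and re-root them under dst_prefix.

    Both prefixes are hypothetical defaults; inspect the real checkpoint keys
    before relying on them.
    """
    converted = OrderedDict()
    for name, tensor in state_dict.items():
        # Checkpoints saved from a DistributedDataParallel wrapper usually
        # carry a leading "module." on every key, so strip it first.
        if name.startswith("module."):
            name = name[len("module."):]
        if name.startswith(src_prefix):
            converted[dst_prefix + name[len(src_prefix):]] = tensor
    return converted


# Toy stand-in for a loaded checkpoint; real values would be torch tensors.
toy_checkpoint = {
    "module.visual.conv1.weight": 1,
    "module.visual.layer1.0.bn1.bias": 2,
    "module.transformer.resblocks.0.attn.in_proj_weight": 3,
}
backbone = extract_backbone(toy_checkpoint)
print(list(backbone))
```

With a real checkpoint, the dictionary would come from `torch.load(path, map_location="cpu")` and the result would be written back with `torch.save`.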

Train

Single-GPU:

python -u src/training/main.py \
    --save-frequency 20 \
    --report-to tensorboard \
    --train-data /path/to/data \
    --char-dict-pth /path/to/char \
    --gt_dir /path/to/gt \
    --csv-img-key filepath \
    --csv-caption-key title \
    --warmup 10000 \
    --batch-size=32 \
    --lr=1e-4 \
    --wd=0.1 \
    --epochs=100 \
    --workers=4 \
    --model RN50_Seg_Clip \
    --gpu 0 \
    --logs=/path/to/save \
    --prefix demo

Multi-GPU (the same command without --gpu):

python -u src/training/main.py \
    --save-frequency 20 \
    --report-to tensorboard \
    --train-data /path/to/data \
    --char-dict-pth /path/to/char \
    --gt_dir /path/to/gt \
    --csv-img-key filepath \
    --csv-caption-key title \
    --warmup 10000 \
    --batch-size=32 \
    --lr=1e-4 \
    --wd=0.1 \
    --epochs=100 \
    --workers=4 \
    --model RN50_Seg_Clip \
    --logs=/path/to/save \
    --prefix demo
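
The two commands above differ only in the --gpu flag, so a small launcher can build both from one set of arguments. The following is a sketch under assumptions: `COMMON_ARGS` and `build_train_command` are hypothetical names that are not part of this repository, and the script is assumed to accept `--flag value` and `--flag=value` interchangeably, as argparse-based scripts do.

```python
import shlex

# Flags and values copied from the commands above; the paths are placeholders.
COMMON_ARGS = {
    "--save-frequency": "20",
    "--report-to": "tensorboard",
    "--train-data": "/path/to/data",
    "--char-dict-pth": "/path/to/char",
    "--gt_dir": "/path/to/gt",
    "--csv-img-key": "filepath",
    "--csv-caption-key": "title",
    "--warmup": "10000",
    "--batch-size": "32",
    "--lr": "1e-4",
    "--wd": "0.1",
    "--epochs": "100",
    "--workers": "4",
    "--model": "RN50_Seg_Clip",
    "--logs": "/path/to/save",
    "--prefix": "demo",
}


def build_train_command(gpu=None):
    """Return the training command as an argument list.

    Passing a GPU index adds --gpu (single-GPU form); leaving it as None
    omits the flag (multi-GPU form).
    """
    cmd = ["python", "-u", "src/training/main.py"]
    for flag, value in COMMON_ARGS.items():
        cmd += [flag, value]
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]
    return cmd


single = build_train_command(gpu=0)
multi = build_train_command()
print(shlex.join(single))
print(shlex.join(multi))
```

To launch a run, pass the list to `subprocess.run(build_train_command(gpu=0), check=True)` from the repository root.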

Visualization

[Figure: visualization]

Citation

@inproceedings{duan2024odm,
  title={ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting},
  author={Duan, Chen and Fu, Pei and Guo, Shan and Jiang, Qianyi and Wei, Xiaoming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15587--15597},
  year={2024}
}
