update README.md

large-ocr-model · Jan 16, 2024 · fafe675
1 parent 0fe2e92
commit fafe675

Showing 6 changed files with 4 additions and 4 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/README.md b/README.md
@@ -5,8 +5,7 @@
 
 Recently, multimodal large models have received widespread attention in academia and industry for their strong visual-language interaction capabilities. However, in optical character recognition (OCR), that is, extracting textual information from images, their performance remains relatively weak. Large OCR models address this weakness with better recognition accuracy and robustness, and they have become an important tool for multi-modal large models in the OCR field, providing strong support for related applications. We pair the OCR model with Qwen-VL-Chat, a large multi-modal model (LMM), and carry out an extensive evaluation on four VQA tasks. The findings demonstrate that OCR is effective on challenging visual-language interaction tasks, that it strengthens the text recognition capabilities of multi-modal large models, and that it significantly improves LMM accuracy on VQA tasks.
 
-
-<div align="center"><img src="assets/table5.png" style="zoom:60%" alt="table5"/></div>
+<p align="center"><img src="assets/table5.png"></p>
 
 
 ## 📸 VQA visualization effects
@@ -23,14 +22,15 @@ Recently, multimodal large models have received widespread attention in academia
 
 In natural language processing (NLP), the relationship between model size, data volume, computing power, and model performance has been studied extensively. In optical character recognition (OCR), however, the exploration of these "scaling laws" is still in its infancy. To fill this gap, we conducted a comprehensive study of how model size, data volume, and computing power relate to OCR performance. The results reveal that, holding other influencing factors constant, there is a smooth exponential relationship between performance and both model size and training data volume. In addition, we created a large-scale dataset, REBU-Syn, containing 6 million real samples and 18 million synthetic samples. Using these scaling laws and this dataset, we trained a high-precision OCR model that achieves SOTA accuracy on the OCR test benchmark. **In particular, we found that the OCR model can significantly enhance the capabilities of multi-modal large models and achieve significant accuracy improvements on multiple VQA tasks, proving the great potential of OCR in improving the performance of multi-modal large models.**
 
+<p align="center"><img src="assets/f1.png"></p>
+
 
-<div align="center"><img src="assets/f1.png" style="zoom:40%" alt="f1"/></div>
 
 ## 🛠️ Dataset
 
 In OCR, the quality and diversity of datasets are extremely important. We created a new dataset, REBU-Syn, by collecting and integrating open-source datasets. In addition, we used the latest generation techniques to produce MJST+, a further 60M synthetic samples available for additional use.
 
-<div align="center"><img src="assets/table3.png" style="zoom:60%" alt="table3"/></div>
+<p align="center"><img src="assets/table3.png"></p>
 
 ## 🗝️ Scaling Law for OCR
 

diff --git a/assets/.DS_Store b/assets/.DS_Store
diff --git a/assets/f1.png b/assets/f1.png
diff --git a/assets/table3.png b/assets/table3.png
diff --git a/assets/table5.png b/assets/table5.png
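
The README text in this diff describes a smooth relationship between OCR performance and model size or training data volume. As a rough illustration only, the sketch below fits a power law `y = a * x**b` by least squares in log-log space, a form commonly used in scaling-law studies. The power-law form, the function name `fit_power_law`, and the sample numbers are all assumptions made for illustration; none of them comes from the repository or its results.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).

    Hypothetical helper for illustration; not part of the repository.
    All inputs must be strictly positive.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("power-law fitting requires positive values")

    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx

    # Intercept in log space gives log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Synthetic, made-up points (parameter count, error rate); not real results.
    sizes = [5e6, 2e7, 1e8, 5e8]
    errors = [0.120, 0.095, 0.078, 0.064]
    a, b = fit_power_law(sizes, errors)
    print(f"fitted: error ~= {a:.4g} * size ** {b:.4g}")
```

A negative fitted exponent `b` would indicate that the error falls as the model grows. Whether a power law or another curve best describes the actual measurements is for the authors' experiments to determine.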