2School of Artificial Intelligence, University of Chinese Academy of Sciences
3ByteDance, Inc 4University of Science and Technology of China
If DreamClear is helpful to your projects, please help star this repo. Thanks!
- 2024.11.30: Release more convenient inference code for your own images.
- 2024.10.25: Release segmentation&detection code, pre-trained models.
- 2024.10.25: Release RealLQ250 benchmark, which contains 250 real-world LQ images.
- 2024.10.25: Release training&inference code, pre-trained models of DreamClear.
- 2024.10.24: This repo is created.
- Clone this repo and navigate to DreamClear folder
git clone https://github.com/shallowdream204/DreamClear.git
cd DreamClear
- Create Conda Environment and Install Package
conda create -n dreamclear python=3.9 -y
conda activate dreamclear
pip3 install -r requirements.txt
- Download Pre-trained Models (All models except for llava can be downloaded at Huggingface for convenience.)
  - PixArt-α-1024: PixArt-XL-2-1024-MS.pth
  - VAE: sd-vae-ft-ema
  - T5 Text Encoder: t5-v1_1-xxl
  - LLaVA: llava-v1.6-vicuna-13b
  - SwinIR: general_swinir_v1.ckpt
  - DreamClear: DreamClear-1024.pth
  - RMT for Segmentation: rmt_uper_s_2x.pth
  - RMT for Detection: rmt_maskrcnn_s_1x.pth
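Before training or inference, it can help to confirm that the downloaded weights are where you expect them. The following is a minimal sketch, not part of the repository: `find_missing_checkpoints` is a hypothetical helper, and it assumes all models were saved under a single root folder using the file and folder names listed above.

```python
from pathlib import Path

# Names as listed in this README. Storing them all under one root
# directory is an assumption of this sketch, not a repository requirement.
EXPECTED_CHECKPOINTS = [
    "PixArt-XL-2-1024-MS.pth",
    "sd-vae-ft-ema",
    "t5-v1_1-xxl",
    "llava-v1.6-vicuna-13b",
    "general_swinir_v1.ckpt",
    "DreamClear-1024.pth",
]


def find_missing_checkpoints(root, expected=EXPECTED_CHECKPOINTS):
    """Return the expected names that do not exist directly under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]
```

Calling `find_missing_checkpoints("/path/to/models")` returns an empty list when everything is in place. Add the RMT weights to the list if you plan to run the segmentation or detection code.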
Similar to SeeSR, we pre-prepare HQ-LQ image pairs for training the IR model. Run the following command to make paired data for training:
python3 tools/make_paired_data.py \
--gt_path gt_path1 gt_path2 ... \
--save_dir /path/to/save/folder/ \
--epoch 1 # number of epochs to generate paired data
After generating paired data, you can use an MLLM (e.g., LLaVA) to generate detailed text prompts for the HQ images. Then use T5 to extract the text features ahead of time, which saves training time. Run:
python3 tools/extract_t5_features.py \
--t5_ckpt /path/to/t5-v1_1-xxl \
--caption_folder /path/to/caption/folder \
--save_npz_folder /path/to/save/npz/folder
Finally, the directory structure for training datasets should look like:
training_datasets_folder/
├── gt
│   ├── 0000001.png # GT, (1024, 1024, 3)
│   └── ...
├── sr_bicubic
│   ├── 0000001.png # LQ + bicubic upsample, (1024, 1024, 3)
│   └── ...
├── caption
│   ├── 0000001.txt # Caption files (not used in training)
│   └── ...
└── npz
    ├── 0000001.npz # T5 features
    └── ...
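A quick consistency check of this layout can catch samples whose LQ image or T5 features were never generated. The sketch below is illustrative and not part of the repository: `find_incomplete_samples` is a hypothetical helper, and it assumes that files belonging to one sample share the same stem (e.g. `0000001`) across sub-folders, as in the tree above. The `caption` folder is skipped because captions are not used in training.

```python
from pathlib import Path

# Sub-folders that must contain one file per GT image, with their extensions.
REQUIRED = {"sr_bicubic": ".png", "npz": ".npz"}


def find_incomplete_samples(root):
    """Map each required sub-folder to the sorted GT stems it is missing."""
    root = Path(root)
    gt_stems = {p.stem for p in (root / "gt").glob("*.png")}
    missing = {}
    for folder, ext in REQUIRED.items():
        found = {p.stem for p in (root / folder).glob("*" + ext)}
        missing[folder] = sorted(gt_stems - found)
    return missing
```

An empty list for every key means each GT image has a matching LQ image and feature file.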
Run the following command to train DreamClear with default settings:
python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... \
train_dreamclear.py configs/DreamClear/DreamClear_Train.py \
--load_from /path/to/PixArt-XL-2-1024-MS.pth \
--vae_pretrained /path/to/sd-vae-ft-ema \
--swinir_pretrained /path/to/general_swinir_v1.ckpt \
--val_image /path/to/RealLQ250/lq/val_image.png \
--val_npz /path/to/RealLQ250/npz/val_image.npz \
--work_dir experiments/train_dreamclear
Please modify the path of training datasets in configs/DreamClear/DreamClear_Train.py. You can also modify the training hyper-parameters (e.g., lr, train_batch_size, gradient_accumulation_steps) in this file, according to your own GPU machines.
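When adapting `train_batch_size` and `gradient_accumulation_steps` to a different number of GPUs, it is useful to keep track of how many samples contribute to each optimizer step. The sketch below shows the usual data-parallel arithmetic. It assumes `train_batch_size` is the per-GPU batch size, which you should verify against the config; the helper names and the numbers are illustrative and are not the repository defaults.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps,
                         gpus_per_node, num_nodes=1):
    """Number of samples that contribute to a single optimizer step."""
    return (train_batch_size * gradient_accumulation_steps
            * gpus_per_node * num_nodes)


def accumulation_steps_for(target_batch_size, train_batch_size,
                           gpus_per_node, num_nodes=1):
    """Accumulation steps needed to reach `target_batch_size` exactly."""
    per_step = train_batch_size * gpus_per_node * num_nodes
    if target_batch_size % per_step != 0:
        raise ValueError(
            f"{target_batch_size} is not a multiple of {per_step} samples per step"
        )
    return target_batch_size // per_step


# Illustrative numbers only.
print(effective_batch_size(4, 2, gpus_per_node=8, num_nodes=4))  # 256
print(accumulation_steps_for(256, 4, gpus_per_node=2))           # 32
```

With fewer GPUs, raising `gradient_accumulation_steps` in this way keeps the effective batch size unchanged at the cost of longer steps.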
We provide the RealLQ250 benchmark, which can be downloaded from Google Drive.
Run the following command to restore LQ images (the code defaults to using 2 GPUs for inference):
python3 -m torch.distributed.launch --nproc_per_node 1 --master_port 1234 \
test.py configs/DreamClear/DreamClear_Test.py \
--dreamclear_ckpt /path/to/DreamClear-1024.pth \
--swinir_ckpt /path/to/general_swinir_v1.ckpt \
--vae_ckpt /path/to/sd-vae-ft-ema \
--t5_ckpt /path/to/t5-v1_1-xxl \
--llava_ckpt /path/to/llava-v1.6-vicuna-13b \
--lre --cfg_scale 4.5 --color_align wavelet \
--image_path /path/to/input/images \
--save_dir validation \
--mixed_precision fp16 \
--upscale 4
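If you need to restore several image folders in a row, the command above can be assembled from Python and passed to `subprocess.run`. This is a minimal sketch rather than part of the repository: `build_inference_command` is a hypothetical helper, and it only reuses the flags and values shown in the command above.

```python
import shlex

CKPT_FLAGS = ("dreamclear_ckpt", "swinir_ckpt", "vae_ckpt", "t5_ckpt", "llava_ckpt")


def build_inference_command(image_path, save_dir, ckpts, upscale=4,
                            nproc_per_node=1, master_port=1234):
    """Return the inference command as an argument list for subprocess.run."""
    cmd = [
        "python3", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(nproc_per_node),
        "--master_port", str(master_port),
        "test.py", "configs/DreamClear/DreamClear_Test.py",
    ]
    # `ckpts` maps each checkpoint flag name (without dashes) to its path.
    for flag in CKPT_FLAGS:
        cmd += ["--" + flag, str(ckpts[flag])]
    cmd += [
        "--lre", "--cfg_scale", "4.5", "--color_align", "wavelet",
        "--image_path", str(image_path),
        "--save_dir", str(save_dir),
        "--mixed_precision", "fp16",
        "--upscale", str(upscale),
    ]
    return cmd


ckpts = {
    "dreamclear_ckpt": "/path/to/DreamClear-1024.pth",
    "swinir_ckpt": "/path/to/general_swinir_v1.ckpt",
    "vae_ckpt": "/path/to/sd-vae-ft-ema",
    "t5_ckpt": "/path/to/t5-v1_1-xxl",
    "llava_ckpt": "/path/to/llava-v1.6-vicuna-13b",
}
# Print the command for inspection; run it with subprocess.run(cmd, check=True).
print(shlex.join(build_inference_command("/path/to/input/images", "validation", ckpts)))
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.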
Testing instructions for segmentation and detection can be found in their respective folders.
The provided code and pre-trained weights are licensed under the Apache 2.0 license.
This code is based on PixArt-α, BasicSR and RMT. Some code is borrowed from SeeSR, StableSR, DiffBIR and LLaVA. We thank the authors for their awesome work.
If you have any questions, please feel free to reach out to me at [email protected].
If you find our work useful for your research, please consider citing our paper:
@inproceedings{ai2024dreamclear,
title={DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation},
author={Yuang Ai and Xiaoqiang Zhou and Huaibo Huang and Xiaotian Han and Zhengyu Chen and Quanzeng You and Hongxia Yang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=6eoGVqMiIj}
}