Merge branch 'main' of github.com:Winfredy/SadTalker
vinthony committed Apr 8, 2023
2 parents 479a5ad + 8a99c8b commit 35dd94f
Showing 6 changed files with 337 additions and 133 deletions.
54 changes: 39 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
@@ -36,6 +36,11 @@
</div>

## 🔥 Highlight

- 🔥 The extension of the [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) is online. Just install it via `extensions -> install from URL -> https://github.com/Winfredy/SadTalker`, and check out more details [here](#sd-webui-extension).

https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4

- 🔥 The beta version of the `full image mode` is online! Check out [here](https://github.com/Winfredy/SadTalker#beta-full-bodyimage-generation) for more details.

| still | still + enhancer | [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) |
@@ -49,6 +54,10 @@

## 📋 Changelog (The previous changelog can be found [here](docs/changlelog.md))

- __[2023.04.06]__: The stable-diffusion-webui extension is released.

- __[2023.04.03]__: Enable TTS in the Hugging Face and local gradio demos.

- __[2023.03.30]__: Launch beta version of the full body mode.

- __[2023.03.30]__: Launch new feature: by using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
@@ -82,16 +91,14 @@ the 3D-aware face render for final video generation.
- [ ] training code of each component.
- [ ] Audio-driven Anime Avatar.
- [ ] integrate ChatGPT for a conversation demo 🤔
- [ ] integrate with stable-diffusion-webui. (stay tuned!)
- [x] integrate with stable-diffusion-webui. (stay tuned!)

https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4


## ⚙️ Installation

#### Dependency Installation
## ⚙️ Installation

<details><summary>CLICK ME for Manual Installation</summary>
#### Installing SadTalker on Linux:

```bash
git clone https://github.com/Winfredy/SadTalker.git
@@ -108,25 +115,39 @@ conda install ffmpeg

pip install -r requirements.txt

### TTS is optional for the gradio demo.
### pip install TTS

```

</details>
More tips about installation on Windows and the Dockerfile can be found [here](docs/install.md).

#### SD-Webui Extension:
<details><summary>CLICK ME</summary>

<details><summary>CLICK ME for Docker Installation</summary>
Install the latest version of [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui), then install SadTalker via the `extensions` menu.
<img width="726" alt="image" src="https://user-images.githubusercontent.com/4397546/230698519-267d1d1f-6e99-4dd4-81e1-7b889259efbd.png">

A Dockerfile is also provided by [@thegenerativegeneration](https://github.com/thegenerativegeneration) on [Docker Hub](https://hub.docker.com/repository/docker/wawa9000/sadtalker), and can be used directly as follows:
Then, restart stable-diffusion-webui and set some command-line args. The models will be downloaded automatically to the right place. Alternatively, you can add the path of pre-downloaded SadTalker checkpoints to `SADTALKER_CHECKPOINTS` in `webui_user.sh` (Linux) or `webui_user.bat` (Windows) by:

```bash
docker run --gpus "all" --rm -v $(pwd):/host_dir wawa9000/sadtalker \
--driven_audio /host_dir/deyu.wav \
--source_image /host_dir/image.jpg \
--expression_scale 1.0 \
--still \
--result_dir /host_dir
# windows (webui_user.bat)
set COMMANDLINE_ARGS=--no-gradio-queue --disable-safe-unpickle
set SADTALKER_CHECKPOINTS=D:\SadTalker\checkpoints

# linux (webui_user.sh)
export COMMANDLINE_ARGS=--no-gradio-queue --disable-safe-unpickle
export SADTALKER_CHECKPOINTS=/path/to/SadTalker/checkpoints
```
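
For reference, the extension resolves this variable with a fallback to the repo-local `checkpoints/` directory (see `scripts/extension.py` in this commit). A minimal sketch of that lookup logic — the repo path below is illustrative only:

```python
import os

def resolve_checkpoint_dir(repo_dir: str) -> str:
    """Return SADTALKER_CHECKPOINTS if set, else the repo-local default."""
    env_path = os.getenv("SADTALKER_CHECKPOINTS")
    if env_path:
        return env_path
    return os.path.join(repo_dir, "checkpoints")

# With the variable unset, the repo-local default is used.
os.environ.pop("SADTALKER_CHECKPOINTS", None)
print(resolve_checkpoint_dir("/opt/SadTalker"))  # /opt/SadTalker/checkpoints on Linux
```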

After installation, SadTalker can be used in stable-diffusion-webui directly.

<img width="726" alt="image" src="https://user-images.githubusercontent.com/4397546/230698614-58015182-2916-4240-b324-e69022ef75b3.png">

</details>



#### Download Trained Models
<details><summary>CLICK ME</summary>

@@ -161,9 +182,12 @@ python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or pict
```
The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.

Or a local gradio demo can be run by:
Or a local gradio demo similar to our [Hugging Face demo](https://huggingface.co/spaces/vinthony/SadTalker) can be run by:

```bash

## you need to install TTS (https://github.com/coqui-ai/TTS) manually via `pip install TTS` in advance.

python app.py
```

25 changes: 25 additions & 0 deletions docs/install.md
@@ -0,0 +1,25 @@



### Windows Native

- Make sure you have `ffmpeg` in the `%PATH%` as suggested in [#54](https://github.com/Winfredy/SadTalker/issues/54), following [this guide](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) to install `ffmpeg`.
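
A quick, standard-library way to confirm that `ffmpeg` is actually reachable from the current `%PATH%` — a sketch, not part of the repo:

```python
import shutil

def ffmpeg_on_path() -> bool:
    """True if an `ffmpeg` executable can be found on the current PATH."""
    return shutil.which("ffmpeg") is not None

if not ffmpeg_on_path():
    print("ffmpeg not found -- add its bin/ directory to %PATH% first")
```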


### Windows WSL
- Make sure the environment variable is set: `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH`


### Docker installation

A Dockerfile is also provided by [@thegenerativegeneration](https://github.com/thegenerativegeneration) on [Docker Hub](https://hub.docker.com/repository/docker/wawa9000/sadtalker), and can be used directly as follows:

```bash
docker run --gpus "all" --rm -v $(pwd):/host_dir wawa9000/sadtalker \
--driven_audio /host_dir/deyu.wav \
--source_image /host_dir/image.jpg \
--expression_scale 1.0 \
--still \
--result_dir /host_dir
```

23 changes: 12 additions & 11 deletions scripts/download_models.sh
@@ -1,12 +1,13 @@
mkdir ./checkpoints
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2exp_00300-model.pth -O ./checkpoints/auido2exp_00300-model.pth
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2pose_00140-model.pth -O ./checkpoints/auido2pose_00140-model.pth
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/epoch_20.pth -O ./checkpoints/epoch_20.pth
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/facevid2vid_00189-model.pth.tar -O ./checkpoints/facevid2vid_00189-model.pth.tar
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/shape_predictor_68_face_landmarks.dat -O ./checkpoints/shape_predictor_68_face_landmarks.dat
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/wav2lip.pth -O ./checkpoints/wav2lip.pth
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/mapping_00229-model.pth.tar -O ./checkpoints/mapping_00229-model.pth.tar
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/BFM_Fitting.zip -O ./checkpoints/BFM_Fitting.zip
wget https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/hub.zip -O ./checkpoints/hub.zip
unzip ./checkpoints/hub.zip -d ./checkpoints/
unzip ./checkpoints/BFM_Fitting.zip -d ./checkpoints/
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2exp_00300-model.pth -O ./checkpoints/auido2exp_00300-model.pth
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/auido2pose_00140-model.pth -O ./checkpoints/auido2pose_00140-model.pth
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/epoch_20.pth -O ./checkpoints/epoch_20.pth
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/facevid2vid_00189-model.pth.tar -O ./checkpoints/facevid2vid_00189-model.pth.tar
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/shape_predictor_68_face_landmarks.dat -O ./checkpoints/shape_predictor_68_face_landmarks.dat
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/wav2lip.pth -O ./checkpoints/wav2lip.pth
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/mapping_00229-model.pth.tar -O ./checkpoints/mapping_00229-model.pth.tar
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/BFM_Fitting.zip -O ./checkpoints/BFM_Fitting.zip
wget -nc https://github.com/Winfredy/SadTalker/releases/download/v0.0.1/hub.zip -O ./checkpoints/hub.zip

unzip -n ./checkpoints/hub.zip -d ./checkpoints/
unzip -n ./checkpoints/BFM_Fitting.zip -d ./checkpoints/
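
The `-nc` (no-clobber) and `unzip -n` flags above make the script safe to re-run: files that already exist are never downloaded or overwritten again. The same skip-if-present idea, sketched in Python with an illustrative helper name:

```python
import os
import urllib.request

def download_once(url: str, dest: str) -> bool:
    """Download url to dest unless dest already exists; True if it downloaded."""
    if os.path.exists(dest):
        return False  # mirrors wget -nc: never clobber an existing file
    urllib.request.urlretrieve(url, dest)
    return True
```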
133 changes: 133 additions & 0 deletions scripts/extension.py
@@ -0,0 +1,133 @@
import os, sys
from pathlib import Path
import tempfile
import gradio as gr
from modules.call_queue import wrap_gradio_gpu_call, wrap_queued_call
from modules.shared import opts, OptionInfo
from modules import shared, paths, script_callbacks
import launch
import glob

def get_source_image(image):
    return image

def get_img_from_txt2img(x):
    talker_path = Path(paths.script_path) / "outputs"
    imgs_from_txt_dir = str(talker_path / "txt2img-images/")
    imgs = glob.glob(imgs_from_txt_dir + '/*/*.png')
    imgs.sort(key=lambda x: os.path.getmtime(os.path.join(imgs_from_txt_dir, x)))
    img_from_txt_path = os.path.join(imgs_from_txt_dir, imgs[-1])
    return img_from_txt_path, img_from_txt_path

def get_img_from_img2img(x):
    talker_path = Path(paths.script_path) / "outputs"
    imgs_from_img_dir = str(talker_path / "img2img-images/")
    imgs = glob.glob(imgs_from_img_dir + '/*/*.png')
    imgs.sort(key=lambda x: os.path.getmtime(os.path.join(imgs_from_img_dir, x)))
    img_from_img_path = os.path.join(imgs_from_img_dir, imgs[-1])
    return img_from_img_path, img_from_img_path

def install():

    kv = {
        "face-alignment": "face-alignment==1.3.5",
        "imageio": "imageio==2.19.3",
        "imageio-ffmpeg": "imageio-ffmpeg==0.4.7",
        "librosa": "librosa==0.8.0",
        "pydub": "pydub==0.25.1",
        "scipy": "scipy==1.8.1",
        "tqdm": "tqdm",
        "yacs": "yacs==0.1.8",
        "pyyaml": "pyyaml",
        "dlib": "dlib-bin",
        "gfpgan": "gfpgan",
    }

    for k, v in kv.items():
        print(k, launch.is_installed(k))
        if not launch.is_installed(k):
            launch.run_pip("install " + v, "requirements for SadTalker")

    if os.getenv('SADTALKER_CHECKPOINTS'):
        print('load SadTalker checkpoints from ' + os.getenv('SADTALKER_CHECKPOINTS'))
    else:
        ### run the script to download models to the correct location.
        print('download models for SadTalker')
        launch.run("cd " + paths.script_path + "/extensions/SadTalker && bash ./scripts/download_models.sh", live=True)
        print('SadTalker is successfully installed!')


def on_ui_tabs():
    install()

    sys.path.extend([paths.script_path + '/extensions/SadTalker'])

    repo_dir = paths.script_path + '/extensions/SadTalker/'

    result_dir = opts.sadtalker_result_dir
    os.makedirs(result_dir, exist_ok=True)

    from src.gradio_demo import SadTalker

    if os.getenv('SADTALKER_CHECKPOINTS'):
        checkpoint_path = os.getenv('SADTALKER_CHECKPOINTS')
    else:
        checkpoint_path = repo_dir + 'checkpoints/'

    sad_talker = SadTalker(checkpoint_path=checkpoint_path, config_path=repo_dir + 'src/config', lazy_load=True)

    with gr.Blocks(analytics_enabled=False) as audio_to_video:
        with gr.Row().style(equal_height=False):
            with gr.Column(variant='panel'):
                with gr.Tabs(elem_id="sadtalker_source_image"):
                    with gr.TabItem('Upload image'):
                        with gr.Row():
                            input_image = gr.Image(label="Source image", source="upload", type="filepath").style(height=512, width=512)

                        with gr.Row():
                            submit_image2 = gr.Button('load from txt2img', variant='primary')
                            submit_image2.click(fn=get_img_from_txt2img, inputs=input_image, outputs=[input_image, input_image])

                            submit_image3 = gr.Button('load from img2img', variant='primary')
                            submit_image3.click(fn=get_img_from_img2img, inputs=input_image, outputs=[input_image, input_image])

                with gr.Tabs(elem_id="sadtalker_driven_audio"):
                    with gr.TabItem('Upload'):
                        with gr.Column(variant='panel'):

                            with gr.Row():
                                driven_audio = gr.Audio(label="Input audio", source="upload", type="filepath")

            with gr.Column(variant='panel'):
                with gr.Tabs(elem_id="sadtalker_checkbox"):
                    with gr.TabItem('Settings'):
                        with gr.Column(variant='panel'):
                            is_still_mode = gr.Checkbox(label="Still Mode (less head motion)").style(container=True)
                            is_enhance_mode = gr.Checkbox(label="Enhance Mode (better face quality)").style(container=True)
                            submit = gr.Button('Generate', elem_id="sadtalker_generate", variant='primary')

                with gr.Tabs(elem_id="sadtalker_genearted"):
                    gen_video = gr.Video(label="Generated video", format="mp4").style(width=256)

        ### the gradio gpu call will always return the html
        submit.click(
            fn=wrap_queued_call(sad_talker.test),
            inputs=[input_image,
                    driven_audio,
                    is_still_mode,
                    is_enhance_mode],
            outputs=[gen_video, ]
        )

    return [(audio_to_video, "SadTalker", "extension")]

def on_ui_settings():
    talker_path = Path(paths.script_path) / "outputs"
    section = ('extension', "SadTalker")
    opts.add_option("sadtalker_result_dir", OptionInfo(str(talker_path / "SadTalker/"), "Path to save results of SadTalker", section=section))

script_callbacks.on_ui_settings(on_ui_settings)
script_callbacks.on_ui_tabs(on_ui_tabs)
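
Both `get_img_from_txt2img` and `get_img_from_img2img` above reduce to the same pattern: glob the output directory and return the newest file by modification time. (Note that `glob.glob` already returns directory-prefixed paths, so re-joining the directory in the sort key is redundant.) A self-contained sketch of that pattern using a throwaway directory:

```python
import glob
import os
import tempfile

def newest_file(pattern: str) -> str:
    """Return the most recently modified path matching a glob pattern."""
    paths = glob.glob(pattern)
    paths.sort(key=os.path.getmtime)
    return paths[-1]

# Demo with a throwaway directory standing in for outputs/txt2img-images/.
with tempfile.TemporaryDirectory() as d:
    for i, name in enumerate(["a.png", "b.png"]):
        p = os.path.join(d, name)
        with open(p, "w") as f:
            f.write("x")
        os.utime(p, (i, i))  # force distinct, increasing mtimes
    print(newest_file(os.path.join(d, "*.png")))  # ends with b.png
```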