waveglow번역 (waveglow translation) #104
base: master
@@ -3,7 +3,7 @@ layout: hub_detail
background-class: hub-background
body-class: hub
title: WaveGlow
summary: WaveGlow model for generating speech from mel spectrograms (generated by Tacotron2)
summary: Tacotron2이 만들어낸 mel spectrograms을 소비하여 음성을 생성하기 위한 WaveGlow 모델입니다.
category: researchers
image: nvidia_logo.png
author: NVIDIA

@@ -18,58 +18,59 @@ demo-model-link: https://huggingface.co/spaces/pytorch/WaveGlow
---
### Model Description
### 모델 설명

The Tacotron 2 and WaveGlow models form a text-to-speech system that enables users to synthesize natural-sounding speech from raw transcripts without any additional prosody information. The Tacotron 2 model (also available via torch.hub) produces mel spectrograms from input text using an encoder-decoder architecture. WaveGlow is a flow-based model that consumes the mel spectrograms to generate speech.
### Example
Tacotron2 및 WaveGlow 모델은 사용자가 추가 운율 정보 없이 원본 텍스트에서 자연스러운 음성을 합성할 수 있는 텍스트 음성 변환 시스템을 형성합니다. Tacotron2 모델(torch.hub를 통해서도 사용 가능)은 인코더-디코더 아키텍처를 사용하여 입력 텍스트로부터 mel spectrograms를 생성합니다. WaveGlow는 음성을 생성하기 위해 mel spectrograms를 소비하는 흐름 기반 모델입니다.

> Review comment: the very last sentence.
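As a brief aside on the mel spectrograms discussed above (an illustrative note, not part of the original page): the mel scale maps frequency in hertz to a perceptual pitch scale, commonly via m = 2595 · log10(1 + f/700). A minimal sketch:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

a4_mel = hz_to_mel(440.0)  # pitch of A4 on the mel scale
```

A mel spectrogram is a short-time spectrum whose frequency axis has been warped onto this scale, which is why it is a convenient intermediate between text and waveform.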
In the example below:
- pretrained Tacotron2 and WaveGlow models are loaded from torch.hub
- Tacotron2 generates a mel spectrogram given a tensor representation of an input text ("Hello world, I missed you so much")
- WaveGlow generates sound given the mel spectrogram
- the output sound is saved in an 'audio.wav' file

### 예제

To run the example you need some extra Python packages installed.
These are needed for preprocessing the text and audio, as well as for display and input/output.
아래의 예시에서:
- 사전 학습된 Tacotron2 및 WaveGlow 모델들은 torch.hub에서 로드됩니다.
- Tacotron2는 입력 텍스트("Hello world, I missed you so much")의 텐서 표현이 주어지면 mel spectrogram을 생성합니다.

> Review comment: How about translating it as …? To aid understanding, I changed the structure of the original so the sentence mentions the figure provided alongside on the hub, and merged it in that state!

- WaveGlow는 mel spectrogram이 주어지면 소리를 생성합니다.
- 출력된 소리는 'audio.wav' 파일에 저장됩니다.

예제를 실행하려면 추가 파이썬 패키지가 설치되어 있어야 합니다.
텍스트와 오디오의 전처리뿐만 아니라 디스플레이 및 입력/출력에도 필요합니다.
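The bullet points above describe a simple chain: text in, mel spectrogram in the middle, waveform out. A toy sketch of that data flow, with stand-in functions instead of the real models (purely illustrative, nothing here is the NVIDIA implementation):

```python
# Stand-ins for the real models, only to illustrate the chained data flow:
# text --Tacotron2--> mel spectrogram --WaveGlow--> waveform.

def fake_tacotron2(text: str) -> list:
    # Pretend every character produces one 3-band mel "frame".
    return [[float(ord(c) % 5), 1.0, 0.5] for c in text]

def fake_waveglow(mel: list) -> list:
    # Pretend every mel frame expands to 4 audio samples.
    return [frame[0] * 0.1 for frame in mel for _ in range(4)]

mel = fake_tacotron2("Hello world, I missed you so much")
audio = fake_waveglow(mel)  # 4 samples per input character in this toy
```

The real models do the same hand-off, just with tensors on the GPU rather than Python lists.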
```bash
pip install numpy scipy librosa unidecode inflect
apt-get update
apt-get install -y libsndfile1
```
Load the WaveGlow model pre-trained on the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)

[LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)으로 사전 학습된 WaveGlow 모델을 로드합니다.
```python
import torch
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')
```
Prepare the WaveGlow model for inference

추론을 위해 WaveGlow 모델을 준비합니다.
```python
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to('cuda')
waveglow.eval()
```
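As an aside on `remove_weightnorm` above: weight normalization reparameterizes a weight vector as w = g · v / ‖v‖, and "removing" it simply folds g and v back into one plain weight so inference skips the norm computation. A toy illustration (not WaveGlow's actual implementation):

```python
import math

def fold_weight_norm(g: float, v: list) -> list:
    # Fold weight-norm parameters (g, v) into the plain weight w = g * v / ||v||,
    # so a forward pass no longer recomputes the norm of v each time.
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

w = fold_weight_norm(2.0, [3.0, 4.0])  # ||v|| = 5, so w = [1.2, 1.6]
```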
Load a pretrained Tacotron2 model

사전 학습된 Tacotron2 모델을 로드합니다.

> Review comment: "사전 학습을 받은"

```python
tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2', model_math='fp32')
tacotron2 = tacotron2.to('cuda')
tacotron2.eval()
```
Now, let's make the model say:

이제 모델이 이렇게 말하게 해봅시다:
```python
text = "hello world, I missed you so much"
```
Format the input using utility methods

유틸리티 메서드를 사용하여 입력 형식을 지정합니다.

> Review comment: "유용한 체계성을"

```python
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
sequences, lengths = utils.prepare_input_sequence([text])
```
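Conceptually, the utility maps characters to integer IDs and pads the batch to a common length, longest first. A rough pure-Python sketch of that idea follows; the symbol set and details here are hypothetical, not the actual `nvidia_tts_utils` implementation:

```python
def prepare_input_sequence_sketch(texts):
    # Hypothetical symbol set; the real utility uses its own symbol table.
    symbols = "abcdefghijklmnopqrstuvwxyz ,.!?'"
    sym_to_id = {s: i + 1 for i, s in enumerate(symbols)}  # 0 is reserved for padding

    seqs = [[sym_to_id[c] for c in t.lower() if c in sym_to_id] for t in texts]
    order = sorted(range(len(seqs)), key=lambda i: -len(seqs[i]))  # longest first
    lengths = [len(seqs[i]) for i in order]
    padded = [seqs[i] + [0] * (lengths[0] - len(seqs[i])) for i in order]
    return padded, lengths

padded, lengths = prepare_input_sequence_sketch(["hello world, I missed you so much", "hi"])
```

Sorting by descending length matches what sequence models typically expect for packed batches.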
Run the chained models

체인으로 연결된 모델들을 실행합니다.
```python
with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)
```

@@ -78,22 +79,22 @@ audio_numpy = audio[0].data.cpu().numpy()
```python
rate = 22050
```
You can write it to a file and listen to it

그것을 파일로 저장하고 들을 수 있습니다.
```python
from scipy.io.wavfile import write
write("audio.wav", rate, audio_numpy)
```
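If scipy is unavailable, the standard library's `wave` module can write an equivalent 16-bit PCM file from float samples in [-1, 1]. A self-contained sketch, with a synthetic 440 Hz tone standing in for the model's output:

```python
import math
import struct
import wave

rate = 22050
# A synthetic one-second 440 Hz tone stands in for the model's audio output.
samples = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

with wave.open("audio.wav", "wb") as f:
    f.setnchannels(1)   # mono
    f.setsampwidth(2)   # 16-bit PCM
    f.setframerate(rate)
    f.writeframes(b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples))
```

The clamp before scaling guards against clipping artifacts when a sample strays slightly outside [-1, 1].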
Alternatively, play it right away in a notebook with IPython widgets

또는 IPython 위젯을 사용하여 노트북에서 바로 재생할 수 있습니다.
```python
from IPython.display import Audio
Audio(audio_numpy, rate=rate)
```
### Details
For detailed information on model input and output, training recipes, inference and performance visit: [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)

### 세부사항
모델의 입력과 출력, 학습 방법, 추론 및 성능에 대한 자세한 내용은 [github](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2) 또는 [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:tacotron_2_and_waveglow_for_pytorch)를 방문하십시오.

> Review comment: The sentence feels a bit awkward because it has no object, …
### References
### 참고 문헌

- [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
- [WaveGlow: A Flow-based Generative Network for Speech Synthesis](https://arxiv.org/abs/1811.00002)
> Review comment: The "Tacotron2이" part feels a bit awkward. If you read the 2 as "이", it becomes "Tacotron2가", and if you read it as "투" (two), it again becomes "Tacotron2가". I changed "from" and "for" to "consume (소비)" and "generate (생성)" to emphasize the input and output, but how about rewording it to read a little more smoothly? For example: "Tacotron2가 만든 멜 스펙트로그램(mel spectrograms)으로 음성을 만드는 WaveGlow 모델입니다."