diff --git a/.gitignore b/.gitignore
index 9401d2e..4c7ed97 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,6 +15,4 @@
 *.toc
 *.wav
 *.sh
-encoder/saved_models/*
-synthesizer/saved_models/*
-vocoder/saved_models/*
+synthesizer/saved_models/*
\ No newline at end of file
diff --git a/README-CN.md b/README-CN.md
index fca50d5..d679aa3 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -25,16 +25,7 @@
 * Install [ffmpeg](https://ffmpeg.org/download.html#get-packages).
 * Run `pip install -r requirements.txt` to install the remaining required packages.
 
-### 2. Use the pretrained encoder/vocoder
-Download the [pretrained models](https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models), extract them, and copy the `saved_models` folders under `encoder` and `vocoder` into the corresponding directories of this repository.
-
-Make sure you end up with the following files:
-```
-encoder\saved_models\pretrained.pt
-vocoder\saved_models\pretrained\pretrained.pt
-```
-
-### 3. Train the synthesizer with your dataset
+### 2. Train the synthesizer with your dataset
 * Download a dataset and unzip it: make sure you can access all the audio files (e.g. .wav) in the *train* folder
 * Preprocess the audio and the mel spectrograms: `python synthesizer_preprocess_audio.py <datasets_root>`
 
@@ -50,14 +41,15 @@ vocoder\saved_models\pretrained\pretrained.pt
 * When the attention line appears and the loss meets your needs in the training folder *synthesizer/saved_models/*, go to the next step.
 > For reference, my attention appeared after 18k steps and the loss dropped below 0.4 after 50k steps.
-
-### 4. Launch the Toolbox
+### 3. Launch the Toolbox
 You can then try the toolbox:
 `python demo_toolbox.py -d <datasets_root>`
 
 ## TODO
-- [ ] Allow direct use of Chinese
+- [X] Allow direct use of Chinese
 - [X] Add demo video
 - [X] Add support for more datasets
-- [ ] Upload pretrained models
+- [X] Upload pretrained models
+- [ ] Support Parallel Tacotron
+- [ ] Service-oriented deployment and dockerization
 - [ ] 🙏 Welcome to add more
 
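The training workflow both READMEs now describe condenses to a few commands. Below is a minimal sketch; `<datasets_root>` is a placeholder for your dataset directory, and the `synthesizer_preprocess_embeds.py` / `synthesizer_train.py` steps (with `mandarin` as the run name) are assumptions based on the upstream Real-Time-Voice-Cloning entry points, since they do not appear in this diff.

```
# Minimal sketch of the documented training workflow.
# <datasets_root> is a placeholder for your dataset directory.
# The embeds/train steps ("mandarin" is a run name) are assumed from the
# upstream entry points and are not shown in this diff.
pip install -r requirements.txt
python synthesizer_preprocess_audio.py <datasets_root>
python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer
python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer
# Once the attention line shows up and the loss is acceptable
# (below 0.4 after ~50k steps above), launch the toolbox:
python demo_toolbox.py -d <datasets_root>
```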
diff --git a/README.md b/README.md
index 73b7878..3d15b54 100644
--- a/README.md
+++ b/README.md
@@ -26,12 +26,8 @@
 * Install [PyTorch](https://pytorch.org/get-started/locally/).
 * Install [ffmpeg](https://ffmpeg.org/download.html#get-packages).
 * Run `pip install -r requirements.txt` to install the remaining necessary packages.
-
-### 2. Reuse the pretrained encoder/vocoder
-* Download the following models and extract the encoder and vocoder models to the according directory of this project. Don't use the synthesizer
-https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models
-> Note that we need to specify the newly trained synthesizer model, since the original model is incompatible with the Chinese sympols. It means the demo_cli is not working at this moment.
-### 3. Train synthesizer with your dataset
+> Note that we reuse the pretrained encoder/vocoder but not the pretrained synthesizer, since the original synthesizer is incompatible with Chinese symbols. This means demo_cli does not work at the moment.
+### 2. Train synthesizer with your dataset
 * Download aidatatang_200zh or SLR68 dataset and unzip: make sure you can access all .wav in *train* folder
 * Preprocess with the audios and the mel spectrograms: `python synthesizer_preprocess_audio.py <datasets_root>`
 
@@ -48,7 +44,7 @@ Allow parameter `--dataset {dataset}` to support adatatang_200zh, SLR68
 
 ![step-135500-mel-spectrogram_sample_1](https://user-images.githubusercontent.com/7423248/128587255-4945faa0-5517-46ea-b173-928eff999330.png)
 
 > A link to my early trained model: [Baidu Yun](https://pan.baidu.com/s/10t3XycWiNIg5dN5E_bMORQ) Code:aid4
-### 4. Launch the Toolbox
+### 3. Launch the Toolbox
 You can then try the toolbox:
 `python demo_toolbox.py -d <datasets_root>`
@@ -59,4 +55,6 @@ or
 - [x] Add demo video
 - [X] Add support for more dataset
 - [X] Upload pretrained model
+- [ ] Support Parallel Tacotron
+- [ ] Service-oriented deployment and dockerization
 - 🙏 Welcome to add more
diff --git a/encoder/saved_models/pretrained.pt b/encoder/saved_models/pretrained.pt
new file mode 100644
index 0000000..47e405a
Binary files /dev/null and b/encoder/saved_models/pretrained.pt differ
diff --git a/vocoder/saved_models/pretrained/pretrained.pt b/vocoder/saved_models/pretrained/pretrained.pt
new file mode 100644
index 0000000..ee2d5cf
Binary files /dev/null and b/vocoder/saved_models/pretrained/pretrained.pt differ
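With the pretrained encoder/vocoder checkpoints now committed (and un-ignored in the .gitignore change above), a quick pre-flight check can confirm they are where the toolbox expects them before running `demo_toolbox.py`. This is a hypothetical helper, not part of the repository; the paths are taken from this diff.

```
# Hypothetical sanity check (not part of the repo): verify that the
# committed pretrained checkpoints exist at the paths the toolbox expects.
for f in encoder/saved_models/pretrained.pt \
         vocoder/saved_models/pretrained/pretrained.pt; do
  [ -f "$f" ] || { echo "missing: $f" >&2; exit 1; }
done
echo "pretrained encoder/vocoder checkpoints found"
```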