Update readme for training encoder (#250)

Wings Music 2021-12-07 19:10:29 +08:00 committed by GitHub
parent 4728863f9d
commit 875fe15069
2 changed files with 25 additions and 8 deletions

@@ -32,13 +32,21 @@
### 2. Prepare pretrained models
You can train your own model or download one trained by others in the community:
> I recently created a [Zhihu column](https://www.zhihu.com/column/c_1425605280340504576) where I will occasionally post training tips and notes; questions are welcome.
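-#### 2.1 Train the synthesizer with your own dataset (alternative to 2.2)
+#### 2.1 Train the encoder with your own dataset (optional)
+* Preprocess the audio and mel spectrograms:
+`python encoder_preprocess.py <datasets_root>`
+Use `-d {dataset}` to select the datasets; librispeech_other, voxceleb1, and aidatatang_200zh are supported. Separate multiple datasets with commas.
+* Train the encoder: `python encoder_train.py my_run <datasets_root>/SV2TTS/encoder`
+> Encoder training uses visdom. You can disable it with `--no_visdom`, but the visualization is worth having. Run "visdom" in a separate command line/process to start the visdom server.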
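The two encoder steps above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `encoder_commands` and its parameters are hypothetical, and it only builds argument lists without running anything. The `--no_visdom` spelling follows the English README changed in this same commit.

```python
import shlex


def encoder_commands(datasets_root, run_id="my_run", datasets=(), visdom=True):
    """Build argument lists for encoder preprocessing and training.

    Hypothetical helper for illustration; it does not execute the scripts.
    """
    root = datasets_root.rstrip("/")
    preprocess = ["python", "encoder_preprocess.py", root]
    if datasets:
        # Several datasets are passed as a single comma-separated value.
        preprocess += ["-d", ",".join(datasets)]
    train = ["python", "encoder_train.py", run_id, root + "/SV2TTS/encoder"]
    if not visdom:
        # Disable the visdom dashboard, as described in the note above.
        train.append("--no_visdom")
    return preprocess, train


if __name__ == "__main__":
    pre, train = encoder_commands(
        "/data", datasets=["librispeech_other", "voxceleb1"], visdom=False
    )
    # Print shell-ready command lines; pass the lists to subprocess.run to execute.
    print(shlex.join(pre))
    print(shlex.join(train))
```

Keeping the commands as lists avoids shell quoting problems when `datasets_root` contains spaces.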
+#### 2.2 Train the synthesizer with your own dataset (alternative to 2.3)
* Download the dataset and unzip it: make sure you can access all audio files (e.g. .wav) in the *train* folder
* Preprocess the audio and mel spectrograms:
`python pre.py <datasets_root> -d {dataset} -n {number}`
Supported arguments:
-* -d`{dataset}` selects the dataset; aidatatang_200zh, magicdata, aishell3, and data_aishell are supported; defaults to aidatatang_200zh if omitted
-* -n `{number}` sets the number of parallel processes; 10 ran without problems on a CPU 11770k with 32GB
+* `-d {dataset}` selects the dataset; aidatatang_200zh, magicdata, aishell3, and data_aishell are supported; defaults to aidatatang_200zh if omitted
+* `-n {number}` sets the number of parallel processes; 10 ran without problems on a CPU 11770k with 32GB
> If you downloaded `aidatatang_200zh` to drive D and the `train` folder is at `D:\data\aidatatang_200zh\corpus\train`, your `datasets_root` is `D:\data\`
* Train the synthesizer:
@@ -46,7 +54,7 @@
* When the attention line appears and the loss meets your needs in the training folder *synthesizer/saved_models/*, move on to the `Launch the program` step.
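The `datasets_root` rule from the preprocessing note (the folder that contains the dataset folder) can be checked with a few lines of Python. This is a small sketch under the assumption that the dataset folder name appears as one component of the `train` path; the function name `datasets_root_from_train` is made up for illustration.

```python
from pathlib import PureWindowsPath


def datasets_root_from_train(train_dir, dataset="aidatatang_200zh"):
    """Return the parent of the dataset folder found inside train_dir."""
    path = PureWindowsPath(train_dir)
    # Walk upward from the train folder until the dataset folder is found.
    for candidate in [path, *path.parents]:
        if candidate.name == dataset:
            return candidate.parent
    raise ValueError(f"{dataset!r} is not a component of {train_dir!r}")


if __name__ == "__main__":
    root = datasets_root_from_train(r"D:\data\aidatatang_200zh\corpus\train")
    print(root)  # D:\data
```

`PureWindowsPath` is used so the example behaves the same on any operating system; on Linux or macOS paths, `PurePosixPath` would be the equivalent choice.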
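-#### 2.2 Use a synthesizer pretrained by the community (alternative to 2.1)
+#### 2.3 Use a synthesizer pretrained by the community (alternative to 2.2)
> If you have no suitable hardware or don't want to spend time tuning, you can use models contributed by the community (contributions are always welcome):
| Author | Download link | Preview | Info |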
@@ -56,7 +64,7 @@
|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan link](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) extraction code: 1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps, Taiwanese accent; requires switching to tag v0.0.1
|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ extraction code: 2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps. Note: apply the fix from this [issue](https://github.com/babysor/MockingBird/issues/37) and switch to tag v0.0.1
-#### 2.3 Train the vocoder (optional)
+#### 2.4 Train the vocoder (optional)
The vocoder has little effect on the result, and three are already bundled. If you want to train your own, refer to the commands below.
* Preprocess the data:
`python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>`

@@ -32,7 +32,16 @@
> Note that we use the pretrained encoder/vocoder but not the pretrained synthesizer, since the original model is incompatible with Chinese symbols. This means demo_cli does not work at the moment.
### 2. Prepare your models
You can either train your models or use existing ones:
-#### 2.1. Train synthesizer with your dataset
+#### 2.1 Train encoder with your dataset (Optional)
+* Preprocess the audio and mel spectrograms:
+`python encoder_preprocess.py <datasets_root>` Pass `--dataset {dataset}` to choose the datasets to preprocess. Only the train set of each dataset is used. Possible names: librispeech_other, voxceleb1, voxceleb2. Use commas to separate multiple datasets.
+* Train the encoder: `python encoder_train.py my_run <datasets_root>/SV2TTS/encoder`
+> For training, the encoder uses visdom. You can disable it with `--no_visdom`, but it's nice to have. Run "visdom" in a separate CLI/process to start your visdom server.
+#### 2.2 Train synthesizer with your dataset
* Download the dataset and unzip it: make sure you can access all .wav files in the folder
* Preprocess the audio and mel spectrograms:
`python pre.py <datasets_root>`
@@ -43,7 +52,7 @@ Allowing parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata,
* Go to the next step when the attention line shows and the loss meets your needs in the training folder *synthesizer/saved_models/*.
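To catch a misspelled dataset name before a long preprocessing run, the `pre.py` invocation can be validated first. This is an illustrative sketch only: `pre_command` is a hypothetical helper, the dataset list and the `-d`/`-n` flags are taken from the Chinese README changed in this commit, and nothing here is executed by the repository itself.

```python
# Dataset names listed as supported by pre.py in the Chinese README.
SUPPORTED_DATASETS = ("aidatatang_200zh", "magicdata", "aishell3", "data_aishell")


def pre_command(datasets_root, dataset="aidatatang_200zh", workers=None):
    """Build the argument list for synthesizer preprocessing (not executed)."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset: {dataset!r}")
    cmd = ["python", "pre.py", datasets_root, "-d", dataset]
    if workers is not None:
        if workers < 1:
            raise ValueError("workers must be at least 1")
        # -n sets the number of parallel processes.
        cmd += ["-n", str(workers)]
    return cmd


if __name__ == "__main__":
    print(pre_command("D:\\data\\", dataset="aishell3", workers=10))
```

The resulting list can be handed to `subprocess.run` from the repository root once the dataset has been unpacked.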
-#### 2.2 Use pretrained model of synthesizer
+#### 2.3 Use pretrained model of synthesizer
> Thanks to the community, some models will be shared:
| author | Download link | Preview Video | Info |
@@ -53,7 +62,7 @@ Allowing parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata,
|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing https://u.teknik.io/AYxWf.pt | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan, only works under version 0.0.1
|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code: 2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | only works under version 0.0.1
-#### 2.3 Train vocoder (Optional)
+#### 2.4 Train vocoder (Optional)
> Note: the vocoder makes little difference to the result, so you may not need to train a new one.
* Preprocess the data:
`python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>`