From 875fe1506931bba6c15c4e7068d7a415e87cf3af Mon Sep 17 00:00:00 2001
From: Wings Music
Date: Tue, 7 Dec 2021 19:10:29 +0800
Subject: [PATCH] Update readme for training encoder (#250)

---
 README-CN.md | 18 +++++++++++++-----
 README.md    | 15 ++++++++++++---
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/README-CN.md b/README-CN.md
index 523a2a0..096ee5a 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -32,13 +32,21 @@
 ### 2. Prepare the pretrained models
 Consider training your own dedicated models, or download models trained by others in the community:
 > A [Zhihu column](https://www.zhihu.com/column/c_1425605280340504576) was recently created; it will be updated with training tips and notes from time to time, and questions are welcome there.
-#### 2.1 Train your own synthesizer model with a dataset (alternative to 2.2)
+#### 2.1 Train your own encoder model with a dataset (optional)
+
+* Preprocess the audio and the mel spectrograms:
+`python encoder_preprocess.py <datasets_root>`
+Use `-d {dataset}` to specify the datasets; librispeech_other, voxceleb1 and aidatatang_200zh are supported. Separate multiple datasets with commas.
+* Train the encoder: `python encoder_train.py my_run <datasets_root>/SV2TTS/encoder`
+> Encoder training uses visdom. You can add `--no_visdom` to disable it, but the visualization is worth having. Run "visdom" in a separate CLI/process to start the visdom server.
+
+#### 2.2 Train your own synthesizer model with a dataset (alternative to 2.3)
 * Download a dataset and unzip it: make sure you can access all the audio files (e.g. .wav) in the *train* folder
 * Preprocess the audio and the mel spectrograms:
 `python pre.py <datasets_root> -d {dataset} -n {number}`
 Supported arguments:
-* -d`{dataset}` specifies the dataset; aidatatang_200zh, magicdata, aishell3 and data_aishell are supported; defaults to aidatatang_200zh if not passed
-* -n `{number}` sets the number of parallel workers; 10 proved fine on an 11700K CPU with 32 GB RAM
+* `-d {dataset}` specifies the dataset; aidatatang_200zh, magicdata, aishell3 and data_aishell are supported; defaults to aidatatang_200zh if not passed
+* `-n {number}` sets the number of parallel workers; 10 proved fine on an 11700K CPU with 32 GB RAM
 > If the `aidatatang_200zh` files you downloaded are on drive D and the `train` folder path is `D:\data\aidatatang_200zh\corpus\train`, then your `datasets_root` is `D:\data\`
 * Train the synthesizer:
 `python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer`
@@ -46,7 +54,7 @@
 
 * When the attention line shows up and the loss meets your needs in the training folder *synthesizer/saved_models/*, go on to the `launch the program` step.
 
-#### 2.2 Use a synthesizer pretrained by the community (alternative to 2.1)
+#### 2.3 Use a synthesizer pretrained by the community (alternative to 2.2)
 > If you really have no hardware, or don't want to tune slowly, you can use a model contributed by the community (continued sharing is welcome):
 
 | Author | Download link | Preview | Info |
@@ -56,7 +64,7 @@
 |@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan link](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) code: 1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps, Taiwanese accent; switch to tag v0.0.1 to use
 |@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code: 2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps; note: apply the fix from this [issue](https://github.com/babysor/MockingBird/issues/37) and switch to tag v0.0.1
 
-#### 2.3 Train the vocoder (optional)
+#### 2.4 Train the vocoder (optional)
 It has little effect on the result; three vocoders are already bundled. If you still want to train your own, the following commands can serve as a reference.
 * Preprocess the data: `python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>`

diff --git a/README.md b/README.md
index 61a3126..a25e4a9 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,16 @@
 > Note that we are using the pretrained encoder/vocoder but not the synthesizer, since the original model is incompatible with Chinese symbols. This means the demo_cli is not working at the moment.
 ### 2. Prepare your models
 You can either train your models or use existing ones:
-#### 2.1. Train synthesizer with your dataset
+
+#### 2.1 Train encoder with your dataset (Optional)
+
+* Preprocess with the audios and the mel spectrograms:
+`python encoder_preprocess.py <datasets_root>` Pass `--dataset {dataset}` to select the datasets you want to preprocess; only the train set of these datasets will be used. Possible names: librispeech_other, voxceleb1, voxceleb2. Use commas to separate multiple datasets.
+
+* Train the encoder: `python encoder_train.py my_run <datasets_root>/SV2TTS/encoder`
+> For training, the encoder uses visdom. You can disable it with `--no_visdom`, but it's nice to have. Run "visdom" in a separate CLI/process to start your visdom server.
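+
+A minimal sketch of the two encoder steps above, assuming `<datasets_root>` holds your extracted datasets and that the flag is spelled `--dataset` as in the text (check `python encoder_preprocess.py -h` for the exact name); `my_run` is just an arbitrary run name:
+```bash
+# Preprocess the train sets of the chosen datasets into <datasets_root>/SV2TTS/encoder
+python encoder_preprocess.py <datasets_root> --dataset librispeech_other,voxceleb1
+# Train the encoder; add --no_visdom to skip the visualization
+python encoder_train.py my_run <datasets_root>/SV2TTS/encoder
+```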
+
+#### 2.2 Train synthesizer with your dataset
 * Download dataset and unzip: make sure you can access all .wav in folder
 * Preprocess with the audios and the mel spectrograms:
 `python pre.py <datasets_root>`
@@ -43,7 +52,7 @@ Allowing parameter `--dataset {dataset}` to support aidatatang_200zh, magicdata,
 
 * Go to the next step when the attention line shows up and the loss meets your need in the training folder *synthesizer/saved_models/*.
 
-#### 2.2 Use pretrained model of synthesizer
+#### 2.3 Use pretrained model of synthesizer
 > Thanks to the community, some models will be shared:
 
 | author | Download link | Preview Video | Info |
@@ -53,7 +62,7 @@
 |@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing https://u.teknik.io/AYxWf.pt | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan, only works under version 0.0.1
 |@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code: 2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | only works under version 0.0.1
 
-#### 2.3 Train vocoder (Optional)
+#### 2.4 Train vocoder (Optional)
 > note: the vocoder makes little difference to the result, so you may not need to train a new one.
 * Preprocess the data: `python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>`
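+
+Similarly, a minimal sketch of the vocoder steps. The second command is an assumption based on the repository's script naming (`vocoder_train.py`) rather than something stated above, so verify the exact arguments against the repo:
+```bash
+# Regenerate mel spectrograms with the trained synthesizer
+# (assumption: -m points at that synthesizer model)
+python vocoder_preprocess.py <datasets_root> -m <synthesizer_model_path>
+# Train the vocoder on the preprocessed data ("my_run" is an arbitrary run name)
+python vocoder_train.py my_run <datasets_root>
+```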