From 45bc43bf3c185aad2d7e56274fb11e567a0bb643 Mon Sep 17 00:00:00 2001
From: babysor00
Date: Wed, 8 Sep 2021 23:36:35 +0800
Subject: [PATCH] Add description of new vocoder and pretrained models

---
 README-CN.md | 24 ++++++++++--------------
 README.md    | 17 ++++++-----------
 2 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/README-CN.md b/README-CN.md
index 0921819..85ae1fe 100644
--- a/README-CN.md
+++ b/README-CN.md
@@ -8,13 +8,13 @@
 ### [DEMO VIDEO](https://www.bilibili.com/video/BV1sA411P7wM/)
 
 ## 特性
-🌍 **中文** 支持普通话并使用多种中文数据集进行测试:adatatang_200zh, magicdata, aishell3
+🌍 **中文** 支持普通话并使用多种中文数据集进行测试:adatatang_200zh, magicdata, aishell3, biaobei,MozillaCommonVoice 等
 
 🤩 **PyTorch** 适用于 pytorch,已在 1.9.0 版本(最新于 2021 年 8 月)中测试,GPU Tesla T4 和 GTX 2060
 
-🌍 **Windows + Linux** 在修复 nits 后在 Windows 操作系统和 linux 操作系统中进行测试
+🌍 **Windows + Linux** 可在 Windows 操作系统和 linux 操作系统中运行(苹果系统M1版也有社区成功运行案例)
 
-🤩 **Easy & Awesome** 仅使用新训练的合成器(synthesizer)就有良好效果,复用预训练的编码器/声码器
+🤩 **Easy & Awesome** 仅需下载或新训练合成器(synthesizer)就有良好效果,复用预训练的编码器/声码器,或实时的HiFi-GAN作为vocoder
 
 ## 快速开始
 > 0训练新手友好版可以参考 [Quick Start (Newbie)](https://github.com/babysor/Realtime-Voice-Clone-Chinese/wiki/Quick-Start-(Newbie))
@@ -49,9 +49,10 @@
 ### 2.2 使用预先训练好的合成器
 > 实在没有设备或者不想慢慢调试,可以使用网友贡献的模型(欢迎持续分享):
 
-| 作者 | 下载链接 | 效果预览 |
-| --- | ----------- | ----- |
-|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ 提取码:2021 | https://www.bilibili.com/video/BV1uh411B7AD/)
+| 作者 | 下载链接 | 效果预览 | 信息 |
+| --- | ----------- | ----- | ----- |
+|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [百度盘链接](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) 提取码:1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps 台湾口音
+|@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ 提取码:2021 | https://www.bilibili.com/video/BV1uh411B7AD/ | 150k steps 旧版需根据[issue](https://github.com/babysor/MockingBird/issues/37)修复
 
 ### 2.3 训练声码器 (Optional)
 * 预处理数据:
@@ -66,20 +67,15 @@
 
 > Good news🤩: 可直接使用中文
 
-## TODO
-- [X] 允许直接使用中文
-- [X] 添加演示视频
-- [X] 添加对更多数据集的支持
-- [X] 上传预训练模型
-- [ ] 支持parallel tacotron
-- [ ] 服务化与容器化
-- [ ] 🙏 欢迎补充
+## Release Note
+2021.9.8 新增Hifi-GAN Vocoder支持
 
 ## 引用及论文
 > 该库一开始从仅支持英语的[Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning) 分叉出来的,鸣谢作者。
 
 | URL | Designation | 标题 | 实现源码 |
 | --- | ----------- | ----- | --------------------- |
+| [2010.05646](https://arxiv.org/abs/2010.05646) | HiFi-GAN (vocoder)| Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | 本代码库 |
 |[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
 |[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)
diff --git a/README.md b/README.md
index bcc0104..ca612f7 100644
--- a/README.md
+++ b/README.md
@@ -6,11 +6,11 @@
 > English | [中文](README-CN.md)
 
 ## Features
-🌍 **Chinese** supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3
+🌍 **Chinese** supported mandarin and tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, and etc.
 
 🤩 **PyTorch** worked for pytorch, tested in version of 1.9.0(latest in August 2021), with GPU Tesla T4 and GTX 2060
 
-🌍 **Windows + Linux** tested in both Windows OS and linux OS after fixing nits
+🌍 **Windows + Linux** run in both Windows OS and linux OS (even in M1 MACOS)
 
 🤩 **Easy & Awesome** effect with only newly-trained synthesizer, by reusing the pretrained encoder/vocoder
 
@@ -49,8 +49,9 @@ Allow parameter `--dataset {dataset}` to support adatatang_200zh, magicdata, ais
 ### 2.2 Use pretrained model of synthesizer
 > Thanks to the community, some models will be shared:
 
-| author | Download link | Previow Video |
-| --- | ----------- | ----- |
+| author | Download link | Preview Video | Info |
+| --- | ----------- | ----- |----- |
+|@FawenYo | https://drive.google.com/file/d/1H-YGOUHpmqKxJ9FRc6vAjPuqQki24UbC/view?usp=sharing [Baidu Pan](https://pan.baidu.com/s/1vSYXO4wsLyjnF3Unl-Xoxg) Code:1024 | [input](https://github.com/babysor/MockingBird/wiki/audio/self_test.mp3) [output](https://github.com/babysor/MockingBird/wiki/audio/export.wav) | 200k steps with local accent of Taiwan
 |@miven| https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ code:2021 | https://www.bilibili.com/video/BV1uh411B7AD/
 
 > A link to my early trained model: [Baidu Yun](https://pan.baidu.com/s/10t3XycWiNIg5dN5E_bMORQ)
@@ -72,19 +73,13 @@ or
 
 > Good news🤩: Chinese Characters are supported
 
-## TODO
-- [x] Add demo video
-- [X] Add support for more dataset
-- [X] Upload pretrained model
-- [ ] Support parallel tacotron
-- [ ] Service orianted and docterize
-- 🙏 Welcome to add more
 
 ## Reference
 > This repository is forked from [Real-Time-Voice-Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning) which only support English.
 
 | URL | Designation | Title | Implementation source |
 | --- | ----------- | ----- | --------------------- |
+| [2010.05646](https://arxiv.org/abs/2010.05646) | HiFi-GAN (vocoder)| Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | This repo |
 |[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
 |[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
 |[1703.10135](https://arxiv.org/pdf/1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN)