MockingBird/synthesizer/utils/symbols.py

"""
Defines the set of symbols used in text input to the model.

The default is a set of ASCII characters that works well for English or text that has been run
through Unidecode. For other data, you can modify _characters. See TRAINING_DATA.md for details.
"""
# from . import cmudict

_pad        = "_"
_eos        = "~"
_characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!\'(),-.:;? '

#_characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? '  # legacy set (digits 1, 2, 3, 4, 0 only); use this one if you want to train an old model
# Prepend "@" to ARPAbet symbols to ensure uniqueness (some are the same as uppercase letters):
#_arpabet = ["@" + s for s in cmudict.valid_symbols]
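
# A minimal sketch of why the "@" prefix matters: some ARPAbet phonemes are
# spelled exactly like uppercase letters already present in _characters, so
# adding them unprefixed would produce duplicate symbols. The list
# `sample_arpabet` below is a hypothetical stand-in for cmudict.valid_symbols,
# not the real table.

```python
# Uppercase letters, as they appear at the start of _characters.
characters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

# Hypothetical subset of ARPAbet phonemes, for illustration only.
sample_arpabet = ["AA", "AE", "B", "S", "T"]

# Phonemes that would clash with a plain character if added unprefixed.
collisions = [s for s in sample_arpabet if s in characters]

# Prefixing with "@" makes every phoneme distinct from every single character.
prefixed = ["@" + s for s in sample_arpabet]
prefixed_collisions = [s for s in prefixed if s in characters]

print(collisions)           # phonemes that clash without the prefix
print(prefixed_collisions)  # clashes remaining after prefixing
```

# With this sample, only the single-letter phonemes clash, and none do once
# the prefix is applied.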

# Export all symbols:
symbols = [_pad, _eos] + list(_characters) #+ _arpabet
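
# A minimal sketch of how a symbol list like this is typically consumed: each
# symbol's position becomes its integer ID. The helper `text_to_ids` is
# hypothetical and not part of this module. The sketch also compares the
# current and legacy character sets. Assuming the model sizes its input
# embedding from len(symbols), a checkpoint trained with one set would not
# match the other.

```python
pad, eos = "_", "~"
characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!\'(),-.:;? '
legacy_characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? '

symbols = [pad, eos] + list(characters)
legacy_symbols = [pad, eos] + list(legacy_characters)

# Position in the list is the symbol's integer ID.
symbol_to_id = {s: i for i, s in enumerate(symbols)}


def text_to_ids(text):
    """Hypothetical helper: map known characters to IDs and append EOS.

    Characters outside the symbol set, and the reserved pad/eos symbols,
    are skipped rather than raising an error.
    """
    ids = [
        symbol_to_id[ch]
        for ch in text
        if ch in symbol_to_id and ch not in (pad, eos)
    ]
    ids.append(symbol_to_id[eos])
    return ids


# Digits present in the current set but missing from the legacy one.
missing_in_legacy = sorted(set(characters) - set(legacy_characters))

print(len(symbols), len(legacy_symbols))
print(missing_in_legacy)
print(text_to_ids("Hi!"))
```

# The two symbol tables differ in length (75 versus 70 entries) because the
# legacy set lacks the digits 5 through 9, which is why the legacy line is
# kept around for training or loading old models.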

# Change history (from the blame view of this file):
#   2021-08-07 11:56:00 +08:00  Init to support Chinese Dataset. (docstring, _pad, _eos, ARPAbet comments, symbols export)
#   2021-08-22 23:44:25 +08:00  [bugfix] Fix bug causing nonsense output for long texts; fixes incorrect pronunciation of multi-paragraph text. (_characters)
#   2021-08-29 00:45:49 +08:00  Fix compatibility issue of symbols. (legacy _characters comment)