shartoo

Merlin Speech Synthesis Lecture Notes, Part 3: Back to the System


1 Overview

Feed-forward neural network

1.1 Orientation

But

1.2 sequence-to-sequence

[Figure: TTS Merlin technical roadmap]

[Figure: TTS Merlin technical roadmap]

1.3 Alignment in sequence-to-sequence

04_prepare_conf_files.sh

echo "preparing config files for acoustic, duration models..."
./scripts/prepare_config_files.sh $global_config_file
echo "preparing config files for synthesis..."
./scripts/prepare_config_files_for_synthesis.sh $global_config_file

05_train_duration_model.sh

./scripts/submit.sh ${MerlinDir}/src/run_merlin.py $duration_conf_file

Example config file (acoustic model)

[DEFAULT]
Merlin: <path to Merlin root directory>
TOPLEVEL: <path where experiments are created>
[Paths]
# where to place work files
work: <path where data, log, models and generated data are stored and created>
# where to find the data
data: %(work)s/data
# where to find intermediate directories
inter_data: %(work)s/inter_module
# list of file basenames, training and validation in a single list
file_id_list: %(data)s/file_id_list.scp
test_id_list: %(data)s/test_id_list.scp
in_mgc_dir: %(data)s/mgc
in_bap_dir : %(data)s/bap
[Labels]
enforce_silence: False
silence_pattern: ['*-sil+*']
# options: state_align or phone_align
label_type: state_align
label_align: <path to labels>
question_file_name: <path to questions set>
add_frame_features: True
# options: full, coarse_coding, minimal_frame, state_only, frame_only, none
subphone_feats: full
[Outputs]
# dX should be 3 times X
mgc : 60
dmgc : 180
bap : 1
dbap : 3
lf0 : 1
dlf0 : 3
[Waveform]
test_synth_dir: None
# options: WORLD or STRAIGHT
vocoder_type: WORLD
samplerate: 16000
framelength: 1024
# Frequency warping coefficient used to compress the spectral envelope into MGC (or MCEP)
fw_alpha: 0.58
minimum_phase_order: 511
use_cep_ap: True
[Architecture]
switch_to_keras: False
hidden_layer_size : [1024, 1024, 1024, 1024, 1024, 1024]
hidden_layer_type : ['TANH', 'TANH', 'TANH', 'TANH', 'TANH', 'TANH']
model_file_name: feed_forward_6_tanh
#if RNN or sequential training is used, please set sequential_training to True.
sequential_training : False
dropout_rate : 0.0
batch_size : 256
# options: -1 for exponential decay, 0 for constant learning rate, 1 for linear decay
lr_decay : -1
learning_rate : 0.002
# options: sgd, adam, rprop
optimizer : sgd
warmup_epoch : 10
training_epochs : 25
[Processes]
# Main processes
AcousticModel : True
GenTestList : False
# sub-processes
NORMLAB : True
MAKECMP : True
NORMCMP : True
TRAINDNN : True
DNNGEN : True
GENWAV : True
CALMCD : True
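The `# dX should be 3 times X` comment holds because each output stream is predicted as static features plus their delta and delta-delta features, so 60 mgc coefficients become 3 × 60 = 180 output dimensions. The shell function below is a minimal sketch (not part of Merlin; `check_output_dims` is a hypothetical name) that checks this rule in the `[Outputs]` section of a config file, assuming one `key : value` pair per line as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Merlin: check that every dX in [Outputs]
# equals 3 * X (static + delta + delta-delta). Assumes "key : value" lines.
check_output_dims() {
    local conf=$1
    awk '
        # Track whether we are inside the [Outputs] section
        /^\[/ { in_out = ($0 ~ /^\[Outputs\]/); next }
        in_out && /:/ {
            line = $0
            sub(/#.*/, "", line)                 # strip trailing comments
            split(line, kv, ":")
            key = kv[1]; val = kv[2]
            gsub(/[[:space:]]/, "", key)
            gsub(/[[:space:]]/, "", val)
            if (key != "") dim[key] = val + 0    # store as a number
        }
        END {
            status = 0
            for (k in dim) {
                # Only compare streams that have a matching "d" entry
                if ((("d" k) in dim) && dim["d" k] != 3 * dim[k]) {
                    printf "mismatch: d%s=%s, expected %d\n", k, dim["d" k], 3 * dim[k]
                    status = 1
                }
            }
            if (status == 0) print "ok"
            exit status
        }
    ' "$conf"
}
```

Usage: `check_output_dims <path to your acoustic config>`. For the example above (60/180, 1/3, 1/3) it prints `ok`; on a mismatch it prints the offending stream and returns a non-zero status.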

06_train_acoustic_model.sh

./scripts/submit.sh ${MerlinDir}/src/run_merlin.py $acoustic_conf_file

07_run_merlin.sh

inp_txt=$1
test_dur_config_file=$2
test_synth_config_file=$3
echo "preparing full-contextual labels using Festival frontend..."
lab_dir=$(dirname $inp_txt)
./scripts/prepare_labels_from_txt.sh $inp_txt $lab_dir $global_config_file
echo "synthesizing durations..."
./scripts/submit.sh ${MerlinDir}/src/run_merlin.py $test_dur_config_file
echo "synthesizing speech..."
./scripts/submit.sh ${MerlinDir}/src/run_merlin.py $test_synth_config_file
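The script reads three positional arguments but never checks them. A hypothetical guard like the one below (a sketch, not part of Merlin; `check_run_merlin_args` is an invented name) could run first. It treats `inp_txt` only as a path that must exist, since the snippet alone does not show whether it is a single file or a directory of text.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of Merlin: validate the arguments of
# 07_run_merlin.sh (inp_txt, test_dur_config_file, test_synth_config_file).
check_run_merlin_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: ./07_run_merlin.sh <inp_txt> <test_dur_config_file> <test_synth_config_file>" >&2
        return 1
    fi
    # inp_txt is only checked for existence (file or directory)
    if [ ! -e "$1" ]; then
        echo "input text not found: $1" >&2
        return 1
    fi
    # Both config files must be regular files
    local conf
    for conf in "$2" "$3"; do
        if [ ! -f "$conf" ]; then
            echo "config file not found: $conf" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `check_run_merlin_args "$@" || exit 1` near the top of 07_run_merlin.sh would stop before any labels are prepared.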

2 Design choices: acoustic model

2.1 Orientation

3 Waveform generation (waveform generator)

3.1 From acoustic features back to the raw vocoder features

[Figure: TTS Merlin technical roadmap]
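The network outputs normalised frames that stack static, delta and delta-delta features. Below is a sketch of the usual steps back to vocoder parameters, assuming mean-variance output normalisation (per-dimension mean \mu_d and standard deviation \sigma_d) and maximum-likelihood parameter generation (MLPG), where \mathbf{W} is the window matrix that computes deltas from statics and \boldsymbol{\Sigma} is a diagonal covariance (for example the per-dimension variances of the training data).

```latex
% 1) Undo mean-variance normalisation for frame t, dimension d
\hat{o}_{t,d} = \sigma_d \, z_{t,d} + \mu_d

% 2) MLPG: \hat{\mathbf{o}} stacks [c_t; \Delta c_t; \Delta^2 c_t] over all T frames,
%    and \mathbf{o} = \mathbf{W}\mathbf{c} maps the T D statics to 3 T D values
\hat{\mathbf{c}} = \left(\mathbf{W}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{W}\right)^{-1}
                   \mathbf{W}^{\top}\boldsymbol{\Sigma}^{-1}\hat{\mathbf{o}}

% 3) Log-F0 back to F0 on voiced frames (assuming a natural log)
F_{0,t} = \exp\left(\mathrm{lf0}_t\right)
```

With the example config, each mgc frame of \hat{\mathbf{o}} has 180 dimensions and MLPG returns 60 static coefficients per frame.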

3.2 WORLD: periodic excitation using a pulse train

[Figure: TTS Merlin technical roadmap]
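A pulse train places one impulse per fundamental period in voiced regions. A worked sketch, using `samplerate: 16000` from the config above and an assumed F0 of 100 Hz:

```latex
% Pulse positions: each pulse is one fundamental period after the previous one
t_{k+1} = t_k + \frac{1}{F_0(t_k)}, \qquad e(t) = \sum_{k} \delta(t - t_k)

% Worked example: f_s = 16000 Hz, F_0 = 100 Hz
T_0 = \frac{f_s}{F_0} = \frac{16000}{100} = 160 \text{ samples} = 10\ \text{ms}
```

Higher F0 shortens the spacing: at 200 Hz the same calculation gives 80 samples (5 ms).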

3.4 WORLD: reconstructing the periodic and aperiodic magnitude spectra

[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]
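Below is a sketch of the split. It assumes S(\omega) is the power spectral envelope decoded from mgc and a(\omega) \in [0, 1] is the aperiodicity decoded from bap. It reads a(\omega) as an amplitude ratio, so that a(\omega)^2 is the aperiodic share of the power. This energy split is an assumption of the sketch; WORLD's own synthesis code is the authority on the exact form.

```latex
% Periodic (harmonic) and aperiodic (noise) magnitude spectra
|H_{\mathrm{p}}(\omega)| = \sqrt{S(\omega)\,\left(1 - a(\omega)^{2}\right)}
|H_{\mathrm{a}}(\omega)| = a(\omega)\,\sqrt{S(\omega)}

% The two powers add back up to the envelope
|H_{\mathrm{p}}(\omega)|^{2} + |H_{\mathrm{a}}(\omega)|^{2} = S(\omega)
```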

3.5 WORLD: generating the waveform

[Figure: TTS Merlin technical roadmap]
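Waveform generation can be sketched as pulse-synchronous overlap-add. At each pulse position n_k, two signals are added into the output:

- a minimum-phase impulse response h^{p}_k, built from the periodic magnitude spectrum;
- a noise excitation w_k filtered by h^{a}_k, built from the aperiodic magnitude spectrum.

This is a simplified illustration, not a line-by-line description of WORLD.

```latex
% Overlap-add of per-pulse periodic and aperiodic responses
y[n] = \sum_{k} \Big( h^{\mathrm{p}}_{k}[n - n_k] + \big(h^{\mathrm{a}}_{k} * w_{k}\big)[n - n_k] \Big),
\qquad n_k = \operatorname{round}(t_k f_s)
```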

4 Extensions

4.1 Classical unit selection

Illustrated here with phone-sized units: the target cost and the join cost.

[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]
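The classic unit-selection formulation chooses candidate units for a target sequence t_1, ..., t_N (here phones). It picks the units u_1, ..., u_N that minimise the sum of target costs C^{t} (how well a unit matches its target) and join costs C^{j} (how smoothly adjacent units concatenate). The minimisation is usually done with a Viterbi search over the lattice of candidates.

```latex
% Total cost of one candidate sequence
C(t_{1:N}, u_{1:N}) = \sum_{i=1}^{N} C^{t}(t_i, u_i) + \sum_{i=2}^{N} C^{j}(u_{i-1}, u_i)

% Selected sequence (found by dynamic programming)
\hat{u}_{1:N} = \arg\min_{u_{1:N}} C(t_{1:N}, u_{1:N})
```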

4.2 Independent Feature Formulation (IFF) target cost

[Figure: TTS Merlin technical roadmap]
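In the Independent Feature Formulation, the target cost C^{t}(t_i, u_i) between target t_i and candidate unit u_i compares their linguistic features directly, as a weighted sum of per-feature sub-costs. The sketch below assumes P features with weights w_j and a simple 0/1 mismatch sub-cost. That sub-cost is one common choice, not the only one.

```latex
C^{t}(t_i, u_i) = \sum_{j=1}^{P} w_{j}\, C^{t}_{j}(t_i, u_i), \qquad
C^{t}_{j}(t_i, u_i) =
\begin{cases}
0 & \text{if feature } j \text{ of } t_i \text{ and } u_i \text{ match} \\
1 & \text{otherwise}
\end{cases}
```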

4.3 Acoustic Space Formulation target cost

[Figure: TTS Merlin technical roadmap]
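In the Acoustic Space Formulation, the target cost C^{t}(t_i, u_i) between target t_i and candidate unit u_i is measured in acoustic space instead. A model predicts acoustic features \hat{\mathbf{y}}(t_i) for the target, and these are compared with the features \mathbf{y}(u_i) of the candidate unit. A sketch, assuming a squared Euclidean distance:

```latex
C^{t}(t_i, u_i) = \left\lVert \hat{\mathbf{y}}(t_i) - \mathbf{y}(u_i) \right\rVert^{2}
```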

4.4 Hybrid speech synthesis is like unit selection with an Acoustic Space Formulation target cost

[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]

4.5 Hybrid speech synthesis is like statistical parametric speech synthesis with a replacement for the vocoder

[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]

4.6 Hybrid speech synthesis is like using a mixture density network for both the target cost and the join cost

[Figure: TTS Merlin technical roadmap]
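A mixture density network (MDN) outputs a full Gaussian mixture over acoustic features \mathbf{y} given linguistic input \mathbf{x}. This means a cost can be a negative log-likelihood rather than a distance. Below is a sketch of the target cost for candidate unit u_i and target t_i under that assumption, with M components that have weights \pi_m, means \boldsymbol{\mu}_m and covariances \boldsymbol{\Sigma}_m. The join cost would be scored analogously, as a negative log-likelihood of the features at the join. How that join density is conditioned is not shown here.

```latex
p(\mathbf{y} \mid \mathbf{x}) = \sum_{m=1}^{M} \pi_m(\mathbf{x})\,
  \mathcal{N}\!\left(\mathbf{y};\ \boldsymbol{\mu}_m(\mathbf{x}),\ \boldsymbol{\Sigma}_m(\mathbf{x})\right)

C^{t}(t_i, u_i) = -\log p\left(\mathbf{y}(u_i) \mid \mathbf{x}(t_i)\right)
```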

7 Voice conversion

Convert a source speaker's voice so that it sounds like another person, without changing what is said.

[Figure: TTS Merlin technical roadmap]

This is done with a neural network.

[Figure: TTS Merlin technical roadmap]

7.1 Extracting and engineering the input and output acoustic features

[Figure: TTS Merlin technical roadmap]
[Figure: TTS Merlin technical roadmap]

7.2 Aligning input and output

[Figure: TTS Merlin technical roadmap]
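Source and target recordings of the same sentence differ in timing, so their frames have to be aligned before they can be paired. Below is a sketch of the standard dynamic time warping (DTW) recursion. Here d(i, j) is a frame distance, for example the Euclidean distance between source mgc frame i and target mgc frame j. The step pattern and distance used by Merlin's aligner script may differ.

```latex
% Cumulative cost; entries outside the grid are treated as +\infty
D(1, 1) = d(1, 1)
D(i, j) = d(i, j) + \min\{\, D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \,\}

% The alignment path is traced back from D(I, J),
% where I and J are the source and target frame counts
```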

7.3 The simplest approach: align the input and output features, then do frame-by-frame regression

[Figure: TTS Merlin technical roadmap]
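Once every target frame \mathbf{y}_t has an aligned source frame \mathbf{x}_{a(t)}, conversion reduces to ordinary regression. A sketch of the training objective, assuming a network f_\theta and squared error:

```latex
\hat{\theta} = \arg\min_{\theta} \sum_{t=1}^{T}
  \left\lVert f_{\theta}\!\left(\mathbf{x}_{a(t)}\right) - \mathbf{y}_{t} \right\rVert^{2}

% At conversion time each source frame is mapped independently:
% \hat{\mathbf{y}}_t = f_{\hat{\theta}}(\mathbf{x}_t)
```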

7.4 Of course, we can also do better with a feed-forward neural network

[Figure: TTS Merlin technical roadmap]

This can be done with the scripts in the Merlin/egs/voice_conversion/s1/ directory.

03_align_src_with_target.sh

src_feat_dir=$1
tgt_feat_dir=$2
src_aligned_feat_dir=$3
src_mgc_dir=$src_feat_dir/mgc
tgt_mgc_dir=$tgt_feat_dir/mgc
echo "Align source acoustic features with target acoustic features..."
python ${MerlinDir}/misc/scripts/voice_conversion/dtw_aligner_festvox.py ${MerlinDir}/tools \
    ${src_feat_dir} ${tgt_feat_dir} ${src_aligned_feat_dir} ${bap_dim}
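The script derives `$src_feat_dir/mgc` and `$tgt_feat_dir/mgc` and passes `${bap_dim}` without defining it in the snippet. The function below is a hypothetical pre-flight check (a sketch, not part of Merlin; `check_vc_feat_dirs` is an invented name) that fails early if any of those are missing.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of Merlin: validate the inputs of
# 03_align_src_with_target.sh before running the DTW aligner.
check_vc_feat_dirs() {
    if [ "$#" -ne 3 ]; then
        echo "usage: ./03_align_src_with_target.sh <src_feat_dir> <tgt_feat_dir> <src_aligned_feat_dir>" >&2
        return 1
    fi
    # The script reads mgc features from <feat_dir>/mgc for both speakers
    local d
    for d in "$1/mgc" "$2/mgc"; do
        if [ ! -d "$d" ]; then
            echo "missing mgc directory: $d" >&2
            return 1
        fi
    done
    # bap_dim is passed to the aligner but must come from the environment
    if [ -z "${bap_dim:-}" ]; then
        echo "bap_dim is not set" >&2
        return 1
    fi
    return 0
}
```

For example, `check_vc_feat_dirs "$@" || exit 1` could be placed before the python call.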

[Figure: TTS Merlin technical roadmap]

8 Speaker adaptation

[Figure: TTS Merlin technical roadmap]

8.1 Speaker adaptation with DNNs

[Figure: TTS Merlin technical roadmap]

Shared layers and hot swapping

[Figure: TTS Merlin technical roadmap]
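Below is a sketch of the shared-layer idea. It assumes a network whose hidden layers g_\theta are shared across speakers and whose output layer (\mathbf{W}_s, \mathbf{b}_s) belongs to speaker s. Hot swapping then means replacing only the output layer to change voice. Adapting to a new speaker trains a new (\mathbf{W}_s, \mathbf{b}_s) on that speaker's data, with \theta either frozen or lightly fine-tuned; that choice is an assumption here.

```latex
% Hidden layers shared by all speakers
\mathbf{h} = g_{\theta}(\mathbf{x})

% Speaker-specific output layer, swapped per speaker s
\hat{\mathbf{y}} = \mathbf{W}_{s}\,\mathbf{h} + \mathbf{b}_{s}
```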
