Recommendation P.50
ARTIFICIAL VOICES
(Melbourne, 1988; amended at Helsinki, 1993, Geneva, 1999)

1. Introduction

The signal described here reproduces the characteristics of human speech for the purpose of characterizing linear and non-linear telecommunication systems and devices intended for the transduction or transmission of speech. It is known that for some purposes, such as objective loudness rating measurements, simpler signals can be used as well. Examples of such signals are pink noise or spectrum-shaped Gaussian noise. These signals nevertheless cannot be referred to as "artificial voice" for the purpose of this Recommendation.

The artificial voice is a mathematically defined signal that reproduces those time and spectral characteristics of speech which significantly affect the performance of telecommunication systems. Two kinds of artificial voice are defined. One reproduces the spectral characteristics of female speech and the other those of male speech.

The artificial voice reproduces the following time and spectral characteristics of real speech:
a) long-term average spectrum;
b) short-term spectrum;
c) instantaneous amplitude distribution;
d) voiced and unvoiced structure of the speech waveform;
e) syllabic envelope.

Appendix I/P.50 includes a CD-ROM containing useful test signals. The signals on this CD-ROM include the signal described in Recommendation P.50 as well as other signals that some Administrations have found useful. The CD-ROM also contains the full speech database that was used to develop Recommendation P.50. Appendix I/P.50 is published separately.

2. Frequency range, purpose and definition

2.1 Frequency range and purpose

The artificial voice is intended to reproduce the characteristics of real speech over the bandwidth 100 Hz to 8 kHz. It can be used to characterize many devices, e.g. carbon microphones, loudspeaking telephone sets, non-linear coders, echo control devices, syllabic compandors and non-linear systems in general.

The artificial voice described in this Recommendation is mainly used for the objective evaluation of speech processing systems and devices where a single-channel signal with continuous activity (i.e. without pauses) is sufficient to measure their characteristics. The evaluation of speech codecs is one example. Some objective evaluations need two signals with pauses, e.g. the evaluation of devices with speech detectors. For these, the artificial conversational speech signal described in Recommendation P.59 should be used.

Using the artificial voice instead of real speech has two advantages. It is easier to generate, and it varies less than samples of real voice. When a particular system is tested, the characteristics of the transmission path preceding it must be taken into account. The actual test signal is then produced by convolving the artificial voice with the path response.

2.2 Definition

The artificial voice is a mathematically defined signal. It reproduces all the characteristics of human speech that are relevant to the characterization of linear and non-linear telecommunication systems. It is intended to give a satisfactory correlation between objective measurements and tests with real speech.
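The Python/NumPy sketch below gives a rough check of a signal against two of the characteristics listed above, the long-term average spectrum and the instantaneous amplitude distribution. It is not the measurement procedure specified in P.50. The following are all illustrative assumptions: the function names, the Welch-style averaging of 512-sample Hann-windowed frames with 50 % overlap, the 16 kHz sampling rate used in the example, and the ±5 RMS histogram range. The spectrum is restricted to the 100 Hz to 8 kHz band that the artificial voice is meant to cover.

```python
import numpy as np


def long_term_average_spectrum(x, fs, frame_len=512, hop=256, f_lo=100.0, f_hi=8000.0):
    """Welch-style long-term average spectrum in dB, restricted to [f_lo, f_hi] Hz.

    Hypothetical helper: frame length, hop and window are illustrative choices.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames  # averaging over all frames gives the "long-term" estimate
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    # A small floor avoids log10(0) in empty bins.
    return freqs[band], 10.0 * np.log10(power[band] + 1e-20)


def amplitude_distribution(x, n_bins=100, limit=5.0):
    """Probability density of instantaneous amplitudes normalized to the RMS value.

    Hypothetical helper: samples beyond +/- limit times the RMS value are not counted.
    """
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    if rms == 0.0:
        raise ValueError("signal is all zeros")
    density, edges = np.histogram(x / rms, bins=n_bins, range=(-limit, limit), density=True)
    return density, edges
```

For example, with `fs = 16000` (an assumed rate whose Nyquist frequency equals the 8 kHz band edge) the frequency resolution is 16000 / 512 = 31.25 Hz. A 1 kHz tone then produces a spectrum peaking near 1000 Hz. Comparing these outputs for a candidate signal with those for real speech gives only a first indication of whether its spectrum and amplitude statistics match.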
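The text above says the actual test signal is produced by convolving the artificial voice with the response of the preceding transmission path. That step can be sketched as follows. The sketch makes one assumption the text does not state: the path is linear and time-invariant and is given as a finite impulse response sampled at the same rate as the artificial voice. The function name `shape_with_path` is hypothetical.

```python
import numpy as np


def shape_with_path(artificial_voice, path_impulse_response):
    """Return the test signal: the artificial voice convolved with the path response.

    Assumption: the preceding path is linear and time-invariant, described by an
    FIR impulse response sampled at the same rate as the artificial voice.
    """
    voice = np.asarray(artificial_voice, dtype=float)
    h = np.asarray(path_impulse_response, dtype=float)
    if voice.ndim != 1 or h.ndim != 1:
        raise ValueError("expected one-dimensional signals")
    # Full linear convolution: the output has len(voice) + len(h) - 1 samples.
    return np.convolve(voice, h)
```

A unit-impulse path `[1.0]` returns the artificial voice unchanged. A two-tap path `[0.5, 0.5]` smooths the voice and lengthens it by one sample. With a long measured path response, FFT-based convolution (e.g. `scipy.signal.fftconvolve`) is usually faster and gives the same result up to rounding.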