Conformance testing for voice over IP transmission quality assessment models
Publication Date: 1 November 2007
This Recommendation specifies minimum criteria for objective speech quality assessment models that predict the impact of observed IP network impairments on the one-way listening quality experienced by the end-user in IP/UDP/RTP-based 3.1-kHz narrow-band telephony applications. An extension to 7 kHz wideband telephony is also provided in Annex B.
It is expected that the primary applications for such models are monitoring of transmission quality for operations and maintenance purposes, and measurements in support of service level agreements (SLAs) between service providers and their customers.
Models compliant with this Recommendation predict mean opinion scores (MOS) on the ACR listening quality scale. Their performance is assessed on the MOS scale (as defined in [ITU-T P.800]; see footnote 1). The primary quality prediction made by such a model is not based on the payload of the RTP stream being analysed, but assumes a typical, or generic, voice payload. Some additional diagnostic outputs may be based on the payload, if available. A model compliant with this Recommendation should always take the voice codec into account. If any input parameter is unavailable and an assumed value is used, this fact should be reported.
A model compliant with this Recommendation cannot provide a comprehensive end-to-end evaluation of transmission quality, because its scores reflect only the impairments on the IP network being measured, which may be only part of the end-to-end connection.
The effects of speech level, acoustic background noise, delay, sidetone, echo and other impairments related to the payload are not reflected in the scores computed by such a model. Therefore, a model compliant with this Recommendation may report high scores even though the overall quality of the connection is poor.
The accuracy criteria herein were derived with the intent to avoid the frequent occurrence of "false positive" or "false negative" errors. This is an especially important consideration when a model compliant with this Recommendation is used, for example, in assessing compliance with SLAs.
The criteria for the model described in this Recommendation are applicable to devices that may reside anywhere within the packet transport network, including edge devices. As such, each device can use only the information available at the location where it is deployed. This does, however, include information that can be extracted from RTCP SR, RR and XR reports.
The accuracy criteria are based on a comparison of a model's performance with the P.862 perceptual evaluation of speech quality (PESQ) algorithm using the output mapping defined in [ITU-T P.862.1] for 3.1-kHz narrow-band telephony, hereafter referred to as P.862.1. Hence, compliance with this Recommendation shall only be claimed for factors, technologies or applications that are within the scope of P.862, or for which the operation of P.862 has been verified against subjective test data.
The speech test material described in clause 10.2.1 is provided in an electronic attachment and forms an integral and normative part of this Recommendation.
Footnote 1: These scores can be translated into Ie-eff values using the formulae given in Annex B of [ITU-T G.107] (equation B-4) and in Appendix I of [ITU-T G.107] (equations I-1, I-2 and I-3).
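The translation mentioned in the footnote can be illustrated with a minimal sketch. It uses the widely published E-model mapping from the rating factor R to MOS, MOS = 1 + 0.035 R + R (R - 60)(100 - R) * 7e-6 for 0 < R < 100, and inverts it numerically by bisection instead of using the closed-form expressions of Appendix I of [ITU-T G.107]. The function names, the lower bound of 6.5 for the inversion range, and the default rating of 93.2 used to derive Ie-eff (that is, Ie-eff = 93.2 - R with all other E-model parameters at their defaults) are assumptions made for this illustration and should be checked against the edition of [ITU-T G.107] in use.

```python
# Illustrative sketch only: names and constants below are assumptions,
# not text taken from the Recommendation.

ASSUMED_DEFAULT_R = 93.2   # assumed E-model rating with all default parameters
ASSUMED_R_MIN = 6.5        # assumed lower bound of the invertible range


def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


def mos_to_r(mos: float, tol: float = 1e-9) -> float:
    """Invert r_to_mos by bisection (the mapping is increasing on this range)."""
    lo, hi = ASSUMED_R_MIN, 100.0
    if mos <= r_to_mos(lo):
        return lo
    if mos >= r_to_mos(hi):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mos_to_ie_eff(mos: float) -> float:
    """Estimate Ie-eff as the shortfall of R from the assumed default rating."""
    return max(0.0, ASSUMED_DEFAULT_R - mos_to_r(mos))


if __name__ == "__main__":
    for score in (4.4, 4.0, 3.5, 3.0):
        print(f"MOS {score:.1f} -> R {mos_to_r(score):.1f}, "
              f"Ie-eff {mos_to_ie_eff(score):.1f}")
```

Bisection is used here only because it is easy to verify; an implementation that must match [ITU-T G.107] exactly would use the formulae cited in the footnote.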