ITU-T P.863-2018
Perceptual objective listening quality prediction (Study Group 12)

Standard No.
ITU-T P.863-2018
Release Date
2018
Published By
ITU-T - International Telecommunication Union/ITU Telecommunication Sector
Latest
ITU-T P.863-2018
Scope
This Recommendation¹ defines a single algorithm for assessing the speech quality of current and near-future telephony systems that utilize a broad variety of coding, transport and speech enhancement technologies. Based on the benchmark results presented within the studies of ITU-T, an overview of the test factors, coding technologies and applications to which this Recommendation applies is given in Tables 1 to 4. Table 1 presents factors and applications included in the requirement specification and which were used in the selection phase of the ITU-T P.863 algorithm. It should be noted that the performance of the ITU-T P.863 algorithm under each individual condition in Table 1 is not reflected in this table. Table 2 presents a list of conditions for which this Recommendation is not intended to be used. Table 3 presents test variables for which further investigation is needed, or for which ITU-T P.863 is subject to claims of providing inaccurate predictions when used in conjunction with these. Finally, Table 4 lists factors, technologies and applications for which the ITU-T P.863 algorithm has not currently been validated.
Note that the ITU-T P.863 algorithm cannot be used to replace subjective testing. It should also be noted that the ITU-T P.863 algorithm does not provide a comprehensive evaluation of transmission quality. It only measures the effects of one-way speech distortion and noise on speech quality. The effects of delay, sidetone, echo, and other impairments related to two-way interaction (e.g., centre clipper) are not reflected in the ITU-T P.863 scores. Therefore, it is possible to have high ITU-T P.863 scores, yet poor overall conversational quality. It has to be noted that ITU-T P.863 is less sensitive to very low noise floors in fullband mode than in narrowband mode.
The test set of ITU-T P.863 covers the following languages: American English, British English, Chinese (Mandarin), Czech, Dutch, French, German, Italian, Japanese, Swedish and Swiss German. The subjective experiments were conducted in subjective test laboratories located in the respective countries.
ITU-T P.863 is the next-generation voice quality testing technology for fixed, mobile and IP-based networks. ITU-T P.863 has been selected to form the new ITU-T voice quality testing standard. This Recommendation was developed between 2006 and 2010 in a competition carried out by ITU-T, in order to define a technology update for [b-ITU-T P.862].
The purpose of the objective ITU-T P.863 model is to predict overall listening speech quality for narrowband (300 to 3 400 Hz), wideband (50 to 7 000 Hz), super-wideband (50 to 14 000 Hz) and fullband (20 to 20 000 Hz) telecommunication scenarios as perceived by the user. This includes all speech-processing components usually considered for telecommunications in clean and noisy conditions. The term 'listening speech quality' means the overall speech quality as perceived and scored by human subjects in an absolute category rating experiment according to [ITU-T P.800] or [ITU-T P.830]. In fullband mode, ITU-T P.863 scores are predicted on a MOS ACR fullband scale; details on the experiment design are provided in Appendix II. In narrowband mode, ITU-T P.863 scores are predicted on a MOS ACR narrowband scale. The model outputs in the two modes are referred to as MOS-LQOn and MOS-LQOf.
As is the case for [b-ITU-T P.861] and [b-ITU-T P.862], the approach of ITU-T P.863 is called 'full-reference' or 'double-ended', which means that the quality prediction is based on the comparison between an undistorted reference signal and the received signal to be scored.
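The four audio bandwidths named above are nested, which can be made concrete with a small lookup. The following is only an illustrative sketch: the `Band` type and the `narrowest_band_covering` helper are hypothetical names introduced here and are not part of the Recommendation or of any ITU-T reference implementation; only the frequency limits are taken from the text.

```python
from typing import NamedTuple, Optional


class Band(NamedTuple):
    name: str
    low_hz: int
    high_hz: int


# Bandwidths as listed in the scope text, ordered from narrowest to widest.
BANDS = [
    Band("narrowband", 300, 3400),
    Band("wideband", 50, 7000),
    Band("super-wideband", 50, 14000),
    Band("fullband", 20, 20000),
]


def narrowest_band_covering(low_hz: float, high_hz: float) -> Optional[Band]:
    """Return the narrowest listed band that contains [low_hz, high_hz].

    Hypothetical helper for illustration; returns None when the range
    extends beyond the fullband limits.
    """
    for band in BANDS:
        if band.low_hz <= low_hz and high_hz <= band.high_hz:
            return band
    return None


if __name__ == "__main__":
    for lo, hi in [(300, 3400), (100, 6000), (50, 14000), (20, 20000)]:
        band = narrowest_band_covering(lo, hi)
        print(lo, hi, band.name if band else None)
```

Because each band contains the previous one, a linear scan from narrowest to widest is sufficient to find the tightest match.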
ITU-T P.863 can be applied to signals recorded at an electrical interface (as was the case for [b-ITU-T P.862]) but also, in the case of fullband operation mode, to signals recorded using an artificial ear simulator. Other technologies or components, such as speech storage formats, or non-telephony applications, such as public safety networks or professional mobile radio connections, were not part of the competition and the selection criteria.
ITU-T P.863 operational modes
It is important to understand and consider the two different operational modes supported by ITU-T P.863:
  • fullband, and
  • narrowband.
Table 5 summarizes the applicability of ITU-T P.863 operational modes to different telecommunication scenarios. The main difference between the two modes is the bandwidth of the reference speech signal used by the model as well as the frequency range where distortions will be detected.
In fullband mode, the received (and potentially degraded) speech signal is compared to a fullband reference. Consequently, band limitations are considered as degradations and are scored accordingly. The listening quality is modelled as perceived by a human listener using a diffuse-field equalized headphone with diotic presentation (same signal at both ear-caps). The prediction uses a fullband listening quality scale where the ITU-T P.863 algorithm saturates at MOS-LQOf = 4.8 for a transparent fullband signal. The fullband signals were assessed in an ITU-T P.800 ACR listening-only test in the ITU-T P.863 evaluation phase.
In contrast, in narrowband mode the received (and potentially degraded) speech signal is compared to a narrowband (300 to 3 400 Hz) reference. Consequently, normal telephone band limitations are not considered as severe degradations and are scored less severely. This narrowband mode maintains compatibility with previously developed models such as [b-ITU-T P.862] in conjunction with [b-ITU-T P.862.1].
The listening quality is modelled as perceived by a human listener using a loosely coupled IRS type handset at one ear (monotic presentation). The prediction uses the common narrowband listening quality scale where the ITU-T P.863 algorithm saturates at MOS-LQOn = 4.5 for a transparent narrowband signal.
NOTE 1 – For the two operational modes, the quality ratings are obtained on two different scales, namely the traditional scale for the narrowband mode and the future-oriented scale for the fullband mode.
NOTE 2 – Acoustical recordings, as well as the influence of the presentation level, can only be predicted in fullband operational mode. The narrowband operational mode is restricted to electrical recordings and a nominal presentation for compatibility with [b-ITU-T P.862] in conjunction with [b-ITU-T P.862.1] application areas.
NOTE 3 – For backward compatibility, super-wideband reference files can be used in fullband mode as well. The difference between a super-wideband and a fullband signal is too small to be identified in a typical ACR test and therefore ITU-T P.863 will score them as equivalent. Distortions in the frequency range between super-wideband and fullband will however be taken into account and may lead to significant differences between plain super-wideband and fullband measurements. If the degraded signal is also bandwidth limited to the super-wideband range, the results of P.863 Ed. 3 and P.863 Ed. 2.0 will be equivalent.
¹ This Recommendation includes an electronic attachment containing detailed descriptions in PDF format (see Annex B) and conformance testing data (see Annex A).
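The properties of the two operational modes described in this scope can be collected in a small table, together with a check for the restriction stated in NOTE 2. This is a minimal sketch under stated assumptions: `OperationalMode`, `MODES` and `check_setup` are hypothetical names chosen for illustration and do not correspond to any interface of the ITU-T P.863 algorithm; the values are those quoted in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OperationalMode:
    output_name: str                   # name of the model output in this mode
    reference_band_hz: Tuple[int, int] # bandwidth of the reference signal
    saturation_mos: float              # score for a transparent signal
    presentation: str                  # modelled listening situation
    acoustic_recordings: bool          # NOTE 2: acoustic recordings allowed?


# Values taken from the scope text. Per NOTE 3, super-wideband references
# may also be used in fullband mode; that case is not modelled here.
MODES = {
    "fullband": OperationalMode(
        "MOS-LQOf", (20, 20000), 4.8,
        "diotic, diffuse-field equalized headphone", True),
    "narrowband": OperationalMode(
        "MOS-LQOn", (300, 3400), 4.5,
        "monotic, loosely coupled IRS type handset", False),
}


def check_setup(mode: str, recording: str) -> OperationalMode:
    """Return the mode description, or raise ValueError if the recording
    type is not covered by that mode (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if recording not in ("electrical", "acoustic"):
        raise ValueError(f"unknown recording type: {recording!r}")
    selected = MODES[mode]
    if recording == "acoustic" and not selected.acoustic_recordings:
        raise ValueError("acoustic recordings require fullband mode")
    return selected


if __name__ == "__main__":
    m = check_setup("fullband", "acoustic")
    print(m.output_name, m.saturation_mos)
```

Keeping the two modes as separate entries reflects NOTE 1: their scores are on different scales and should not be compared directly.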

ITU-T P.863-2018 history

  • 2018 ITU-T P.863-2018 Perceptual objective listening quality prediction (Study Group 12)


