Subjective evaluation of speech quality with a crowdsourcing approach
Publication date: 1 June 2018
This Recommendation contains advice to administrations on conducting subjective tests of speech quality with a crowdsourcing approach. It focuses on listening tests and absolute category rating (ACR) tasks. Other rating tasks, such as degradation category rating (DCR) and comparison category rating (CCR), as well as conversational tests in the crowd, are still under study in ITU-T Study Group 12. The method described here is to be seen as complementary to the methods recommended in [ITU-T P.800]: the latter are carried out in a laboratory environment, which is better controlled, whereas the crowdsourcing-based method described here covers a wider range of realistic listening environments and devices, and its external validity may therefore be higher.
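As an illustration of the ACR task mentioned above, the sketch below averages crowdsourced ratings on the 5-point ACR scale into a mean opinion score (MOS). The ratings, the variable names, and the normal-approximation confidence interval are illustrative assumptions, not part of this Recommendation.

```python
import statistics

# ACR uses a 5-point scale: 5 Excellent, 4 Good, 3 Fair, 2 Poor, 1 Bad.
# Hypothetical ratings for one stimulus, collected from crowd workers.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]

# The mean opinion score (MOS) is the arithmetic mean of the ratings.
mos = statistics.mean(ratings)

# An approximate 95% confidence interval (normal approximation,
# for illustration only; a proper analysis would follow the
# statistical guidance of the applicable Recommendations).
ci95 = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5

print(f"MOS = {mos:.2f} ± {ci95:.2f}")  # → MOS = 4.00 ± 0.52
```

In a crowdsourcing setting, such per-stimulus scores would typically be aggregated over many workers and compared against laboratory results before drawing conclusions.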
Crowdsourcing-based methods are not expected to replace laboratory testing, as there are fundamental differences between the two methods regarding their conception, the participants and their motivation, as well as technical and environmental factors, as detailed in [b-ITU-T Technical]. As a consequence, the results of crowdsourcing-based methods can be expected to deviate to a certain extent from those of laboratory testing. The appropriate method must be selected according to the target of the evaluation.
Further guidance on the general approach of crowdsourcing-based testing can be found in [b-ITU-T Technical].