Does AI-Powered Speech-to-Text limit interviewer bias?

Interviewer bias refers to a systematic error in data collected through an interview with a human interviewer. Panellists' answers can be inaccurate for several reasons. For example, a panellist might give answers that lead to a more positive evaluation of a product than they actually hold, a tendency known as acquiescence bias. A panellist might also give the answer they think is socially desirable, which is known as social desirability bias. One potential way to reduce these biases is to use Text-to-Speech and Speech-to-Text techniques instead of a human interviewer. For more information on these biases, see our blog post "Are product evaluations with a human interviewer accurate?"


We tested the effect of survey method on these two biases by comparing an interview with a human interviewer against a survey that used Text-to-Speech and Speech-to-Text techniques. The same panellists took part in both surveys, and their answers were compared across the two methods.

To test for acquiescence bias, panellists rated a sweet snack, Fruittella, on six attributes: appearance, odor, taste, texture, aftertaste and general opinion. From these ratings, a satisfaction score was calculated; the higher the score, the higher the satisfaction with the product. Panellists rated the same product twice, but they were told that one sample was the regular Fruittella variant and the other was a new variant, presented as a healthier version of the regular one. This was done to test the effect of a health claim.

To test for social desirability bias, panellists were asked which Fruittella variant they preferred. In addition, they answered questions about their food consumption behavior: how often they eat snacks and how important health is in their food choices. From these answers, an unhealthiness score was calculated; the higher the score, the unhealthier the panellist's food consumption behavior.
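
The post does not state how the two scores were computed. As a minimal sketch, assuming the satisfaction score is the mean of the six attribute ratings and the unhealthiness score averages snack frequency with a reverse-scored health-importance item (both answered on a hypothetical 1 to 5 scale), the scores could be calculated as below. The function names and scales are illustrative assumptions, not the EyeQuestion implementation.

```python
from statistics import mean

# The six attributes named in the study; the aggregation below is an assumption.
ATTRIBUTES = ("appearance", "odor", "taste", "texture", "aftertaste", "general opinion")


def satisfaction_score(ratings: dict[str, float]) -> float:
    """Hypothetical scoring: mean of the six attribute ratings (higher = more satisfied)."""
    missing = [a for a in ATTRIBUTES if a not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return mean(ratings[a] for a in ATTRIBUTES)


def unhealthiness_score(snack_frequency: int, health_importance: int, scale_max: int = 5) -> float:
    """Hypothetical scoring: both items on a 1..scale_max scale.

    Health importance is reverse-scored, so a higher result always means
    unhealthier food consumption behavior.
    """
    for value in (snack_frequency, health_importance):
        if not 1 <= value <= scale_max:
            raise ValueError(f"answer {value} outside 1..{scale_max}")
    reversed_importance = scale_max + 1 - health_importance
    return (snack_frequency + reversed_importance) / 2
```

Reverse-scoring keeps both items pointing in the same direction before averaging; any real scoring scheme would need to follow the questionnaire's own answer scales.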


For the Text-to-Speech and Speech-to-Text survey, a questionnaire with these functions was created in EyeQuestion Software. The computer read the questions aloud, and panellists answered by voice; their spoken answers were registered automatically. For the interview with a human interviewer, the same questions were asked by a human interviewer via a video call.
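
How a spoken answer is turned into a stored value is not described here. As a hypothetical sketch, assuming the Speech-to-Text engine returns a plain transcript and the question uses a 1 to 9 rating scale, the transcript could be mapped to a rating as follows. The function transcript_to_rating and the number-word table are illustrative only and are not part of EyeQuestion Software.

```python
import re
from typing import Optional

# Hypothetical lookup for spoken number words on an assumed 1-9 rating scale.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def transcript_to_rating(transcript: str, low: int = 1, high: int = 9) -> Optional[int]:
    """Return the first in-range rating in a speech-to-text transcript, or None.

    Accepts digits ("7") or number words ("seven"); values outside low..high
    are skipped, and None signals that no valid rating was heard.
    """
    for token in re.findall(r"[a-z]+|\d+", transcript.lower()):
        value = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
        if value is not None and low <= value <= high:
            return value
    return None
```

Returning None rather than guessing lets the questionnaire repeat the question when the transcript contains no usable rating.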

The results showed a significant difference in the satisfaction score between the two survey methods. Panellists rated the new variant significantly higher in the interview with a human interviewer than in the survey that used Text-to-Speech and Speech-to-Text techniques; for the regular variant, no such difference was found. In the human interviewer session, panellists also showed a significant preference for the new variant, whereas no preference was found in the Text-to-Speech and Speech-to-Text survey (see Figure 1).
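
Because every panellist rated the new variant in both sessions, the comparison between methods is within-subjects. The post does not say which statistical test was used; as a hedged illustration, the sketch below computes a paired t statistic on per-panellist satisfaction scores. The function name paired_t is hypothetical, and it returns only the statistic and degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(human: list[float], speech: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for two scores per panellist.

    human[i] and speech[i] must belong to the same panellist, e.g. the
    satisfaction score for the new variant in each session.
    """
    if len(human) != len(speech) or len(human) < 2:
        raise ValueError("need two equal-length lists with at least 2 panellists")
    diffs = [h - s for h, s in zip(human, speech)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1
```

A p-value can then be read from a t table with the returned degrees of freedom or, if SciPy is available, obtained directly from scipy.stats.ttest_rel on the same two lists.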

Figure 1. Frequencies of panellists' preference for the new or the regular variant: expected frequencies under chance (no preference) versus observed frequencies.
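
Figure 1 contrasts observed preference counts with the counts expected by chance. One common way to test such a contrast is a chi-square goodness-of-fit test; the post does not name its test, so this is a sketch under that assumption, and it also assumes each panellist chose either the new or the regular variant with no "no preference" option. The function name preference_chi_square is hypothetical.

```python
from math import erfc, sqrt


def preference_chi_square(n_new: int, n_regular: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test of observed preferences against chance.

    Under chance, each variant is expected to be preferred by half of the
    panellists. Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    total = n_new + n_regular
    if total == 0:
        raise ValueError("no preferences recorded")
    expected = total / 2
    chi2 = sum((obs - expected) ** 2 / expected for obs in (n_new, n_regular))
    # For 1 degree of freedom, P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

With two categories the chi-square distribution has one degree of freedom, which is why the p-value can be computed with the standard library's erfc instead of a statistics package.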

These results provide evidence for both biases: the higher satisfaction ratings for the new variant in the human interviewer session point to acquiescence bias, and the preference for the "healthier" variant in that session points to social desirability bias. However, the unhealthiness score did not differ between the two survey methods. A possible explanation is a carry-over effect: because the same panellists completed both sessions, they may have tried to give the same answers in the second session as in the first.

In conclusion, collecting data through an interview with a human interviewer does seem to introduce interviewer bias. Using Text-to-Speech and Speech-to-Text techniques can limit this bias and can therefore lead to more accurate results.