Quantitative assessment of second language learners' fluency by means of automatic speech recognition technology.
Catia Cucchiarini, Helmer Strik, Lou Boves (2000)
A2RT, Dept. of Language & Speech, University of Nijmegen
P.O. Box 9103, 6500 HD Nijmegen, The Netherlands

J. of the Acoustical Society of America, Vol. 107 (2), pp. 989-999.

Abstract

To determine whether expert fluency ratings of read speech can be predicted on the basis of automatically calculated temporal measures of speech quality, an experiment was conducted with read speech of 20 native and 60 non-native speakers of Dutch. The speech material was scored for fluency by nine experts and was then analyzed by means of an automatic speech recognizer in terms of quantitative measures such as speech rate, articulation rate, number and length of pauses, number of dysfluencies, mean length of runs, and phonation/time ratio. The results show that expert ratings of fluency in read speech are reliable (Cronbach's α varies between 0.90 and 0.96) and that these ratings can be predicted on the basis of quantitative measures: for six automatic measures the magnitude of the correlations with the fluency scores varies between 0.81 and 0.93. Rate of speech appears to be the best predictor: correlations vary between 0.90 and 0.93. Two other important determinants of reading fluency are the rate at which speakers articulate the sounds and the number of pauses they make. Apparently, rate of speech is such a good predictor of perceived fluency because it incorporates these two aspects. © 2000 Acoustical Society of America.

PACS: 43.70.Kv, 43.71.Es, 43.71.Gv, 43.71.Hw


INTRODUCTION

The term fluency is routinely used by teachers and researchers to describe both native and non-native language performance. The fact that fluency is a frequently applied notion might suggest that there is general agreement as to its precise meaning. However, a review of relevant literature reveals that the term fluency has been used to refer to a wide range of different skills and different speech characteristics (e.g. Leeson, 1975; Fillmore, 1979; Brumfit, 1984; Lennon, 1990; Schmidt, 1992; Chambers, 1997).

In spite of this great variation, though, there is general agreement on two matters. First, although it is obvious that fluency can be used to describe written performance (Lennon, 1990), most authors restrict the use of the term to the oral modality. Furthermore, although some authors have underlined the importance of fluency-related factors in receptive processes (Leeson, 1975; Segalowitz, 1991), there seems to be a tacit agreement among teachers and researchers that fluency mainly refers to productive language performance. However, even this more restricted definition of fluency as a descriptor of oral production is amenable to different interpretations.

In considering the various possibilities, we may draw a distinction between fluency with respect to native language performance and fluency in the context of foreign language teaching and testing. In the latter case, fluency is viewed as an important criterion by which non-native performance can be judged (Riggenbach, 1991), despite the vagueness of the exact meaning of the concept. This is clear from the fact that fluency is often included in tests and evaluation schemes. With respect to native speakers' oral performance, fluency may be used to characterize the performance of a speaker, but does not really constitute an evaluation criterion. The term dysfluent, on the other hand, is often used in connection with certain speech disorders such as stuttering, where dysfluent speech is characterized by "an abnormally high frequency and/or duration of stoppages in the forward flow of speech" (Peters and Guitar, 1991: 9).

In considering native speakers' oral production, Fillmore (1979: 93) identifies four different abilities that might be subsumed under the term fluency: a) "the ability to talk at length with few pauses", b) "the ability to talk in coherent, reasoned, and 'semantically dense' sentences", c) "the ability to have appropriate things to say in a wide range of contexts", and d) "the ability...to be creative and imaginative in...language use".

In foreign language teaching and testing various definitions of fluency are also found. For instance, in communicative language teaching the emphasis has been on fluency as opposed to accuracy. According to the definition provided by Brumfit (1984: 57) fluency is "the maximally effective operation of the language system so far acquired by the student". In this definition of fluency, native-speaker-like performance does not constitute the target to be achieved (Brumfit, 1984: 56). Alternatively, native-like performance is viewed as the final goal in the more common interpretation of fluency as a synonym for oral command of a language. In everyday language use, this definition may be extended to indicate overall language proficiency (Lennon, 1990; Chambers, 1997). Finally, in a more restricted sense, the term fluency has been used to refer to one aspect of oral proficiency, in particular the temporal aspect (Nation, 1989; Lennon, 1990; Riggenbach, 1991; Schmidt, 1992; Freed, 1995; Towell et al., 1996). However, even when the term fluency is used in this more limited sense, there is still uncertainty as to what exactly contributes to perceived fluency. It is this temporal interpretation of fluency, admittedly a rather vague one, that will be the focus of the present paper.

In trying to define the temporal aspect of fluency, it has often been assumed that the goal in language learning consists in producing "speech at the tempo of native speakers, unimpeded by silent pauses and hesitations, filled pauses...self-corrections, repetitions false starts and the like" (Lennon, 1990: 390). However, quantitative studies of pause-related phenomena have revealed that native speech is not always smooth and continuous, but exhibits a lot of hesitations and repairs (Raupach, 1983; Lennon, 1990; Riggenbach, 1991). This would seem to imply that the presence of hesitation phenomena is not sufficient to distinguish between natives and non-natives and that the difference rather lies in the frequency and distribution of these phenomena, as suggested by Möhle (1984). As a matter of fact, studies that have compared a number of quantitative fluency measures in L1 and L2 speech of the same speaker have shown that there may be considerable differences between the two speech types (Möhle, 1984; Towell et al., 1996).

In an attempt to gain more insight into the temporal aspects of fluency, Lennon (1990), Riggenbach (1991) and Freed (1995) carried out studies in which samples of spontaneous speech produced by non-native speakers of English were judged by experts on fluency and were then analyzed in terms of quantitative variables such as speech rate, phonation-time ratio, mean length of runs, and number and length of pauses. The results of these studies show that fluency ratings are affected by quantitative variables such as speech rate and number of pauses. In addition, these studies also reveal that studying the relationship between fluency ratings and temporal variables in spontaneous speech may be rather complex, because in this case the fluency ratings turn out to be affected by non-temporal properties of speech utterances, such as grammar, vocabulary and accent (Lennon, 1990: 408; Riggenbach, 1991: 434; Freed, 1995: 135).
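
The temporal variables named above can be made concrete with a short sketch. The definitions used here are common ones in the fluency literature and are assumed, since this section does not spell them out: rate of speech is phones per second of total time, articulation rate is phones per second of time excluding pauses, phonation-time ratio is the share of total time not spent in pauses, and mean length of runs is the average number of phones between pauses. The `Segment` type, the "sil" label and the 0.2 s pause threshold are illustrative assumptions, not details taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    label: str    # phone label, or "sil" for silence (hypothetical convention)
    start: float  # start time in seconds
    end: float    # end time in seconds


MIN_PAUSE = 0.2  # assumed threshold (s); shorter silences are not counted as pauses


def temporal_measures(segments: List[Segment]) -> Dict[str, float]:
    """Compute temporal measures from a time-aligned phone segmentation.

    Assumes the segments are in time order and that leading and trailing
    silence has already been trimmed.
    """
    if not segments:
        raise ValueError("no segments given")
    total_time = segments[-1].end - segments[0].start
    if total_time <= 0:
        raise ValueError("total duration must be positive")

    pause_durations = []  # durations of silences that count as pauses
    runs = []             # number of phones in each run between pauses
    current_run = 0
    for seg in segments:
        duration = seg.end - seg.start
        if seg.label == "sil":
            if duration >= MIN_PAUSE:
                pause_durations.append(duration)
                if current_run:
                    runs.append(current_run)
                current_run = 0
        else:
            current_run += 1
    if current_run:
        runs.append(current_run)

    n_phones = sum(runs)
    pause_time = sum(pause_durations)
    phonation_time = total_time - pause_time
    n_pauses = len(pause_durations)
    return {
        "rate_of_speech": n_phones / total_time,
        "articulation_rate": n_phones / phonation_time if phonation_time > 0 else 0.0,
        "phonation_time_ratio": phonation_time / total_time,
        "n_pauses": n_pauses,
        "total_pause_duration": pause_time,
        "mean_pause_duration": pause_time / n_pauses if n_pauses else 0.0,
        "mean_length_of_runs": n_phones / len(runs) if runs else 0.0,
    }
```

Under these assumed definitions, rate of speech equals articulation rate multiplied by phonation-time ratio, so a measure of this kind reflects both how fast the sounds are articulated and how much of the time is spent pausing.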


The aim of the research reported in this paper is to determine whether expert fluency ratings of read speech can be predicted on the basis of temporal measures of speech quality. The decision to limit this investigation to read speech is related to the methodological complexities involved in studying fluency in spontaneous speech. If the present approach appears to be feasible, it will be applied to spontaneous speech too. Identifying quantitative correlates of perceived fluency is important with a view to developing objective testing instruments for fluency assessment. An important characteristic of the present investigation is that the quantitative variables are calculated automatically. In turn this suggests that if the objective measures used in this study appear to be able to predict perceived fluency, this approach may have potential for the development of automatic tests of fluency in read speech.

The goal of this study will be pursued by relating expert fluency ratings of speech read by native and non-native speakers of Dutch with a set of quantitative measures of speech quality that are supposed to be related to perceived fluency. In this way it can be determined to what extent expert judgments of fluency can be predicted on the basis of automatically obtained temporal measures of speech quality. In other words, the expert fluency ratings will constitute the reference for the evaluation of the automatic fluency measures. Of course, this will be possible only if the expert ratings exhibit acceptable levels of reliability. To this end, we will ask different groups of raters to evaluate the same material on fluency. Moreover, each rater will be asked to score part of the material twice so that it will be possible to establish intra-rater reliability.
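
The abstract reports the reliability of the expert ratings as Cronbach's α. As a minimal sketch, assuming the standard formula with raters treated as items and speakers as cases, α could be computed as follows. The function name and the layout of `ratings` are hypothetical, and whether the study used exactly this formulation is an assumption.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a rater-by-speaker score table.

    ratings[r][s] is the score given by rater r to speaker s
    (hypothetical layout). Sample variances are used throughout.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("at least two raters are needed")
    n_speakers = len(ratings[0])
    if n_speakers < 2 or any(len(r) != n_speakers for r in ratings):
        raise ValueError("each rater must score the same set of two or more speakers")

    # Sum of the variances of each rater's scores across speakers.
    rater_variance_sum = sum(variance(r) for r in ratings)
    # Variance of the per-speaker totals across raters.
    totals: List[float] = [sum(column) for column in zip(*ratings)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores show no variation")

    return k / (k - 1) * (1 - rater_variance_sum / total_variance)
```

Raters who rank the speakers identically yield α = 1 even if one rater is consistently stricter than another, because α reflects consistency rather than absolute agreement.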

In addition, these analyses will make it possible to determine the contribution of the various quantitative variables to perceived fluency. In turn this will shed some light on the determinants of fluency in read speech.

Furthermore, since the data gathered in this investigation concern both natives and non-natives, this will offer the possibility of determining whether native and non-native speakers differ on the fluency ratings and on the temporal variables. It is clear that distinguishing between these two groups is not the aim of a fluency test, which, instead, should distinguish between fluent and non-fluent speakers. However, for the development of a test of this kind, data on native performance are necessary to establish benchmarks. Moreover, given that fluency is often equated with native-like performance (see above), it is interesting to determine whether the two groups of natives and non-natives significantly differ from each other on the variables under study.


IV. CONCLUSIONS

On the basis of the results of the present investigation we can draw the following conclusions. First, expert listeners are able to evaluate fluency with a high degree of reliability. Second, expert fluency ratings of read speech are mainly influenced by two factors: speed of articulation and frequency of pauses. Third, expert fluency ratings can be accurately predicted on the basis of automatically calculated measures such as rate of speech, articulation rate, phonation-time ratio, number and total duration of pauses and mean length of runs. Of all these measures rate of speech appears to be the best one. Fourth, native speakers are more fluent than non-natives and the temporal measures are significantly different for the two groups.

To conclude, these findings indicate that temporal measures of fluency may be employed to develop objective testing instruments of fluency in read speech. In turn, the fact that these measures can be automatically calculated by means of automatic speech recognition techniques suggests that this approach may contribute to developing automatic tests of fluency, at least for read speech. Considering that these results were obtained with telephone speech, this approach seems likely to have important consequences for the future of fluency assessment.
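
The claim that fluency ratings can be predicted from automatic measures rests on correlations between each measure and the ratings. The sketch below shows one way such a correlation could be computed, assuming a Pearson product-moment coefficient over per-speaker values. The function name is hypothetical and the numbers are made-up toy values for illustration only, not data from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy values, invented for illustration: one entry per speaker.
toy_rate_of_speech = [8.0, 10.0, 12.0, 14.0]  # phones per second
toy_n_pauses = [9, 6, 5, 2]                   # pauses per speaker
toy_mean_rating = [3.0, 5.0, 6.0, 8.0]        # mean expert fluency score

r_rate = pearson_r(toy_rate_of_speech, toy_mean_rating)
r_pauses = pearson_r(toy_n_pauses, toy_mean_rating)
```

In a setup like this, a measure such as number of pauses would be expected to correlate negatively with the ratings, which is presumably why the abstract speaks of the magnitude of the correlations.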



ACKNOWLEDGMENTS

This research was supported by SENTER (an agency of the Dutch Ministry of Economic Affairs), the Dutch National Institute for Educational Measurement (CITO), Swets Test Services of Swets and Zeitlinger, and KPN. The research of Dr. H. Strik has been made possible by a fellowship of the Royal Netherlands Academy of Arts and Sciences. The authors would like to thank Tim Bunnel and an anonymous reviewer for their valuable comments and suggestions.



APPENDIX


Group 1 sentences


1) Vitrage is heel ouderwets en past niet bij een modern interieur.

2) De Nederlandse gulden is al lang even hard als de Duitse mark.

3) Een bekertje warme chocolademelk moet je wel lusten.

4) Door jouw gezeur zijn we nu al meer dan een uur te laat voor die afspraak.

5) Met een flinke garage erbij moet je genoeg opbergruimte hebben.


Group 2 sentences


1) Een foutje van de stuurman heeft het schip doen kapseizen.

2) Gelokt door een stukje kaas liep het muisje keurig in de val.

3) Het ziet er naar uit dat het deze week bij ons opnieuw gaat regenen.

4) Na die grote lekkage was het dure behang aan vervanging toe.

5) Geduldig hou ik de deur voor je open.


REFERENCES


Brumfit, C. (1984). Communicative Methodology in Language Teaching: The Roles of Fluency and Accuracy (Cambridge University Press, Cambridge).

Butcher, A. (1981). "Phonetic Correlates of Perceived Tempo in Reading and Spontaneous Speech," Work in Progress, Phon Lab Univ. Reading, pp. 105-117.

Chambers, F. (1997). "What Do We Mean by Fluency?," System, 4, 535-544.

Dechert, H.W. and Raupach, M. (1980a). Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler (Mouton, The Hague).

Dechert, H.W. and Raupach, M. (1980b). Towards a Cross-Linguistic Assessment of Speech Production (Lang, Frankfurt).

den Os, E.A., Boogaart, T.I., Boves, L., and Klabbers, E. (1995). "The Dutch Polyphone Corpus," Proceedings Eurospeech95, 825-828.

Ferguson, G.A. (1987). Statistical Analysis in Psychology and Education (McGraw-Hill, Singapore).

Fillmore, C.J. (1979). "On Fluency," in Individual Differences in Language Ability and Language Behavior, edited by C. Fillmore, D. Kempler, and W.S.-Y. Wang (Academic, New York), pp. 85-101.

Flege, J.E. and Fletcher, K.L. (1992). "Talker and Listener Effects on Degree of Perceived Foreign Accent," Journal of the Acoustical Society of America, 91 (1), 370-389.

Freed, B.F. (1995). "What Makes Us Think that Students Who Study Abroad Become Fluent?," in Second Language Acquisition in a Study-Abroad Context, edited by B.F. Freed (John Benjamins, Amsterdam), pp. 123-148.

Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in Spontaneous Speech (Academic, New York).

Grosjean, F. (1980). "Temporal Variables Within and Between Languages," in Towards a Cross-Linguistic Assessment of Speech Production, edited by H.W. Dechert and M. Raupach (Lang, Frankfurt), pp. 39-53.

Grosjean, F. and Deschamps, A. (1975). "Analyse Contrastive des Variables Temporelles de l'Anglais et du Français: Vitesse de Parole et Variables Composantes, Phénomènes d'Hésitation," Phonetica, 31, 144-184.

Leeson, R. (1975). Fluency and Language Teaching (Longman, London).

Lennon, P. (1990). "Investigating Fluency in EFL: A Quantitative Approach," Language Learning, 3, 387-417.

Levelt, W.J.M. (1989). Speaking: From Intention to Articulation (MIT Press, Cambridge, MA).

Möhle, D. (1984). "A Comparison of the Second Language Speech Production of Different Native Speakers," in Second Language Productions, edited by H.W. Dechert, D. Möhle, and M. Raupach (Narr, Tübingen), pp. 26-49.

Nation, P. (1989). "Improving Speaking Fluency," System, 3, 377-384.

Peters, T.J. and Guitar, B. (1991). Stuttering: An Integrated Approach to Its Nature and Treatment (Williams and Wilkins, Baltimore).

Raupach, M. (1980). "Temporal Variables in First and Second Language Speech Production," in Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler, edited by H.W. Dechert and M. Raupach (Mouton, The Hague), pp. 263-270.

Raupach, M. (1983). "Analysis and Evaluation of Communicative Strategies," in Strategies in Interlanguage Communication, edited by C. Faerch and G. Kasper (Longman, London), pp. 263-270.

Riggenbach, H. (1991). "Toward an Understanding of Fluency: A Microanalysis of Non-native Speaker Conversations," Discourse Processes, 14, 423-441.

Schmidt, R. (1992). "Psychological Mechanisms Underlying Second Language Fluency," Studies in Second Language Acquisition, 14, 357-385.

Segalowitz, N. (1991). "Does Advanced Skill in a Second Language Reduce Automaticity in the First Language?," Language Learning, 41, 59-83.

SPEX http://lands.let.ru.nl/spex.

Strik, H., Russel, A., Van den Heuvel, H., Cucchiarini, C., and Boves, L. (1997). "A Spoken Dialog System for the Dutch Public Transport Information Service," International Journal of Speech Technology, 2, 121-131.

Towell, R. (1987). "Approaches to the Analysis of the Oral Language Development of the Advanced Learner," in The Advanced Language Learner, edited by J.A. Coleman and R. Towell (CILT, London), pp. 157-181.

Towell, R., Hawkins, R., and Bazergui, N. (1996). "The Development of Fluency in Advanced Learners of French," Applied Linguistics, 1, 84-119.
