Quantitative assessment of second language learners' fluency
by means of automatic speech recognition technology.
Catia Cucchiarini, Helmer Strik, Lou Boves (2000)
A2RT, Dept. of Language & Speech, University of Nijmegen
P.O. Box 9103, 6500 HD Nijmegen, The Netherlands
J. of the Acoustical Society of America, Vol. 107 (2), pp. 989-999.
Abstract
To determine whether expert fluency ratings of read speech can be
predicted on the basis of automatically calculated temporal
measures of speech quality, an experiment was conducted with read speech
of 20 native and 60 non-native speakers of Dutch. The
speech material was scored for fluency by nine experts and was then
analyzed by means of an automatic speech recognizer in terms of
quantitative measures such as speech rate, articulation rate, number and
length of pauses, number of dysfluencies, mean length of
runs, and phonation/time ratio. The results show that expert ratings of
fluency in read speech are reliable (Cronbach's α varies
between 0.90 and 0.96) and that these ratings can be predicted on the
basis of quantitative measures: for six automatic measures the
magnitude of the correlations with the fluency scores varies between 0.81
and 0.93. Rate of speech appears to be the best predictor:
correlations vary between 0.90 and 0.93. Two other important determinants
of reading fluency are the rate at which speakers articulate
the sounds and the number of pauses they make. Apparently, rate of speech
is such a good predictor of perceived fluency because it
incorporates these two aspects. © 2000 Acoustical Society of America.
PACS: 43.70.Kv, 43.71.Es, 43.71.Gv, 43.71.Hw
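The temporal measures listed in the abstract can be illustrated with a small Python sketch that computes them from a time-aligned phone segmentation of one utterance. The exact operational definitions used in this study are not given in this section, so the following are assumptions: rates are counted in phones per second, a silent pause is any stretch labeled SIL that lasts at least 0.2 s, shorter silences count as speech, and leading and trailing silence has already been removed. The names Segment, SILENCE, MIN_PAUSE and temporal_measures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

SILENCE = "SIL"   # assumption: label the aligner gives to silent stretches
MIN_PAUSE = 0.2   # assumption: minimum silent-pause duration in seconds


@dataclass
class Segment:
    label: str    # phone symbol, or SILENCE
    start: float  # onset in seconds
    end: float    # offset in seconds

    @property
    def dur(self) -> float:
        return self.end - self.start


def temporal_measures(segments: list[Segment]) -> dict[str, float]:
    """Temporal fluency measures for one utterance (hypothetical definitions).

    Assumes leading and trailing silence have been trimmed, so the utterance
    runs from the onset of the first phone to the offset of the last one.
    """
    n_phones = sum(1 for s in segments if s.label != SILENCE)
    if n_phones == 0:
        raise ValueError("utterance contains no phones")
    total = segments[-1].end - segments[0].start

    # Only silences of at least MIN_PAUSE count as pauses; shorter gaps
    # are treated as part of the speech.
    pauses = [s.dur for s in segments
              if s.label == SILENCE and s.dur >= MIN_PAUSE]
    pause_time = sum(pauses)
    speaking_time = total - pause_time

    # A run is a stretch of phones between two pauses.
    runs: list[int] = []
    current = 0
    for s in segments:
        if s.label != SILENCE:
            current += 1
        elif s.dur >= MIN_PAUSE and current > 0:
            runs.append(current)
            current = 0
    if current > 0:
        runs.append(current)

    return {
        "rate_of_speech": n_phones / total,             # phones/s, pauses included
        "articulation_rate": n_phones / speaking_time,  # phones/s, pauses excluded
        "phonation_time_ratio": 100 * speaking_time / total,
        "n_pauses": len(pauses),
        "total_pause_time": pause_time,                 # seconds
        "mean_length_of_runs": sum(runs) / len(runs),   # phones per run
    }
```

Changing MIN_PAUSE changes which silences count as pauses, and therefore every measure except the phone count, so the threshold would need to be reported alongside any results obtained this way.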
INTRODUCTION
The term fluency is routinely used by teachers and researchers to describe both
native and non-native language performance. The fact that fluency is a frequently
applied notion might suggest that there is general agreement as to its precise meaning.
However, a review of relevant literature reveals that the term fluency has been used to
refer to a wide range of different skills and different speech characteristics (e.g. Leeson,
1975; Fillmore, 1979; Brumfit, 1984; Lennon, 1990; Schmidt, 1992; Chambers, 1997).
In spite of this great variation, though, there is general agreement on two matters.
First, although it is obvious that fluency can be used to describe written performance
(Lennon, 1990), most authors restrict the use of the term to the oral modality.
Furthermore, although some authors have underlined the importance of fluency-related
factors in receptive processes (Leeson, 1975; Segalowitz, 1991), there seems to be a
tacit agreement among teachers and researchers that fluency mainly refers to productive
language performance. However, even this more restricted definition of fluency as a
descriptor of oral production is amenable to different interpretations.
In considering the various possibilities, we may draw a distinction between
fluency with respect to native language performance and fluency in the context of
foreign language teaching and testing. In the latter case, fluency is viewed as an
important criterion by which non-native performance can be judged (Riggenbach,
1991), despite the vagueness of the exact meaning of the concept. This is clear from the
fact that fluency is often included in tests and evaluation schemes. With respect to
native speakers' oral performance, fluency may be used to characterize the performance
of a speaker, but does not really constitute an evaluation criterion. The term dysfluent,
on the other hand, is often used in connection with certain speech disorders such as
stuttering, where dysfluent speech is characterized by "an abnormally high frequency
and/or duration of stoppages in the forward flow of speech" (Peters and Guitar, 1991: 9).
In considering native speakers' oral production, Fillmore (1979: 93) identifies four
different abilities that might be subsumed under the term fluency: a) "the ability to talk
at length with few pauses", b) "the ability to talk in coherent, reasoned, and
'semantically dense' sentences", c) "the ability to have appropriate things to say in a
wide range of contexts", and d) "the ability...to be creative and imaginative
in...language use".
In foreign language teaching and testing various definitions of fluency are also
found. For instance, in communicative language teaching the emphasis has been on
fluency as opposed to accuracy. According to the definition provided by Brumfit (1984:
57), fluency is "the maximally effective operation of the language system so far
acquired by the student". In this definition of fluency, native-speaker-like performance
does not constitute the target to be achieved (Brumfit, 1984: 56). Alternatively, native-like performance is viewed as the final goal in the more common interpretation of
fluency as a synonym for oral command of a language. In everyday language use, this
definition may be extended to indicate overall language proficiency (Lennon, 1990;
Chambers, 1997). Finally, in a more restricted sense, the term fluency has been used to
refer to one aspect of oral proficiency, in particular the temporal aspect (Nation, 1989;
Lennon, 1990; Riggenbach, 1991; Schmidt, 1992; Freed, 1995; Towell et al., 1996).
However, even when the term fluency is used in this more limited sense, there is still
uncertainty as to what exactly contributes to perceived fluency. It is this temporal
interpretation of fluency, admittedly rather vague, that will be the focus of the present
paper.
In trying to define the temporal aspect of fluency, it has often been assumed that
the goal in language learning consists in producing "speech at the tempo of native
speakers, unimpeded by silent pauses and hesitations, filled pauses...self-corrections,
repetitions, false starts and the like" (Lennon, 1990: 390). However, quantitative studies
of pause-related phenomena have revealed that native speech is not always smooth and
continuous, but exhibits a lot of hesitations and repairs (Raupach, 1983; Lennon, 1990;
Riggenbach, 1991). This would seem to imply that the presence of hesitation
phenomena is not sufficient to distinguish between natives and non-natives and that the
difference rather lies in the frequency and distribution of these phenomena, as
suggested by Möhle (1984). As a matter of fact, studies that have compared a number of
quantitative fluency measures in L1 and L2 speech of the same speaker have shown that
there may be considerable differences between the two speech types (Möhle, 1984;
Towell et al., 1996).
In an attempt to gain more insight into the temporal aspects of fluency, Lennon
(1990), Riggenbach (1991) and Freed (1995) carried out studies in which samples of
spontaneous speech produced by non-native speakers of English were judged by experts
on fluency and were then analyzed in terms of quantitative variables such as speech
rate, phonation-time ratio, mean length of runs, and number and length of pauses. The
results of these studies show that fluency ratings are affected by quantitative variables
such as speech rate and number of pauses. In addition, these studies also reveal that
studying the relationship between fluency ratings and temporal variables in spontaneous
speech may be rather complex, because in this case the fluency ratings turn out to be
affected by non-temporal properties of speech utterances, such as grammar, vocabulary
and accent (Lennon, 1990: 408; Riggenbach, 1991: 434; Freed, 1995: 135).
The aim of the research reported in this paper is to determine whether expert
fluency ratings of read speech can be predicted on the basis of temporal measures of
speech quality. The decision to limit this investigation to read speech is related to the
methodological complexities involved in studying fluency in spontaneous speech. If the
present approach appears to be feasible, it will be applied to spontaneous speech too.
Identifying quantitative correlates of perceived fluency is important with a view to
developing objective testing instruments for fluency assessment. An important
characteristic of the present investigation is that the quantitative variables are
calculated automatically. Consequently, if the objective measures used in this study
prove able to predict perceived fluency, this approach may have potential for the
development of automatic tests of fluency in read speech.
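Because the quantitative variables are obtained from an automatic speech recognizer rather than by hand, a timed segmentation has to be derived from the recognizer's output first. The recognizer configuration is not described in this section; as an illustration only, the Python sketch below assumes a frame-level forced-alignment path with a 10 ms frame shift and a silence label SIL, and turns it into timed (label, start, end) segments by run-length encoding. The names FRAME_SHIFT, path_to_segments and trim_silence are hypothetical.

```python
from itertools import groupby

FRAME_SHIFT = 0.01  # assumption: 10 ms frame shift of the recognizer's front end


def path_to_segments(path, transcription, frame_shift=FRAME_SHIFT):
    """Turn a frame-level forced-alignment path into (label, start, end) segments.

    path[i] is the index into `transcription` of the phone (or silence) that
    frame i was aligned to. Grouping on the index rather than on the phone
    symbol keeps two adjacent identical phones apart.
    """
    segments = []
    frame = 0
    for idx, run in groupby(path):
        n_frames = sum(1 for _ in run)
        segments.append((transcription[idx],
                         frame * frame_shift,
                         (frame + n_frames) * frame_shift))
        frame += n_frames
    return segments


def trim_silence(segments, silence="SIL"):
    """Drop leading and trailing silence so timing starts at the first phone."""
    lo, hi = 0, len(segments)
    while lo < hi and segments[lo][0] == silence:
        lo += 1
    while hi > lo and segments[hi - 1][0] == silence:
        hi -= 1
    return segments[lo:hi]
```

For example, the transcription ["SIL", "a", "a", "SIL"] with the path [0, 0, 1, 1, 1, 2, 2, 3] yields four segments; after trimming, two separate "a" segments remain, covering 0.02 s to 0.07 s.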
The goal of this study will be pursued by relating expert fluency ratings of speech
read by native and non-native speakers of Dutch to a set of quantitative measures of
speech quality that are supposed to be related to perceived fluency. In this way it can be
determined to what extent expert judgments of fluency can be predicted on the basis of
automatically obtained temporal measures of speech quality. In other words, the expert
fluency ratings will constitute the reference for the evaluation of the automatic fluency
measures. Of course, this will be possible only if the expert ratings exhibit acceptable
levels of reliability. To this end, we will ask different groups of raters to evaluate the
same material on fluency, so that agreement between raters can be assessed. Moreover,
each rater will be asked to score part of the material twice so that it will be possible to
establish intrarater reliability.
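Cronbach's α, the reliability coefficient reported in the abstract, can be computed by treating each rater as an item: α = k/(k − 1) · (1 − Σσ²_r / σ²_total), where k is the number of raters, σ²_r the variance of rater r's scores across speakers, and σ²_total the variance of the speakers' summed scores. A minimal Python sketch, assuming a complete ratings matrix in which every rater scored every speaker; the function name is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters as items.

    ratings[r][s] is the score rater r gave speaker s; every rater is assumed
    to have scored every speaker.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    rater_vars = [pvariance(scores) for scores in ratings]
    totals = [sum(col) for col in zip(*ratings)]  # summed score per speaker
    return k / (k - 1) * (1 - sum(rater_vars) / pvariance(totals))
```

Population variances are used for both terms; since the same divisor appears in numerator and denominator, sample variances would give the same α. For the twice-scored material, one could in the same way treat a rater's two scorings as two items.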
In addition, these analyses will make it possible to determine the contribution of
the various quantitative variables to perceived fluency. In turn, this will shed some light
on the determinants of fluency in read speech.
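One simple way to gauge the contribution of each quantitative variable, in line with the correlations reported in the abstract, is to compute the Pearson correlation between each measure and the expert fluency scores across speakers and rank the measures by the magnitude of r. Magnitude matters because measures such as the number of pauses would be expected to correlate negatively with fluency. A minimal Python sketch; pearson_r and rank_predictors are hypothetical names, and the input layout (one list of per-speaker values per measure) is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_predictors(measures, fluency):
    """Rank measures by |r| with the fluency scores, strongest first.

    measures maps a measure name to its per-speaker values, in the same
    speaker order as `fluency`.
    """
    scored = [(name, pearson_r(vals, fluency)) for name, vals in measures.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

Pairwise correlations only show how strongly each measure tracks the ratings on its own; because the measures are themselves intercorrelated, separating their unique contributions would require a multivariate analysis.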
Furthermore, since the data gathered in this investigation concern both natives
and non-natives, this will offer the possibility of determining whether native and non-native speakers differ on the fluency ratings and on the temporal variables. It is clear
that distinguishing between these two groups is not the aim of a fluency test, which,
instead, should distinguish between fluent and non-fluent speakers. However, for the
development of a test of this kind, data on native performance are necessary to
establish benchmarks. Moreover, given that fluency is often equated with native-like
performance (see above), it is interesting to determine whether the two groups of
natives and non-natives significantly differ from each other on the variables under
study.
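Determining whether natives and non-natives differ significantly amounts to a two-group comparison on each variable. The statistical test used is not specified in this section; as one reasonable choice, the Python sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom, which do not assume equal variances and so suit unequal groups such as the 20 native and 60 non-native speakers in this study. The function name welch_t is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means (sample variances).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In practice, scipy.stats.ttest_ind(a, b, equal_var=False) returns the same statistic together with a p-value; when many measures are compared, a correction for multiple comparisons may also be appropriate.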
IV. CONCLUSIONS
On the basis of the results of the present investigation we can draw the following
conclusions. First, expert listeners are able to evaluate fluency with a high degree of
reliability. Second, expert fluency ratings of read speech are mainly influenced by two
factors: speed of articulation and frequency of pauses. Third, expert fluency ratings can
be accurately predicted on the basis of automatically calculated measures such as rate
of speech, articulation rate, phonation-time ratio, number and total duration of pauses
and mean length of runs. Of all these measures rate of speech appears to be the best
one. Fourth, native speakers are more fluent than non-natives and the temporal
measures are significantly different for the two groups.
To conclude, these findings indicate that temporal measures of fluency may be
employed to develop objective testing instruments of fluency in read speech. In turn,
the fact that these measures can be automatically calculated by means of automatic
speech recognition techniques suggests that this approach may contribute to developing
automatic tests of fluency, at least for read speech. Moreover, these results were
obtained with telephone speech, which suggests that such tests could be administered
over the telephone; this approach is therefore likely to have important consequences for
the future of fluency assessment.
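An automatic test of this kind would need to map a measured value onto the expert rating scale. The procedure for doing so is not described in this section; as a sketch under that caveat, the Python code below fits an ordinary least-squares line predicting expert fluency scores from a single measure (for instance rate of speech) and applies it to a new speaker. The names fit_line and predict are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, x):
    """Predicted fluency score for a new speaker whose measure equals x."""
    slope, intercept = model
    return intercept + slope * x
```

A single predictor keeps the model transparent; a multiple regression over several measures is the natural extension, and predictions outside the range of the rating scale would have to be clipped to its end points.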
ACKNOWLEDGMENTS
This research was supported by SENTER (an agency of the Dutch Ministry of
Economic Affairs), the Dutch National Institute for Educational Measurement (CITO),
Swets Test Services of Swets and Zeitlinger and KPN. The research of Dr. H. Strik has
been made possible by a fellowship of the Royal Netherlands Academy of Arts and
Sciences. The authors would like to thank Tim Bunnel and an anonymous reviewer for
their valuable comments and suggestions.
APPENDIX
Group 1 sentences
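1) Vitrage is heel ouderwets en past niet bij een modern interieur. (Net curtains are very old-fashioned and do not suit a modern interior.)
2) De Nederlandse gulden is al lang even hard als de Duitse mark. (The Dutch guilder has long been as strong as the German mark.)
3) Een bekertje warme chocolademelk moet je wel lusten. (You must surely like a cup of hot chocolate.)
4) Door jouw gezeur zijn we nu al meer dan een uur te laat voor die afspraak. (Because of your nagging, we are now more than an hour late for that appointment.)
5) Met een flinke garage erbij moet je genoeg opbergruimte hebben. (With a sizeable garage as well, you should have enough storage space.)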
Group 2 sentences
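1) Een foutje van de stuurman heeft het schip doen kapseizen. (A small mistake by the helmsman caused the ship to capsize.)
2) Gelokt door een stukje kaas liep het muisje keurig in de val. (Lured by a piece of cheese, the little mouse walked neatly into the trap.)
3) Het ziet er naar uit dat het deze week bij ons opnieuw gaat regenen. (It looks as if it is going to rain here again this week.)
4) Na die grote lekkage was het dure behang aan vervanging toe. (After that big leak, the expensive wallpaper needed replacing.)
5) Geduldig hou ik de deur voor je open. (Patiently, I hold the door open for you.)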
REFERENCES
Brumfit, C. (1984). Communicative Methodology in Language Teaching: The Roles of
Fluency and Accuracy (Cambridge University Press, Cambridge).
Butcher, A. (1981). "Phonetic Correlates of Perceived Tempo in Reading and Spontaneous
Speech," Work in Progress, Phonetics Laboratory, University of Reading, pp. 105-117.
Chambers, F. (1997). "What Do We Mean by Fluency?," System, 4, 535-544.
Dechert, H.W. and Raupach, M. (1980a). Temporal Variables in Speech: Studies in
Honour of Frieda Goldman-Eisler (Mouton, The Hague).
Dechert, H.W. and Raupach, M. (1980b). Towards a Cross-Linguistic Assessment of
Speech Production (Lang, Frankfurt).
den Os, E.A., Boogaart, T.I., Boves, L., and Klabbers, E. (1995). "The Dutch Polyphone
Corpus," Proceedings Eurospeech95, 825-828.
Ferguson, G.A. (1987). Statistical Analysis in Psychology and Education (McGraw-Hill,
Singapore).
Fillmore, C.J. (1979). "On Fluency," in Individual Differences in Language Ability and
Language Behavior, edited by C. Fillmore, D. Kempler, and W.S.-Y. Wang
(Academic, New York), pp. 85-101.
Flege, J.E. and Fletcher, K.L. (1992). "Talker and Listener Effects on Degree of Perceived
Foreign Accent," Journal of the Acoustical Society of America, 91(1), 370-389.
Freed, B.F. (1995). "What Makes Us Think that Students Who Study Abroad Become
Fluent?," in Second Language Acquisition in a Study-Abroad Context, edited by
B.F. Freed (John Benjamins, Amsterdam), pp. 123-148.
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in Spontaneous Speech
(Academic, New York).
Grosjean, F. (1980). "Temporal Variables Within and Between Languages," in Towards
a Cross-Linguistic Assessment of Speech Production, edited by H.W. Dechert and
M. Raupach (Lang, Frankfurt), pp. 39-53.
Grosjean, F. and Deschamps, A. (1975). "Analyse Contrastive des Variables
Temporelles de l'Anglais et du Français: Vitesse de Parole et Variables
Composantes, Phénomènes d'Hésitation," Phonetica, 31, 144-184.
Leeson, R. (1975). Fluency and Language Teaching (Longman, London).
Lennon, P. (1990). "Investigating Fluency in EFL: A Quantitative Approach," Language
Learning, 3, 387-417.
Levelt, W.J.M. (1989). Speaking: From Intention to Articulation (MIT Press,
Cambridge, MA).
Möhle, D. (1984). "A Comparison of the Second Language Speech Production of
Different Native Speakers," in Second Language Productions, edited by
H.W. Dechert, D. Möhle, and M. Raupach (Narr, Tübingen), pp. 26-49.
Nation, P. (1989). "Improving Speaking Fluency," System, 3, 377-384.
Peters, T.J. and Guitar, B. (1991). Stuttering: An Integrated Approach to Its Nature and
Treatment (Williams and Wilkins, Baltimore).
Raupach, M. (1980). "Temporal Variables in First and Second Language Speech
Production," in Temporal Variables in Speech: Studies in Honour of Frieda
Goldman-Eisler, edited by H.W. Dechert and M. Raupach (Mouton, The Hague),
pp. 263-270.
Raupach, M. (1983). "Analysis and Evaluation of Communicative Strategies," in
Strategies in Interlanguage Communication, edited by C. Faerch and G. Kasper
(Longman, London), pp. 263-270.
Riggenbach, H. (1991). "Toward an Understanding of Fluency: A Microanalysis of
Non-native Speaker Conversations," Discourse Processes, 14, 423-441.
Schmidt, R. (1992). "Psychological Mechanisms Underlying Second Language
Fluency," Studies in Second Language Acquisition, 14, 357-385.
Segalowitz, N. (1991). "Does Advanced Skill in a Second Language Reduce
Automaticity in the First Language?," Language Learning, 41, 59-83.
SPEX, http://lands.let.ru.nl/spex.
Strik, H., Russel, A., Van den Heuvel, H., Cucchiarini, C., and Boves, L. (1997). "A Spoken
Dialog System for the Dutch Public Transport Information Service," International
Journal of Speech Technology, 2, 121-131.
Towell, R. (1987). "Approaches to the Analysis of the Oral Language Development of
the Advanced Learner," in The Advanced Language Learner, edited by J.A.
Coleman and R. Towell (CILT, London), pp. 157-181.
Towell, R., Hawkins, R., and Bazergui, N. (1996). "The Development of Fluency in
Advanced Learners of French," Applied Linguistics, 1, 84-119.