H. Strik & L. Boves (1994a)
Proceedings of the Dept. of Language and Speech, University of Nijmegen, Vol. 16/17, pp. 96-105.
There are many articles dealing with the relation between fundamental frequency
(Fo) and the underlying physiological processes, which show that subglottal pressure
(Psb) and the activity of the cricothyroid (CT), vocalis (VOC), and sternohyoid (SH) muscles are
important factors in the control of Fo (Rubin, 1963; Shipp & McGlone, 1971; Collier, 1975;
Baer, Gay, & Niimi, 1976; Maeda, 1976; Atkinson, 1978; Shipp, Doherty, & Morrissey, 1979;
Hirose & Sawashima, 1981; Gelfer, 1987). However, most of the data described in these
papers concern singing and sustained phonation. Moreover, in many studies measurements
were obtained for either the respiratory system or the activity of the laryngeal muscles.
There are relatively few studies in which simultaneous registrations of respiratory and laryngeal
activity were made for running speech. This may be an important reason why it is not completely
clear yet how these factors cooperate in the regulation of Fo for running speech.
The purpose of the research reported in this paper is to clarify the relation
between Fo and the physiological mechanisms for running speech. To this end simultaneous
measurements of laryngeal and respiratory activity were made for two subjects. Our main
goal is to propose a comprehensive model for the physiological control of intonation. From our
own data, as well as from the literature (Gay et al., 1972; Ladefoged, 1967), it appears that
there are differences between subjects in the physiology underlying intonation. To overcome the
problems deriving from this variability, we made ample use of data available in the literature.
Especially the data found in Ladefoged (1967), Lieberman (1967), Collier (1975), and Gelfer
(1987) proved to be useful.
From the outset we did not want to subscribe to one of the extant models
of intonation. Instead, we wanted to analyse the data as objectively as possible, i.e.
our methodology is essentially data driven. As a consequence, we try to avoid theory-laden
terms like 'declination' and 'baseline' as much as possible.
The outline of the article is as follows. Section 1 describes the material
and the method used in our research. The results for running speech are presented in section
2. The physiological model resulting from our investigations is given in section 3. Finally,
in section 4 we discuss our findings and their relation to previous research.
1. Material and method
For two Dutch male subjects recordings were made of the audio signal, electroglottogram, lung volume (Vl), Psb, SH, and VOC. In addition to these signals the CT
was also measured for subject LB, and oral pressure (Por) for subject HB. In the latter case
transglottal pressure (Ptr) was calculated by taking the difference of Psb and Por.
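The calculation of Ptr can be sketched as follows. This is only an illustration: the function name and the sample values are hypothetical, and both signals are assumed to be sampled at the same instants and expressed in cm H2O.

```python
def transglottal_pressure(psb, por):
    """Return Ptr = Psb - Por, sample by sample (both in cm H2O)."""
    if len(psb) != len(por):
        raise ValueError("Psb and Por must have the same number of samples")
    return [s - o for s, o in zip(psb, por)]


# Hypothetical sample values in cm H2O (not measured data).
psb = [8.0, 7.5, 7.0]
por = [0.5, 3.0, 0.0]
ptr = transglottal_pressure(psb, por)
```

When Por is close to zero, as in an open vowel, Ptr is practically equal to Psb; the two differ mainly during oral constrictions.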
The measurements were made while the subjects produced sustained vowels
and meaningful Dutch sentences with different intonation patterns. The subjects repeated
each sentence 5 to 8 times. The signals of these repetitions were used to calculate average
signals. The method of non-linear time-alignment and averaging (Strik & Boves, 1991) was used
to average the signals. An advantage of this method is that it also yields an average Fo
contour. All signals shown in the present article are average signals, which are time-aligned
with the audio signal. The procedures used for recording and processing the data are described
in more detail in Strik & Boves (1992).
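The idea of non-linear time-alignment followed by averaging can be sketched with a generic dynamic time warping routine. This is not the algorithm of Strik & Boves (1991), whose details are not repeated here. It is a minimal stand-in in which each repetition is warped onto a reference repetition using the signal itself as the alignment feature (an assumption), after which the warped repetitions are averaged. All function names are hypothetical.

```python
def dtw_path(ref, sig):
    """Return a minimum-cost monotone warping path as (ref_idx, sig_idx) pairs."""
    n, m = len(ref), len(sig)
    inf = float("inf")
    # cost[i][j]: cheapest cumulative cost of aligning ref[:i] with sig[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - sig[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def align_to_reference(ref, sig):
    """Warp sig onto the time axis of ref (mean of all samples mapped to each frame)."""
    buckets = [[] for _ in ref]
    for i, j in dtw_path(ref, sig):
        buckets[i].append(sig[j])
    return [sum(b) / len(b) for b in buckets]


def average_repetitions(ref, repetitions):
    """Average several repetitions after warping each onto the reference."""
    aligned = [align_to_reference(ref, rep) for rep in repetitions]
    return [sum(column) / len(column) for column in zip(*aligned)]


# Synthetic example: two time-stretched versions of the same contour.
reference = [0, 1, 2, 1, 0]
rep_a = [0, 0, 1, 2, 2, 1, 0]
rep_b = [0, 1, 1, 2, 1, 0, 0]
average = average_repetitions(reference, [rep_a, rep_b])
```

In this synthetic case both repetitions are pure time-stretched copies of the reference, so the average reproduces the reference contour. A plain average without alignment would smear the peak.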
2. Running speech
A speaker can use many different physiological mechanisms to control Fo.
Therefore, it is remarkable that the within-subject variation between the signals of repetitions
of the same utterance is relatively small. This suggests that speakers have a good notion
of the manner in which they want to produce an utterance, and that they have good control
over these mechanisms. Consequently, meaningful averaging of the data is possible (Strik
& Boves, 1991).
Although there are some individual differences, consistent behaviour between
subjects can be observed in the data. Ladefoged (1967) also noted that for his subjects
"the results obtained so far are sufficiently consistent to suggest the general pattern of the
relationships involved". This consistency led to the discovery of general patterns in the behaviour
of Psb, CT, VOC, and SH, that we want to describe in this section.
From our data it appears that both Fo and the physiological signals have
two components, viz. a global and a local one. This was also found by Maeda (1980), and
Gelfer (1987). In Strik & Boves (1992) we showed that this qualitative observation has a quantitative
statistical basis: on a global level Psb explains most of the observed variance of Fo,
while on a local level the laryngeal muscles become more important. Our treatment focuses on the
linguistically significant aspects of Fo, especially those connected with stress, phrasing
and the question-statement distinction. Initial rise and final lowering of Fo are
treated separately for reasons that are explained in section 2.2.5.
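What it means for Psb to "explain" part of the variance of Fo can be illustrated with the coefficient of determination of a simple linear regression. The sketch below is not the statistical procedure of Strik & Boves (1992). The function name and the four data points are hypothetical and serve only to show the computation.

```python
def r_squared(x, y):
    """Proportion of the variance of y explained by a straight-line fit on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values (not measured data): Psb in cm H2O, Fo in Hz.
psb = [8, 7, 6, 5]
fo = [120, 118, 110, 108]
explained = r_squared(psb, fo)  # about 0.93 for these made-up numbers
```

A value close to 1 would indicate that a single factor accounts for nearly all of the observed variance; a lower value leaves room for other factors, such as the laryngeal muscles at the local level.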
2.1. Global level
From the recordings shown in Figures 1 and 2 it is apparent that Psb has
a global and a local component. The global pattern of Psb will be called Psb,g. In most sentences
there is a gradual lowering of Psb,g. This can also be seen in the data of Lieberman (1967),
Collier (1975), and Gelfer (1987).
The shape of Psb,g differs among speakers. For subject LB the shape is concave
(see e.g. Figure 1b): the slope is steep initially, and it gradually becomes more
flat towards the end. The same pattern is observed for the two subjects in the studies of Collier
(1975) and Gelfer (1987), and for speaker 2 in the study of Lieberman (1967). However, for
subject HB the pattern is more convex (see e.g. Figure 2a): Psb,g decreases slowly in the
beginning, and more rapidly near the end. The same pattern is also found for speakers 1 and
3 in the study of Lieberman (1967). In spite of the differences between subjects, the behaviour
is relatively consistent within subjects.
The global reference level of CT, VOC, and SH seems to be constant (see
e.g. Figure 1f). In other words, these laryngeal muscles usually do not have a global component. Consequently, the global behaviour of Fo is generally determined by Psb,g.
The global component of Fo will be called Fo,g. A downtrend in Psb,g will result in
a downtrend in Fo,g. This downtrend in Fo,g has been observed in declarative utterances of many languages.
2.2. Local level
Fo, Psb, and the laryngeal muscles have a local component: if there are
local variations in Fo, then local variations in Psb and in the laryngeal muscles are often observed
(see e.g. Figures 1f, 2d, 2f). This can also be seen in the data of Ladefoged (1967), Lieberman
(1967), Collier (1975) and Gelfer (1987).
The conclusion of many studies was that of all physiological factors known
to affect Fo, the CT shows the most consistent relation to Fo (Collier, 1975; Maeda, 1976;
Atkinson, 1978; Shipp, Doherty, & Morrissey, 1979; Erickson, Baer, & Harris, 1983; Gelfer,
1987). Also in our data we see that for local variations in Fo there usually is a local
variation in the activity of the CT. At the moment there is no doubt that the CT is an important factor
in the control of Fo. The local variation in CT explains (at least) part of the local variation in Fo.
For local Fo movements we usually observe a covariance of Fo and VOC in
our data. This relation was also studied by Maeda (1976) and Atkinson (1978) for sentences
with various intonation patterns. No direct relation between VOC and the Fo movements
was found by Maeda for a single subject. However, Atkinson did find a positive correlation
between VOC and Fo for his subject. VOC is used to control Fo for sustained phonation
and singing (Rubin, 1963; Sawashima, Gay, & Harris, 1969; Shipp & McGlone, 1971; Gay et al.,
1972; Shipp, Doherty, & Morrissey, 1979). Hirose & Gay (1972) observed an increase in
the activity of CT and VOC for stressed vowels in isolated words. Probably there is a synergism
of CT and VOC in the control of Fo.
2.2.3. Subglottal pressure
Both measurements (Ladefoged, 1967; Lieberman, 1967; Collier, 1975; Baer,
Gay, & Niimi, 1976; Atkinson, 1978; Baken & Orlikoff, 1987; Gelfer, 1987) and modelling
(Titze, 1989) have shown that a change in Psb will affect Fo, ceteris paribus. During
local Fo movements a covariation of Fo and Psb is often observed in our data, and in the data
of Ladefoged (1967), Lieberman (1967), Collier (1975) and Gelfer (1987). Part of this local Psb
variation might be due to a change in the impedance of the glottis which, in turn, results
from changes in the activity of the laryngeal muscles (e.g. the changes in CT and VOC as noted
above). However, part of the Psb variation could also be due to changes in pulmonic activity.
For instance, increased activity of the respiratory muscles for stressed syllables was
found by Ladefoged (1967) and van Katwijk (1974). Whatever the cause of a Psb variation, the
result is a change in Fo.
The function of the SH in the control of Fo is not completely understood.
Erickson & Atkinson (1976), Maeda (1976) and Erickson, Baer, & Harris (1983) postulated
that Fo falls are initiated by a relaxation of the CT, which is followed by increased
activity of the SH. Collier (1975) argued that SH cannot be the primary effector of an Fo fall.
Atkinson (1978) found a high negative correlation between SH and Fo, while Erickson, Liberman,
& Niimi (1977) concluded that the SH has "a slightly negative relation to Fo". For
some sentences in our data there is also a small negative correlation between SH and Fo. However,
this negative correlation is mainly brought about by the increase in SH and the lowering
of Fo at the end of many utterances (the so-called final lowering, see section 2.2.5.). Of course,
final lowering will affect the correlation coefficient to a greater extent if the utterances
are short, like those used by Atkinson (1978). The SH is probably used in some Fo lowerings, but
it is also used for articulatory gestures such as jaw lowering, tongue lowering and retraction.
Therefore, the relation between the SH and Fo is probably complex. This is illustrated
in Figure 2c. During the Fo lowering there is a peak in the activity of SH, and in this case
the SH could have assisted in lowering Fo. But similar peaks can be observed also when Fo
increases or remains steadily high. Anyhow, no consistent, transparent relation can be found
in our data nor in the data of Collier (1975) and Gelfer (1987).
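The effect of final lowering on the correlation coefficient can be illustrated with synthetic numbers. In the sketch below SH and Fo are constructed to be uncorrelated in the body of the utterance, and a short final stretch with high SH and low Fo is appended. All values and names are hypothetical; the example only shows the arithmetic, not our measurements.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic body of an utterance: SH (arbitrary units) unrelated to Fo (Hz).
body_sh = [1, 2, 1, 2]
body_fo = [100, 100, 110, 110]
# Synthetic final lowering: SH rises while Fo drops.
final_sh = [5, 6]
final_fo = [80, 70]

r_body = pearson_r(body_sh, body_fo)
r_short = pearson_r(body_sh + final_sh, body_fo + final_fo)
r_long = pearson_r(body_sh * 5 + final_sh, body_fo * 5 + final_fo)
```

Although SH and Fo are unrelated in the body, appending the final stretch yields a strongly negative coefficient for the short utterance, and a weaker one when the body is five times as long.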
2.2.5. Initial rise and final lowering
High values of Fo, CT, VOC, and Psb are often observed at the beginning
of utterances, both in our data and in the data of Collier (1975), Maeda (1976), and Gelfer (1987).
This effect shows up more prominently in the utterances of subject LB, especially in the longer
ones, while it is less evident in the utterances of subject HB. In questions this initial
rise is slightly reduced compared to the statements.
Towards the end of many utterances Fo and Psb often decrease substantially,
while there is a marked increase in the SH activity. Final lowering has also been observed
by Collier (1975) and Maeda (1976). Increased SH activity and the large drop in Psb usually
take place before phonation has stopped. However, in interrogative sentences both changes
are often delayed till after the utterance. Furthermore, the small drop in Psb, which sometimes
remains, is counterbalanced by a large increase in the activity of CT and VOC. Therefore,
the final lowering of Fo is rarely observed in questions.
The initial rise probably is the result of laryngeal adjustments that are
needed to start phonation (prephonatory tuning), while the final lowering could be a preparation
for the next inhalation (Wyke, 1983). Both kinds of local Fo variations could therefore
be seen as the by-product of physiological manoeuvres that are necessary for speech production.
Initial rise and final lowering are not generally used to signal stress, but still they
could be linguistically significant.
Prosody plays an important role in communication. It is used, among other
things, to mark the boundaries between phrases (Breckenridge, 1977; Cooper & Sorensen, 1981). Pierrehumbert (1979) suggested that the downtrend in Fo and IL may be important
in the perception of phrasing. The Fo fall that results from the downtrend in Fo,
is often enlarged by initial rise and final lowering. Consequently, both effects could assist
in the signalling of boundaries. In interrogative utterances the indications of a linguistic
control of both phenomena are especially clear. In these utterances initial rise and final
lowering were often reduced. Of course, a high Fo at the beginning, and especially a lowering of Fo at
the end of an utterance would interfere with the desired rising intonation.
From our data it is not manifest whether initial rise and final lowering
are linguistically controlled variables, or if they are primarily the by-product of physiological
gestures that are needed in speech production. That is the reason why these local Fo movements
are treated separately from the other local Fo movements which obviously do have a linguistic function.
3. A physiological model of intonation
3.1. The model
In this section we propose a qualitative model of Fo control in running
speech. It describes consistent behaviour of Psb, CT, VOC, and SH that was observed in the data
of various subjects. Although the SH is consistently used in final lowerings, no transparent
relation was found between the SH and other local Fo variations. Therefore, in our model
the SH does not play a role in the control of the latter type of local variations.
Intonation and its physiological control take place at two levels, viz.
a global and a local level.
CT, VOC, and SH generally do not seem to have a global component. Therefore,
the global component of Fo (Fo,g) is determined by Psb,g. Psb,g has a tendency to decline,
which could be due to an economic principle. The downtrend in Psb,g will lead to a downtrend in Fo,g.
At the beginning of utterances CT, VOC, and Psb may have extra high values
(initial rise). At the end of utterances SH often shows an increase while Psb drops sharply.
If these effects occur during voiced sounds at the end of the utterance, final lowering is
observed. Alternatively, SH activity and Psb release may be delayed until after the
last voiced sound, in which case final lowering is absent. The initial rise and final lowering of Fo will
add to the Fo fall that results from the downtrend in Fo.
Besides initial rise and final lowering, other local variations in Fo often
occur. These local variations in Fo are generally caused by variations in CT, VOC, and Psb.
Fo can be raised by increasing CT, VOC and Psb, and Fo can be lowered by decreasing Psb and
relaxing CT and VOC.
3.2. Some remarks
Compatible behaviour has been found in the data of Dutch, British English
and American English subjects. Our model describes the behaviour that seems to be shared
by many speakers. Individual differences were found, though, and it is always possible
that an individual uses a different strategy to control intonation.
The SH is usually involved in final lowering of Fo. In our model the other
Fo lowerings are brought about by a relaxation of CT, VOC, and Psb, i.e. the same mechanisms
used to raise Fo are also used to lower it. According to our data and the data of Collier
(1975) and Gelfer (1987), no separate mechanism (like SH) seems to be needed to produce these
low tones. The strap muscles are probably used to produce very low tones, as during final
lowering. It is possible that these extra low tones do not occur often in those parts of
utterances that precede final lowering. This would imply that the role of the SH in the control
of Fo in running speech is limited.
The reference line of a laryngeal muscle is the activity observed when the
muscle is not active. Consequently, the activity of the CT and VOC can only be lowered
if it has been raised previously. For local variations of Psb it is also observed that
Psb is first raised, relative to Psb,g, and then it is lowered again. Thus it seems that a local lowering
of CT, VOC, and Psb is always preceded by a local rise. The question is what happens if a sentence
starts with a high Fo that is part of the intonation contour proper (i.e., it is not an
initial rise). As there is no such intonation pattern in our data nor in the data of Collier (1975)
and Gelfer (1987), we can only speculate on the answer. In this case we would expect CT, VOC,
and Psb to rise before phonation has started, and to remain high until the first Fo lowering.
In previous intonation studies the term baseline was used regularly. In
general it is defined as a line "drawn near or through the low values of Fo occurring in an utterance"
(Cooper & Sorensen, 1981). This baseline will resemble Fo,g, although they are not
identical. In our model Fo,g is the global component of Fo, i.e. the component that remains after
all local effects have been removed. Initial rise, final lowering, and the rise at the end of questions
are considered to be local effects, and thus are not part of Fo,g. According to the definition
given above, they probably are part of the baseline. The baseline also differs from Fo,g when
Fo is lowered by Fo-lowering mechanisms (e.g., the strap muscles). In that case the baseline
will drop below Fo,g.
Our physiological model of intonation is based on our own data, and on the
data of Lieberman (1967), Ladefoged (1967), Collier (1975) and Gelfer (1987). However, some
of the conclusions that were expressed in these articles are different from ours.
Lieberman (1967) made measurements of Psb, but he did not measure the activity
of the laryngeal muscles. He observed a resemblance in the behaviour of Fo and
Psb, except at the end of interrogative utterances. At the end of questions there was an increase
in Fo, while Psb generally did not increase. His assumption was that the activity of the
laryngeal muscles increased at the end of questions, but remained relatively steady otherwise.
Based on this assumption he concluded that, apart from questions, Fo is a function of
Psb alone. This conclusion can easily be verified by calculating the frequency-to-pressure
ratio in his data. The rate of Fo changes that result from a change in Psb alone should be in the
range 2-7 Hz/cm H2O (e.g. Ladefoged, 1967; Baer, 1979). According to Lieberman (1967: 97)
this ratio is about 20 Hz/cm H2O in his data, while Ohala (1990) claims that it is even
larger. In any case, Psb alone cannot explain all the variation in Fo, and other mechanisms must
have been involved. It is likely that the laryngeal muscles were involved, not only at the end
of questions but also in other parts of the utterances. Although we do not agree with his conclusion,
our model fits the general pattern in his data: Psb,g gradually declines, and local variations
in Psb explain part of the local variations in Fo.
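The check on the frequency-to-pressure ratio amounts to simple arithmetic, sketched below. The range of 2-7 Hz/cm H2O is taken from the text above; the function names and the example values (an Fo change of 40 Hz accompanying a Psb change of 2 cm H2O, giving the ratio of about 20 Hz/cm H2O mentioned above) are hypothetical.

```python
# Range of Fo change attributable to Psb alone, in Hz per cm H2O (from the text).
EXPECTED_RANGE = (2.0, 7.0)


def frequency_to_pressure_ratio(delta_fo_hz, delta_psb_cmh2o):
    """Observed Fo change per unit of Psb change, in Hz/cm H2O."""
    return delta_fo_hz / delta_psb_cmh2o


def explained_by_psb_alone(ratio, expected=EXPECTED_RANGE):
    """True if the ratio falls within the range expected for Psb alone."""
    low, high = expected
    return low <= ratio <= high


def minimum_unexplained_fo(delta_fo_hz, delta_psb_cmh2o, expected=EXPECTED_RANGE):
    """Fo change (Hz) left over even if Psb acts at the top of the expected range."""
    return max(0.0, delta_fo_hz - expected[1] * delta_psb_cmh2o)


# Hypothetical example values chosen to give a ratio of 20 Hz/cm H2O.
ratio = frequency_to_pressure_ratio(40.0, 2.0)
psb_suffices = explained_by_psb_alone(ratio)
leftover = minimum_unexplained_fo(40.0, 2.0)
```

With these example values Psb could account for at most 14 Hz of the 40 Hz change, so at least 26 Hz would have to come from other mechanisms.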
The conclusion of Ladefoged (1967) that both vocal cord tension and Psb
contribute to stress is in agreement with our model. He presents data for utterances with
stress on the last word, and some of the utterances are also produced with a rising intonation
(questions). In his data it can be seen that Psb has a local component, for Psb generally increases
for stressed words and at the end of questions. This is also in line with our model.
As these Psb increases are present at the end of most of his utterances, Psb of these short utterances
is about level or slightly increases. This seems to be in contradiction with our claim that
Psb,g is generally decreasing. However, to study the behaviour of Psb,g the local variations
in Psb have to be removed. After this has been done, it is likely that Psb,g will decline,
also in Ladefoged's data.
The conclusions of Collier (1975) are based on the data of one subject.
The majority of the physiological data presented in Gelfer (1987) concern the same subject,
while she also shows data for one other subject. Although they do not offer an explicit model,
their main conclusions are similar: Psb controls the gradually falling baseline, while
local Fo movements are controlled by the CT. They both observed local variations in CT and Psb
for local Fo movements, and found that the frequency-to-pressure ratio for these movements
is higher than the expected 2-7 Hz/cm H2O. They argued that as Psb cannot explain all the variation
in Fo, it must be the CT that is the most important factor in the control of Fo. However,
one can calibrate the Fo-Psb ratio, but it is almost impossible to calibrate the
Fo-EMG ratio for a laryngeal muscle. An important reason is that the magnitude of an EMG signal
depends on many factors that are difficult to control (for instance, the magnitude
is dependent on the exact place of the electrode in the muscle). The conclusion is that one can check
whether Psb explains all of the variance in Fo, but the same check cannot be made for
a laryngeal muscle. Besides CT other factors could be involved. In fact, Psb and VOC (and probably
other factors) are usually involved in the local Fo movements. Because it is difficult
to calibrate the Fo-EMG ratio, it is hardly possible to decide on quantitative grounds which factor
is most important.
This research was supported by the Foundation for Linguistic Research, which
is funded by the Netherlands Organization for Scientific Research (N.W.O.). Special thanks
are due to Haskins Laboratories where one of the experiments was carried out, especially
to dr. Thomas Baer who made this possible; and to dr. Hiroshi Muta and dr. Philip Blok
who inserted the EMG electrodes and the pressure catheter in the experiments in New Haven
and Nijmegen respectively.
Atkinson, J.E. (1978) Correlation analysis of the physiological features
controlling fundamental voice frequency. JASA 63: 211-222.
Baer, T. (1979) Reflex activation of laryngeal muscles by sudden induced
subglottal pressure changes. JASA 65: 1271-1275.
Baer, T.; Gay, T. & Niimi, S. (1976) Control of fundamental frequency, intensity
and register of phonation. HLSRSR 45/46: 175-185.
Baken, R.J. & Orlikoff, R.F. (1987) Phonatory responses to step-function
changes in supraglottal pressure. In: Laryngeal function in phonation and respiration
(T. Baer, C. Sasaki and K.S. Harris, editors), pp. 273-290. Boston: College-Hill Press.
Breckenridge, J. (1977) Declination as a phonological process. Bell Laboratories
Technical Memorandum, Murray Hill, New Jersey.
Collier, R. (1975) Physiological correlates of intonation patterns. J. Acoust.
Soc. Am. 58: 249-255.
Cooper, W.E. & Sorensen, J.M. (1981) Fundamental frequency in sentence production. Springer-Verlag, New York.
Erickson, D. & Atkinson, J.E. (1976) The functions of the strap muscles
in speech. HLSRSR SR-45/46: 205-210.
Erickson, D.; Baer, T.; Harris, K.S. (1983) The role of the strap muscles
in pitch lowering. In: D.M. Bless & J.H. Abbs (eds.), Vocal Fold Physiology. College-Hill Press,
Erickson, D.; Liberman, M. & Niimi, S. (1977) The geniohyoid and the role
of the strap muscles. HLSRSR SR-49: 103-110.
Gay, T.; Hirose, H.; Strome, M. & Sawashima, M. (1972) Electromyography
of the intrinsic laryngeal muscles during phonation. Annals of otology, rhinology and laryngology
81 (8): 401-409.
Gelfer, C.E. (1987) A simultaneous physiological and acoustic study of fundamental frequency declination. Ph.D. dissertation, Univ. of New York.
Hirose, H. & Gay, T. (1972) The activity of the intrinsic laryngeal muscles
in voicing distinction. Phonetica 25: 140-164.
Hirose, H. & Sawashima, M. (1981) Functions of the laryngeal muscles in
speech. In: K.N. Stevens and M. Hirano (eds.), Vocal Fold Physiology. University of Tokyo
van Katwijk, A. (1974) Accentuation in Dutch: An experimental study. Ph.D.
dissertation, Utrecht Univ.
Ladefoged, P. (1967) Three areas of experimental phonetics. Oxford University
Lieberman, P. (1967) Intonation, Perception and Language. The M.I.T. Press,
Maeda, S. (1976) A characterization of American English intonation. Ph.D.
thesis, MIT, Cambridge.
Ohala, J.J. (1990) Respiratory activity in speech. In: Speech production
and speech modelling (W.J. Hardcastle & A. Marchal, eds.), pp. 23-53, Kluwer Academic Publishers,
Pierrehumbert, J.B. (1979) The perception of fundamental frequency declination.
J. Acoust. Soc. Am. 66: 363-369.
Rubin, H.J. (1963) Experimental studies on vocal pitch and intensity in
phonation. The Laryngoscope 8: 973-1015.
Sawashima, M.; Gay, T.J. & Harris, K.S. (1969). Laryngeal muscle activity
during vocal pitch and intensity changes. HLSRSR 19/20: 211-220.
Shipp, T. & McGlone, R.E. (1971) Laryngeal dynamics associated with voice
frequency change. J. of Speech and Hearing Research 14: 761-768.
Shipp, T.; Doherty, E.T. & Morrissey, P. (1979) Predicting vocal frequency
from selected physiologic measures. JASA 66: 678-684.
Strik, H. & Boves, L. (1991) A dynamic programming algorithm for time-aligning
and averaging physiological signals related to speech. Journal of Phonetics
19, pp. 367-378.
Strik, H. & Boves, L. (1992) Control of fundamental frequency, intensity
and voice quality in speech. Journal of Phonetics 20, pp. 15-25.
Titze, I. (1989) On the relation between subglottal pressure and fundamental
frequency in phonation. JASA 85 (2), pp. 901-906.
Wyke, B. (1983) Neuromuscular control systems in voice production. In: Vocal
Fold Physiology (D.M. Bless & J.H. Abbs, editors), pp. 71-76. San Diego: College-Hill