H. Strik & L. Boves
Proceedings Eurospeech 89, Paris, Vol. Two, pp. 425-428.
ABSTRACT
It is known that subglottal pressure (Psb) is a major factor in the control
of fundamental frequency (Fo) in speech. Yet, the details of this relation
remain unclear. Estimates of the Fo to Psb ratio (FPR) from speech and special
phonation tasks yield values between 5 and 15 Hz/cmH2O [1,2,3,4]. In another
type of experiment, pressure variations are induced externally, either subglottally
or supraglottally. The FPRs measured in these experiments tend towards
values of 2-5 Hz/cmH2O [5,6,7,8]. There seems to be no a priori reason for
the FPR to differ between the two kinds of experiments: the voice
source is the same, so why should it behave differently in the two kinds
of phonation tasks? We therefore carried out experiments aimed at resolving
this discrepancy.
I. THE FPR IN EXPERIMENTS WITH INDUCED PRESSURE VARIATIONS
INTRODUCTION
The FPR in experiments with artificially induced pressure variations was
studied first, because we had some ideas why estimates of the FPR in these
experiments could be too low. These ideas are described below, and are formalized
in three hypotheses.
Besides Psb, other factors control Fo. If we want to know
the effect of Psb alone on Fo, we must check whether all other factors
are constant. It is known that Fo is also controlled by the laryngeal muscles.
Baer [5] studied the influence of the laryngeal muscles on the FPR in an
experiment in which the subject is pushed on the chest to increase Psb.
He found a consistent increase in the EMG activity of vocalis (VOC) and
interarytenoid 30-40 ms after each push. Even for the fastest laryngeal
muscles it takes about 15-20 ms before a change in the activity of a muscle
is followed by a change in Fo [9,10]. So during the first 45-60 ms following a
push, the laryngeal muscles probably do not affect Fo. Baer calculated the
FPR during the first 30 ms and found a value of 2-4 Hz/cmH2O in the chest
register, a value that did not deviate from the values reported earlier
by others. We did not reexamine the effect of the laryngeal muscles on the
FPR.
The first hypothesis:
a sudden rise in Psb is followed by a rise in Psp.
In most experiments either the subglottal or the supraglottal pressure (Psp)
is measured and varied, while the other pressure signal (Psp or Psb,
respectively) is not measured. During sustained phonation of a vowel the
impedance of the glottis is high but finite, so a change of the pressure on
one side of the glottis could leak through to the other side. If this
happened, the change in transglottal pressure (Pt, the difference between
Psb and Psp) would be smaller than the change in the measured pressure signal.
Because it is really Pt that controls Fo [11], it is the change in
Pt that has to be related to a change in Fo. The effect would be that the
estimated FPR is smaller than the ratio between the change in Fo and the
change in Pt.
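The size of this effect can be illustrated with a small numerical sketch. It assumes Pt = Psb - Psp; the function name and all numbers are hypothetical and are not data from the experiment.

```python
def fpr(delta_fo_hz, delta_p_cmh2o):
    """Ratio of an Fo change (Hz) to a pressure change (cmH2O)."""
    if delta_p_cmh2o == 0:
        raise ValueError("pressure change must be non-zero")
    return delta_fo_hz / delta_p_cmh2o


# Hypothetical values, chosen only for illustration.
d_psb = 4.0   # measured rise in subglottal pressure (cmH2O)
d_psp = 1.0   # assumed leak: rise in supraglottal pressure (cmH2O)
d_fo = 12.0   # observed rise in Fo (Hz)

d_pt = d_psb - d_psp              # change in transglottal pressure
fpr_measured = fpr(d_fo, d_psb)   # ratio relative to the measured pressure
fpr_true = fpr(d_fo, d_pt)        # ratio relative to Pt

print(fpr_measured, fpr_true)
```

With these numbers the ratio based on the measured pressure is lower than the ratio based on Pt, which is the underestimation the hypothesis predicts.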
The second hypothesis:
a change in Fo lags a change in Psb.
The scatter plots of Fo versus Psb in Baer's article [5] exhibit hysteresis.
The hysteresis is already visible during the first 45 ms, that is, before laryngeal
muscle activity could influence Fo. This could be an indication that the
Fo change lags the Psb change. During the sustained vowel the vibratory
system is in a steady state. When Psb is changed it takes some time for
the vocal folds to reach a new steady state. The time constant of this adaptation
process depends on the total Psb change. Furthermore, this lag would only
show up if the time constant of the Psb change is less than the time constant
of the adaptation process. In speech the rate of Psb change during an utterance
of about 1-8 cmH2O/s is probably slow enough for the vocal folds to adjust
almost instantaneously to the new vibratory conditions. Both Ladefoged [6]
and Baer [5] used short pushes to vary Psb. During these pushes the estimated
rate of Psb change is substantially larger than the aforementioned rate
of Psb change in speech. If the changes in Fo lagged the changes in Psb,
the duration of their pulsatile Psb changes could be too short for
the vocal folds to reach a new steady state. The result would be an underestimation
of dFo, and hence an underestimation of dFo/dPsb.
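The following sketch illustrates how such a lag could bias the estimate. It assumes, purely for illustration, that Fo follows a step in Psb as a first-order lag with time constant tau_s; neither this model nor the parameter values come from the experiments discussed here.

```python
def apparent_fpr(true_fpr, tau_s, pulse_s, d_psb=4.0, dt=0.001):
    """Peak Fo change divided by the Psb step, for an assumed first-order lag.

    true_fpr : steady-state Fo change per unit Psb change (Hz/cmH2O)
    tau_s    : hypothetical time constant of the adaptation (s)
    pulse_s  : duration for which Psb stays raised (s)
    """
    d_fo = 0.0                      # Fo deviation from baseline (Hz)
    peak = 0.0
    target = true_fpr * d_psb       # Fo change that would be reached eventually
    for _ in range(int(round(pulse_s / dt))):
        d_fo += (target - d_fo) * dt / tau_s   # forward Euler step
        peak = max(peak, d_fo)
    return peak / d_psb


# Hypothetical parameters: true ratio 8 Hz/cmH2O, time constant 100 ms.
short_push = apparent_fpr(true_fpr=8.0, tau_s=0.1, pulse_s=0.05)
long_push = apparent_fpr(true_fpr=8.0, tau_s=0.1, pulse_s=1.0)
print(short_push, long_push)
```

Under these assumptions a 50 ms push yields an apparent ratio well below the true one, whereas a 1 s push recovers it almost completely.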
The third hypothesis:
the FPR is different in Psb rising and lowering.
In utterances that exhibit declination both Fo and Psb decrease during the
course of the utterance. This is most clearly seen in declarative utterances
with a single accent early in the utterance. In these cases the FPR is calculated
for decreasing Fo and Psb. On the other hand, in experiments where Psb is
changed by pushing on the chest the FPR is calculated for increasing Fo
and Psb. Differences between Fo rising and Fo falling have been reported,
and Breckenridge [12] summarizes them by stating that "it has been found
that falling tones are more common in the world's languages than rising
tones, can be produced faster, and furthermore fall more than rising tones
rise." Perhaps the FPR also differs between Psb rising and lowering, i.e. the
ratio is higher during lowering.
In short, three hypotheses were postulated that could explain why estimates
of the FPR in experiments with induced pressure changes are too low:
1. a rise in Psb is followed by a rise in Psp
2. a change in Fo lags the change in Psb
3. the FPR is different in Psb rising and lowering
These three hypotheses were tested with the data of an experiment.
METHOD
An experiment was carried out in which simultaneous recordings of the acoustic
signal, electroglottogram (EGG), Psb, Psp and sternohyoid (SH) activity were obtained
while a subject sustained a vowel /a/ at a comfortable Fo and intensity
level. During phonation he was pushed on the chest to increase Psb, the
chest was held down to keep Psb high, and finally the chest was released
again to lower Psb. In normal speech the fall of Psb during an utterance
generally varies from 2 to 12 cmH2O (see section II). In earlier experiments
the magnitude of the induced pressure change was about 1-4 cmH2O [5,7,8].
We tried to induce larger pressure variations. The Psb changes were induced
as fast as possible, in order to produce a large Psb gradient.
All measured signals were stored on a 14-channel instrumentation recorder
(TEAC XR-510). The signals were A/D-converted off-line at a 10 kHz sampling
rate. Fo was calculated from the EGG signal with a frame rate of 200 frames/s.
The pressure signals were low-pass filtered and downsampled to 200 Hz.
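As an illustration of this step, the sketch below reduces a 10 kHz signal to 200 Hz by averaging blocks of 50 samples. Block averaging is only a crude stand-in for the low-pass filter, which is not specified here; the function name and the test signal are hypothetical.

```python
def block_average(samples, factor):
    """Average consecutive blocks of `factor` samples.

    Acts as a crude low-pass filter followed by decimation; an incomplete
    trailing block is dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


FS_IN = 10_000             # A/D sampling rate given in the text (Hz)
FS_OUT = 200               # target rate given in the text (Hz)
FACTOR = FS_IN // FS_OUT   # input samples per output sample

# One second of a hypothetical constant 8 cmH2O pressure signal.
pressure = [8.0] * FS_IN
pressure_200hz = block_average(pressure, FACTOR)
print(len(pressure_200hz))
```

After this step the pressure signals have one value per Fo frame, so Fo and pressure can be compared frame by frame.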
RESULTS AND DISCUSSION
The results of this experiment were used to test the three hypotheses.
The hypotheses were tested in the same order as they are presented in the
introduction above. In Figure 1 the Fo, Psb and Psp signals are shown for
one of the pushes.
Figure 1. Fo, Psb and Psp during a chest push. [see postscript version]
The fact that a change in Psb is followed immediately by a change in Fo
indicates that there must be a direct relation between these two variables.
It is observed that we succeeded fairly well in keeping Psb high for some
time. In all cases Psb decreased during the time that the chest was held
down. This could be caused by a partial release of the chest, an adjustment
of the respiratory muscles, or it could be a by-product of the decreasing
lung volume. In the example in Fig. 1 the stepwise increase in Psb was 9.2
cmH2O, while the sudden decrease in Psb was 7.0 cmH2O. This means that we also
succeeded in inducing pressure variations of substantial magnitude. Both
the average and the maximum rate of change are about the
same during rising and lowering (25 cmH2O/s and 55 cmH2O/s,
respectively). These values are much larger than the rate of Psb change
during speech utterances, which is known to be in the range of 3-8 cmH2O/s
(see section II), and therefore the Psb changes seem fast enough to test
whether there is a lag between Fo and Psb changes. For three chest pushes
the Psb variation was as intended: the rise and fall are fast and large
enough, and Psb is kept high for some time. Particularly the data of these
pushes are used to test the hypotheses. This is discussed below.
During Psb rising and lowering no significant changes in Psp were observed,
as can be seen from the example in Fig. 1. This was the case for the three
'successful' pushes mentioned above, but also for all other pushes. A Psb
rise was never followed by a Psp rise, so our first hypothesis was rejected.
Figure 2. Fo and Psb during a chest push. [see postscript version]
The Fo and Psb signals of Figure 1 are plotted together in Figure 2. Fo
changes instantaneously with Psb, even if the total Psb change is 9 cmH2O
and if the rate of Psb change is 55 cmH2O/s. A lag between Fo and Psb was
not found. The voice source apparently is capable of adjusting very fast
to changing phonatory conditions.
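One way to quantify such a lag, sketched below, is to look for the shift that maximizes the correlation between the Psb and Fo tracks. This is an illustrative procedure on synthetic signals, not the analysis used here; at 200 frames/s one frame corresponds to 5 ms.

```python
def best_lag(x, y, max_lag):
    """Return the lag k (in frames, 0..max_lag) that maximizes the
    correlation between x[t] and y[t + k]."""
    def pearson(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
        va = sum((p - ma) ** 2 for p in a)
        vb = sum((q - mb) ** 2 for q in b)
        return cov / (va * vb) ** 0.5

    scores = []
    for k in range(max_lag + 1):
        # Pair x[t] with y[t + k] over the overlapping part of the signals.
        scores.append((pearson(x[:len(x) - k], y[k:]), k))
    return max(scores, key=lambda s: s[0])[1]


# Synthetic push: Psb rises from 5 to 14 cmH2O, is held, and is released.
rise = [5.0 + 0.9 * i for i in range(1, 11)]
psb = [5.0] * 20 + rise + [14.0] * 20 + rise[::-1] + [5.0] * 20

fo = [100.0 + 3.0 * p for p in psb]    # Fo follows Psb without delay
fo_delayed = [fo[0]] * 3 + fo[:-3]     # same track, delayed by 3 frames

print(best_lag(psb, fo, 10), best_lag(psb, fo_delayed, 10))
```

For the undelayed track the best lag is zero frames, and for the delayed track it is the three frames (15 ms) that were built in.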
Figure 3. Fo(Psb) during Psb rising (*) and lowering (+). [see postscript
version]
A scatter plot of Fo versus Psb is shown in Figure 3, with the data
during Psb rising (*) and lowering (+) marked separately. The FPR is almost
the same during rising and lowering; a substantial difference
was not observed.
CONCLUSIONS
All three postulated hypotheses were falsified. At the moment there seems
to be no reason to doubt the values of the FPR found in the experiments
with induced pressure variations. Therefore the values obtained from measurements
on normal speech have to be questioned.
II. THE FPR IN SPEECH
INTRODUCTION
There are two mutually exclusive explanations why estimates of the FPR in
speech utterances showing Fo declination are larger than estimates in experiments
with induced pressure variations: the FPR is really larger in speech, or
the estimates obtained from measurements in speech are wrong. The second
explanation seemed more probable to us, so we first examined the methods
that are used to calculate the FPR in the experiments on declination [1,2,3,4].
Usually the Fo and Psb values are taken at two instants, one near or at
the beginning (Ti) and one near or at the end (Tf) of an utterance. An estimate
of the FPR is then calculated with these values:
FPR1 = [Fo(Ti) - Fo(Tf)]/[Psb(Ti) - Psb(Tf)]
Figure 4. Fo(Psb) during first 200 ms (*), during last 200 ms (+), and during
intermediate period (.). [see postscript version]
In a plot of Fo as a function of Psb, FPR1 is the slope of the line connecting
the data measured at Ti and Tf. In Figure 4 a scatter plot of Fo versus
Psb is given for one of the sentences of this experiment. Shown are the
first 40 voiced samples (*), the last 40 voiced samples (+), and the intermediate
samples (.). It can be seen that the value of FPR1 strongly depends on the
exact choice of Ti and Tf. Compared to the data in Figure 3 the data are
much more scattered here because apart from Psb there are many other physiological
processes that influence Fo. This makes it hazardous to make an estimation
based on the values at two instants only. It would just be a matter of coincidence
if the influence of all other factors on Fo is the same at those two instants.
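The sensitivity of the two-point estimate can be shown with a minimal sketch. The frame values below are hypothetical and do not come from the recorded sentences; the function simply evaluates the FPR1 formula for a chosen pair of frames.

```python
def fpr1(fo, psb, ti, tf):
    """Two-point estimate: [Fo(Ti) - Fo(Tf)] / [Psb(Ti) - Psb(Tf)]."""
    d_psb = psb[ti] - psb[tf]
    if d_psb == 0:
        raise ValueError("Psb is identical at Ti and Tf")
    return (fo[ti] - fo[tf]) / d_psb


# Hypothetical frames (not data from the paper).
fo = [130.0, 124.0, 120.0, 112.0, 100.0, 96.0]   # Hz
psb = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]            # cmH2O

first_last = fpr1(fo, psb, 0, 5)   # Ti = first frame, Tf = last frame
inner_pair = fpr1(fo, psb, 1, 4)   # Ti and Tf moved one frame inwards
print(first_last, inner_pair)
```

Moving Ti and Tf by a single frame changes the estimate noticeably, even though the underlying data are the same.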
A statistically better method would be to calculate the regression of
Fo on Psb. The slope of the regression line would be a better estimate of
the FPR, because it takes into account all (Fo, Psb) pairs, not just two of
them. Define:
FPR2 = regression coefficient between Fo and Psb
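A minimal sketch of this estimate is given below: an ordinary least-squares slope of Fo on Psb over all frames. The function name and the frame values are hypothetical.

```python
def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical frames (not data from the paper).
psb = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]            # cmH2O
fo = [130.0, 124.0, 120.0, 112.0, 100.0, 96.0]   # Hz

fpr2 = regression_slope(psb, fo)   # Hz/cmH2O, uses every (Psb, Fo) pair
print(fpr2)
```

Unlike the two-point estimate, this value does not depend on which two instants happen to be chosen.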
The fact that the calculated FPR in experiments on declination (FPR1) is
almost always larger than 2-5 Hz/cmH2O (the value obtained in experiments
with induced pressure variations) was an indication that the other Fo regulating
processes could participate in the decline of Fo, i.e. their influence on
Fo could be such that the total fall of Fo is larger than the fall of Fo
resulting from the fall of Psb alone. If this is the case then the regression
coefficient between Fo and Psb is not a good measure of the FPR in speech.
Fo first has to be corrected for the influence of other variables. This
is achieved by partitioning out the effects of the additional factors from
Fo. The regression coefficient between corrected Fo (Fo') and Psb would
then be a better estimate of the rate of Fo change resulting from a change
in Psb alone. Define:
FPR3 = regression coefficient between Fo' and Psb
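One possible reading of this correction is sketched below: Fo is regressed jointly on Psb and the muscle signals, and the coefficient of Psb is taken as FPR3. Subtracting the fitted muscle terms from Fo and then regressing the corrected Fo' on Psb gives the same slope, because least-squares residuals are uncorrelated with the regressors. This is an assumption about the procedure, not a description of the actual processing; all names and data are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fpr3(fo, psb, muscles):
    """Coefficient of Psb in a joint regression of Fo on Psb and muscles."""
    rows = [[1.0, p] + [sig[t] for sig in muscles] for t, p in enumerate(psb)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, fo)) for i in range(k)]
    return solve(xtx, xty)[1]    # index 0 is the intercept, 1 is Psb


# Synthetic utterance: Psb falls, CT activity falls, SH activity rises.
psb = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.5, 4.0]
ct = [3.0, 1.0, 2.0, 1.0, 0.5, 1.0, 0.2, 0.1]
sh = [0.1, 0.2, 0.1, 0.3, 0.2, 0.5, 1.0, 2.0]
fo = [90.0 + 3.0 * p + 6.0 * c - 4.0 * s for p, c, s in zip(psb, ct, sh)]

uncorrected = fpr3(fo, psb, [])        # plain regression of Fo on Psb
corrected = fpr3(fo, psb, [ct, sh])    # muscles partialled out
print(uncorrected, corrected)
```

In this synthetic case the uncorrected slope is inflated by the muscle contributions, while the corrected slope recovers the value of 3 Hz/cmH2O that was built into the data.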
Our hypothesis is that the true FPR is the same in 'normal speech' and sustained
phonation with induced pressure variations. Estimates of the FPR in 'normal
speech' (FPR1) are often too high because other processes also participate
in the decline of Fo. To test this hypothesis two experiments were carried
out in which, apart from Psb, other physiological processes that could
control Fo were also measured.
METHOD
In the first experiment simultaneous recordings of the acoustic signal,
EGG, Psb, lung volume (Vl), and EMG activity of the cricothyroid (CT), vocalis
(VOC) and SH were obtained while the subject performed several speech tasks,
including the repeated production of a short and a long Dutch sentence. The short
sentence was also produced in reiterant form, using either the syllable
/fi/ or /vi/. The sentences had to be produced with three different intonation
contours, i.e. a 'flat hat pattern' (FH), two 'pointed hats' (PH) and question
intonation (Q). Each of the 12 sentence-contour pairs (4 sentences x 3 intonation
contours) was repeated at least five times to make averaging possible.
In the second experiment recordings of the supraglottal pressure (Psp) were
also made, but activity of the CT was not recorded. Near the end of the
experiment the subject was asked to produce an utterance spontaneously.
After he spoke this sentence, he was asked to repeat the same sentence 29
times.
Preprocessing of the data was done with the Haskins Laboratories EMG data
processing system. The repetitions were time aligned using line-up points.
A dynamic time warping (DTW) algorithm was used to correct for the differences in the temporal
structure between repetitions. Median values were then calculated for all
variables. The exact procedure of data measurement and data processing is
described in [11].
RESULTS AND DISCUSSION
The resulting signals were used to calculate the values given in Table I.
The values for the utterances in which all syllables are replaced by /vi/
deviated from the rest. In these sentences voicing starts well before the initial
peak in Fo and Psb. As a result the Fo and Psb values are small for the
first voiced sample, and dFo and dPsb are small too. We could have chosen
another instant (Ti) to measure Fo and Psb, but that is beyond the scope
of this paper. The total fall in Psb varied between 4.0 and 11.9 cmH2O,
and the overall rate of Psb change varied between 1.7 and 8.1 cmH2O/s.
The values of FPR1 and FPR2 for the questions are not relevant, because
Fo rises markedly near the end of these sentences. The other values of FPR1
vary from 6.1 to 8.9 Hz/cmH2O. This is in agreement with the results of
previous studies [1,2,3,4], and therefore these sentences seem suitable
to test our hypothesis.
The values of FPR2 for non-questions are always smaller than the values
of FPR1, and vary between 4.0 and 6.9 Hz/cmH2O. But one has to be careful
in interpreting these values. For a given spread in Fo and Psb, the regression
coefficient is proportional to the correlation coefficient, so a smaller correlation
coefficient results in a smaller regression coefficient. In any case,
the values of FPR2 are still higher than the FPR values obtained in experiments
with artificially induced Psb variations.
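The dependence of the regression coefficient on the correlation can be written out. For a least-squares regression of Fo on Psb, with x_i the Psb values and y_i the Fo values, the slope b and the correlation coefficient r satisfy the standard identity below. The symbols b, r, s_x and s_y are introduced here for illustration only.

```latex
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = r \,\frac{s_y}{s_x},
\qquad
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \,\sum_i (y_i - \bar{y})^2}}
```

Here s_x and s_y are the standard deviations of Psb and Fo. For a fixed ratio s_y/s_x, a lower correlation therefore gives a proportionally lower slope, even if the underlying dependence of Fo on Psb is unchanged.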
The results for one sentence (long-FH) are shown in Figure 5. In most sentences
CT and VOC were especially active during the first syllable, and their activity
was suppressed at the end. This effect can also often be observed in the
data of previous experiments on declination in which muscle activity was
measured [1,2,3,4]. The peak activity of these Fo-raising muscles is much
larger during a stressed syllable at the beginning than during a stressed
syllable at the end. And if the first syllable is not stressed, CT
and VOC still show increased activity. On average the Fo-raising muscles
CT and VOC are more active at the beginning than at the end of utterances.
Figure 5. Fo, Psb, SH, CT and VOC signals [see postscript version]
It is often observed that the SH is especially active just before phonation
[1,13,14], and it is assumed that the SH helps in preparing the larynx for
the 'speech mode.' This was also observed in some of the utterances of this
experiment. Usually SH activity has dropped to its base level when phonation
starts. At the end of utterances Fo often falls abruptly (the so-called
final fall), and often this is accompanied by a rise of SH activity (and
a lowering of the larynx). This is observed in the data of the present experiments,
but also in the data of previous experiments [1,2,3,4]. On average the
Fo-lowering muscle SH is more active at the end than at the beginning of
utterances.
Thus it seems that the laryngeal muscles participate in the declination
of Fo, so part of the decline in Fo is due to the activity of the laryngeal
muscles. If we want to calculate the FPR we first have to correct Fo for
these influences. This is done by calculating the regression equation between
Fo and SH and VOC for the average signals of the spontaneous utterance,
and the regression equation between Fo and SH and CT for the other 12 sentences.
The value of FPR3 is then calculated. Except for the long-FH type, the values
vary between 1.5 and 3.3 Hz/cmH2O. Again we want to stress that regression coefficients
depend on the correlation between the variables. For instance, in the
long-FH type the correlation was extremely low, causing the value of FPR3
to be very low. Still, if we compare the values of FPR3 with those of FPR2
we see that correction for the influence of two important laryngeal muscles
resulted in a lowering of the estimate of the FPR in all non-questions.
For the questions the Fo rise at the end is mainly controlled by the combined
activity of CT and VOC. The value of FPR3 is corrected for this increase
in CT activity, and therefore the value of FPR3 is also relevant for questions.
The values thus obtained are in the same range as the FPR3 values for declarative
utterances.
CONCLUSIONS
The data obtained in the two experiments described above do support our
hypothesis that the FPR is the same in speech and in experiments with induced
pressure variations. Our data, and data of previous experiments on declination,
suggest that laryngeal muscles participate in the Fo declination during
an utterance.
ACKNOWLEDGEMENTS
This research was supported by the Foundation for Linguistic Research,
which is funded by the Netherlands Organization for the Advancement of Scientific
Research N.W.O. Special thanks are due to Haskins Laboratories, where one
of the experiments was carried out; to Dr. Thomas Baer, who helped to organize
and run the experiment at Haskins; to Dr. Hiroshi Muta, who inserted
the EMG electrodes and the subglottal pressure sensor in the experiment
at Haskins; and to Dr. Philip Blok, who inserted the EMG electrodes and the
pressure catheter in the other two experiments.
REFERENCES
[1] Collier, R. (1975). Physiological correlates of intonation patterns.
J. Acoust. Soc. Am. 58: 249-255.
[2] Maeda, S. (1976). A characterization of American English intonation.
Ph.D. thesis, MIT, Cambridge.
[3] Gelfer, C.; Harris, K.; Collier, R. and Baer, T. (1983). Is Declination
Actively Controlled? In: Titze, I.R. and Scherer, C. (eds.), Vocal Fold
Physiology. The Denver Center for the Performing Arts, Inc., Denver, Colorado.
[4] Collier, R. and Gelfer, C.E. (1984). Physiological Explanations of Fo
Declination. In: Van den Broecke, M.P.R. and Cohen, A. (eds.), Proc. of
the tenth Int. Congress of Phonetic Sciences. Foris Publications Holland,
Dordrecht.
[5] Baer, T. (1979). Reflex activation of laryngeal muscles by sudden induced
subglottal pressure changes. J. Acoust. Soc. Am. 65: 1271-1275.
[6] Ladefoged, P. (1963). Some physiological parameters in speech. Language
and Speech 6: 109-119.
[7] Rothenberg, M. and Mahshie, J. (1986). Induced transglottal pressure
variations during voicing. J. of Phon. 14: 365-371.
[8] Baken, R.J. and Orlikoff, R.F. (1987). Phonatory Response to Step-Function
Changes in Supraglottal Pressure. In: Baer, T.; Sasaki, C. and Harris, K.
(eds.), Laryngeal Function in Phonation and Respiration. College-Hill Press,
Boston, Massachusetts.
[9] Sawashima, M. (1974). Laryngeal research in experimental phonetics.
In: Sebeok, T.A. et al. (eds.), Current Trends in Linguistics, Vol. 12: 2303-2348.
Mouton, The Hague.
[10] Atkinson, J.E. (1978). Correlation analysis of the physiological features
controlling fundamental voice frequency. J. Acoust. Soc. Am. 63: 211-222.
[11] Strik, H. and Boves, L. (1988). Averaging physiological signals with
the use of a DTW algorithm. Proceedings SPEECH'88, 7th FASE Symposium, Edinburgh,
Book 3: 883-890.
[12] Breckenridge, J. (1977). Declination as a phonological process. Bell
Laboratories Technical Memorandum, Murray Hill.
[13] Hirose, H. and Sawashima, M. (1981). Functions of the laryngeal muscles
in speech. In: K.N. Stevens and M. Hirano (eds.), Vocal Fold Physiology.
University of Tokyo Press, Tokyo.
[14] Strik, H. and Boves, L. (1987). Regulation of intensity and pitch in
chest voice. Proceedings 11th International Congress of Phonetic Sciences,
Tallinn, Vol. VI: 32-35.