Control of fundamental frequency, intensity and voice quality in speech

H. Strik & L. Boves (1992a)
Journal of Phonetics 20, pp. 15-25.

This article has appeared in the Journal of Phonetics. Therefore, I only have a printed version with the final text and the final layout. If you want a copy of this article, you can find it in Journal of Phonetics 20, or you can contact me. The text of the ASCII version below is slightly different from the text of the article.

Running title: Control of F0, IL and voice quality

Abstract

In this paper the control of fundamental frequency (F0), intensity level of the radiated acoustic signal (IL), and voice quality is studied in normal conversational speech. It is shown that the physiological factors that best explain measured features of the speech wave depend on the part of the utterance taken into account. Also, it appears that in speech transglottal pressure (Ptr) is more important than subglottal pressure (Psb). We conclude that currently available mathematical models that describe the waveform of glottal volume flow (Ug) lack a number of parameters necessary for a better understanding of the physiological control of the speech parameters investigated in this study.

1. Introduction

The relation between Psb and laryngeal configurations on the one hand, and F0, IL, and Ug on the other is extremely complex. Moreover, Psb and especially laryngeal configurations and the ways in which they are brought about are difficult to measure. Perhaps due to the measurement problems most investigations of laryngeal control and its effects on the radiated acoustic signal have dealt with sustained vowels produced in widely different ways, rather than with "normal" speech production. In many studies F0 was varied over several octaves and Psb over a range from approximately 5 cm aq. to well above 30 cm aq. There are several reasons why the results obtained in those studies may not directly be applied to speech production. In "normal neutral" speech the ranges are much smaller. This may imply that some of the control mechanisms needed to span the wide ranges in 'phonation experiments' are much less important in speech. Also, in sustained vowels oral pressure (Por) may be considered equal to atmospheric pressure. But in speech production, where non-negligible constrictions of the vocal tract occur, Por is much more important. In the present study we have looked into the relation of laryngeal characteristics to IL and F0 in normal speech. We will touch upon some methodological aspects of the research. Also, we will pay due attention to the role of Por and Ptr.

2. Material and methods

2.1. Experimental procedure

The subject in this study was a male native speaker of Dutch, with no experience in phonetics or linguistics and with no history of respiratory or laryngeal dysfunction. During the production of various utterances (sustained vowels, sentences with different intonation patterns) simultaneous recordings of the acoustic signal, electroglottogram (EGG), lung volume (Vl), Psb, Por, and electromyographic (EMG) activity of the sternohyoid (SH) and vocalis (VOC) muscles were obtained. Near the end of the recording session he was asked to produce an utterance spontaneously. He replied by saying (in Dutch): "Ik heb het idee dat mijn keel wordt afgeknepen door die band" (I have the feeling that my throat is being pinched off by that band). After he spoke this sentence, he was asked to repeat it 29 times.

2.2. Data recording

The speech signal was transduced by a condenser microphone (B&K 4134) placed about 10 cm in front of the mouth, and amplified by a measuring amplifier (B&K 2607).

The EGG was recorded with a Fourcin-Abberton laryngograph (Fourcin, 1974).

The pressure signals were recorded using a Millar(R) catheter with four miniature pressure transducers, in the way described by Cranen and Boves (1985). The catheter was introduced into the pharynx via the nose, and then into the trachea via the posterior commissure. It did not have a noticeable effect on phonation (Boves, 1984).

The EMG signals were recorded using hooked-wire electrodes (Hirose, 1971). The electrodes were inserted percutaneously, and correct electrode placement was confirmed by audio-visual monitoring of the signals during various functional manoeuvres.

The perimeters of the chest and abdomen were measured with mercury-filled strain-gauge wires (Strik and Boves, 1988).

All signals were recorded on a 14-channel instrumentation recorder (TEAC XR-510), using a bandwidth of 5 kHz.

2.3. Processing of the data

All signals were A/D converted off-line at a 10 kHz sampling rate. The files were stored on a microVAX computer. Because of the sluggishness of the articulators it seems sufficient to use a sampling frequency of 200 Hz. Therefore, the goal of preprocessing is to derive physiological signals which all have a sampling rate of 200 Hz.

F0 and IL were calculated with the SIF program of ILS. Both values were calculated every 5 ms, resulting in F0 and IL signals sampled at a 200 Hz rate.

Pressure signals, chest and abdomen signals were low-pass filtered and downsampled to 200 Hz. Lung volume was calculated from the low-pass filtered chest and abdomen signals.

The integrated rectified EMG was calculated in the way described by Basmajian (1967): first the signal is full-wave-rectified, and then it is integrated over successive periods of 5 ms. The integrator is reset after each integration. Finally, the signal is smoothed by convolving it with a triangular function (base length 35 ms).
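As a rough illustration of the three steps just described (rectify, integrate with reset over 5 ms, triangular smoothing), the sketch below processes a list of EMG samples. The function name, the edge handling, the unit-sum normalisation of the triangle and the reading of "base length 35 ms" as seven 5 ms bins are assumptions made for this example; they are not taken from Basmajian (1967) or from the software used in the study.

```python
def integrated_rectified_emg(emg, fs_in=10_000, bin_ms=5, base_ms=35):
    """Rectify, integrate over consecutive bins (with reset), then smooth."""
    bin_len = fs_in * bin_ms // 1000      # 50 samples per 5 ms bin at 10 kHz
    dt = 1.0 / fs_in
    n_bins = len(emg) // bin_len

    # Full-wave rectification, then integrate-and-reset once per bin.
    integrated = [
        sum(abs(x) for x in emg[i * bin_len:(i + 1) * bin_len]) * dt
        for i in range(n_bins)
    ]

    # Triangular kernel spanning base_ms (assumed: 7 taps, 1-2-3-4-3-2-1).
    n_taps = base_ms // bin_ms
    kernel = [min(k + 1, n_taps - k) for k in range(n_taps)]
    centre = n_taps // 2

    # Weighted average; near the edges only the available taps are used.
    smoothed = []
    for i in range(n_bins):
        acc, weight_sum = 0.0, 0.0
        for k, w in enumerate(kernel):
            j = i + k - centre
            if 0 <= j < n_bins:
                acc += w * integrated[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


# Toy input: 100 ms of a unit-amplitude alternating signal at 10 kHz.
toy_emg = [1.0, -1.0] * 500
toy_envelope = integrated_rectified_emg(toy_emg)
```

With a 10 kHz input each 5 ms bin holds 50 samples, so the output is sampled at 200 Hz, like the other physiological signals.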

There is a time delay between the change of the electric potential of a muscle and the resulting effect in the acoustic signal (Atkinson, 1978). To compensate for this delay, all EMG signals were shifted forward over their mean response times.
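The shift can be pictured as a plain delay of the 200 Hz EMG signal, assuming that "forward" means later in time. In the sketch below the function name, the zero padding and the 40 ms delay in the usage lines are hypothetical; the mean response times actually used are not given in the text.

```python
def shift_forward(signal, delay_ms, fs=200, fill=0.0):
    """Delay a signal by delay_ms, padding the start and keeping the length."""
    n = min(round(delay_ms * fs / 1000), len(signal))
    if n <= 0:
        return list(signal)
    return [fill] * n + list(signal[:len(signal) - n])


# Hypothetical example: a 40 ms response time equals 8 samples at 200 Hz.
emg_200hz = [0.0, 0.2, 0.9, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
shifted = shift_forward(emg_200hz, delay_ms=40)
```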

After preprocessing, median signals were calculated with the method of non-linear time-alignment described in Strik and Boves (1991), with the fifth repetition used as the reference.
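The align-then-take-the-median idea can be sketched with a textbook dynamic time warping (DTW) routine. This is a generic stand-in, not the DP algorithm of Strik and Boves (1991), whose cost function and path constraints are not described here; the function names and the toy data are hypothetical.

```python
from statistics import median


def dtw_path(ref, rep):
    """Return the minimum-cost list of (ref_index, rep_index) pairs."""
    n, m = len(ref), len(rep)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - rep[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def warp_to_reference(ref, rep):
    """Resample rep onto the time axis of ref (mean of the samples mapped to each point)."""
    buckets = [[] for _ in ref]
    for i, j in dtw_path(ref, rep):
        buckets[i].append(rep[j])
    return [sum(b) / len(b) for b in buckets]


def median_signal(ref, repetitions):
    """Pointwise median over all repetitions after warping each one to ref."""
    warped = [warp_to_reference(ref, rep) for rep in repetitions]
    return [median(column) for column in zip(*warped)]


# Toy data: the same contour produced at slightly different speeds.
reference = [0, 1, 2, 1, 0]
repetitions = [[0, 0, 1, 2, 1, 0], [0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 0]]
toy_median = median_signal(reference, repetitions)
```

The median is taken only after warping, so timing differences between repetitions do not smear the features of the contour.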

2.4. The parameters of the glottal volume flow

There are a number of different ways to parameterize the glottal volume velocity waveform (Klatt & Klatt, 1990; Fant, 1986; Cranen & Boves, 1987). Although we do not believe that it is the best model from a physiological point of view, we will use the Liljencrants-Fant (LF) model in this paper, mainly because it seems to be the model used in most recent studies. It seems that many of the features of this model are motivated from a perceptual point of view, i.e. by the ease with which they allow one to approximate or explain (spectral) characteristics of the speech wave that are important from a perceptual point of view. However, most of the parameters can also be related to what is known about the physiology of phonation. Specifically, the LF-model allows one to describe the maximum amplitude of the flow during the open glottis interval (U0), the duty cycle of the flow pulses, the amount of skewing of the pulses, the amplitude of dUg/dt at the moment of glottal closure (Ee) and the time delay between the moment of major vocal tract excitation and the instant where the glottal flow becomes quasi-constant (Ta).

2.5. Calculation of glottal volume flow

Of course, no direct recordings of the glottal volume flow were made; this signal is derived from the speech waveform by means of inverse filtering. Closed Glottis Interval Covariance LPC was used to estimate the parameters of the inverse filter. In de Veth, Cranen, Strik, and Boves (1990) it was shown that this procedure outperforms more complicated ones that attempt to estimate the parameters of the inverse filter by means of Robust ARMA analysis.
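The core of covariance LPC is a least-squares linear predictor fitted only on samples inside a chosen window, here meant to be the closed glottis interval. The sketch below is a minimal pure-Python version under several assumptions: the window indices are supplied by the caller (for example from the EGG), the helper names are hypothetical, and the synthetic test signal is a freely decaying second-order resonance rather than real speech.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def covariance_lpc(s, start, stop, order):
    """Fit s[n] ~ sum_k coeffs[k] * s[n-1-k] using only start <= n < stop."""
    if start < order:
        raise ValueError("window must leave `order` samples of history")
    rows = range(start, stop)
    phi = [[sum(s[n - 1 - i] * s[n - 1 - j] for n in rows) for j in range(order)]
           for i in range(order)]
    psi = [sum(s[n] * s[n - 1 - i] for n in rows) for i in range(order)]
    return solve_linear(phi, psi)


def inverse_filter(s, coeffs):
    """Prediction residual; the first len(coeffs) samples are set to zero."""
    order = len(coeffs)
    out = [0.0] * order
    for n in range(order, len(s)):
        out.append(s[n] - sum(c * s[n - 1 - k] for k, c in enumerate(coeffs)))
    return out


# Synthetic check: a resonance ringing freely, as during an ideal closed phase.
speech = [0.0, 1.0]
for _ in range(60):
    speech.append(1.5 * speech[-1] - 0.9 * speech[-2])
coeffs = covariance_lpc(speech, start=2, stop=40, order=2)
residual = inverse_filter(speech, coeffs)
```

Because the window contains no excitation, the fit recovers the resonance exactly and the residual vanishes; on real speech the residual is the estimate of the source signal.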

Inverse filtering yields an estimate of dUg/dt. Integration of this signal gives the flow signal. For the present article we only wanted to measure peak glottal flow U0, excitation strength Ee, and Ptr for each voiced period. The value of Ee is obtained by taking the minimum of the differentiated flow in each pitch period. Likewise U0 is found by looking for the maximum of the flow signal. Ptr is measured at the moment of maximum glottal flow. Its value is obtained from a low-pass filtered pressure signal.
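The per-period measurements of U0 and Ee can be sketched as follows. The pitch period boundaries are assumed to be known already (for instance from the EGG), the rectangular-rule integration and the function name are choices made for this example, and Ee is returned as a positive magnitude, which is an assumption about the sign convention.

```python
def flow_measures(dug, boundaries, fs=10_000):
    """Integrate dUg/dt and return the flow plus (U0, Ee) for each pitch period."""
    dt = 1.0 / fs
    flow, acc = [], 0.0
    for d in dug:
        acc += d * dt                   # running integral of the flow derivative
        flow.append(acc)
    measures = []
    for start, stop in zip(boundaries, boundaries[1:]):
        u0 = max(flow[start:stop])      # peak glottal flow in this period
        ee = -min(dug[start:stop])      # magnitude of the steepest flow decrease
        measures.append((u0, ee))
    return flow, measures


# Toy derivative: slow rise, abrupt fall, closed phase; two identical periods.
one_period = [1.0] * 6 + [-3.0] * 2 + [0.0] * 2
toy_flow, toy_measures = flow_measures(one_period * 2, [0, 10, 20], fs=10)
```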

Inverse filtering was done on the fifth utterance, because that is the one that is used as a reference in the method of non-linear time-alignment. Inverse filter results were obtained for all voiced periods, including vowels and voiced consonants.

3. Results

3.1. Control of fundamental frequency

The relation between Psb and F0 has been the object of quite a number of experimental (e.g. Atkinson, 1978; Collier, 1975; Maeda, 1976; Strik and Boves, 1989) and modelling studies (e.g. Ishizaka and Flanagan, 1972; Titze and Talkin, 1979). Yet, the details of this relation remain unclear. Estimates of the F0 to Psb ratio from speech and special phonation tasks resulted in values between 5 and 15 Hz/cm aq. (Collier, 1975; Maeda, 1976; Strik and Boves, 1989). In another type of experiment pressure variations are induced externally. The F0 to Psb ratios measured in these experiments tend towards values of 2-5 Hz/cm aq. (Baer, 1979; Strik and Boves, 1989). Strik and Boves (1989) showed that the ratio of an F0 change resulting from a Psb change alone probably is the same in both experiments, viz. 2-5 Hz/cm aq. In "normal" speech there are other factors that control F0, especially the laryngeal muscles. Due to the simultaneous operation of these factors the ratio of total F0 change to Psb change in utterances is often larger than 2-5 Hz/cm aq. The latter ratio is in agreement with the ratio of 2-3 Hz/cm aq. that was found by Ishizaka and Flanagan (1972) for their self-oscillating two-mass model.

Furthermore, it seems that in most experiments, and therefore in most presently existing models, the effects of Por on F0 are not sufficiently taken into account. Probably this is due to the fact that most experiments were done with sustained vowel phonation, in which the variation in Por is much smaller than in normal speech. Strik and Boves (1988) studied the relation of F0 to Psb, Por, and Ptr in connected speech. The median signals for the 29 sentence repetitions of this experiment, obtained with the method of non-linear time-alignment, are shown in Fig. 1. These signals were used to calculate correlations between the variables of interest. In Table I the correlations are given for a long voiced interval, while Table II contains the same correlations for all voiced frames.

The most important conclusion that can be drawn from the data in Tables I and II is that the pattern of correlations between Psb, Por, Ptr, and F0 depends very much on the part of the utterance over which the measurements are taken. If measurements are limited to a single voiced interval, Ptr (and Por) are much better predictors of F0 than Psb (see Table I). When measured over a complete utterance, however, Psb and Ptr explain essentially the same proportion of the variation in F0 (see Table II). This is due to the fact that the range of Psb in individual voiced intervals is rather small (see Table I). The range spanned by Ptr, on the other hand, is much wider, because of the fact that Por varies between Psb in voiceless stops and zero in open vowels. In a complete declarative utterance, on the other hand, the correlation between Psb and F0 is much enhanced by the fact that both show some amount of declination. The data in the Tables were obtained from a single subject and therefore should be verified on a larger population. Yet, from a physiological point of view (as well as on statistical grounds) they seem to be quite plausible.
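Squaring the correlations with F0 from Tables I and II makes the "proportion of the variation explained" explicit. The coefficients below are copied from the tables; the dictionary and function names are only illustrative.

```python
# Correlations with F0, copied from Table I (one voiced interval, N = 66)
# and Table II (all voiced frames, N = 293).
f0_corr_interval = {"Ptr": 0.851, "Por": -0.783, "Psb": 0.478}
f0_corr_utterance = {"Ptr": 0.729, "Por": -0.153, "Psb": 0.772}


def explained_variance(correlations):
    """Squared correlation = share of F0 variance explained by each predictor."""
    return {name: round(r * r, 3) for name, r in correlations.items()}


interval = explained_variance(f0_corr_interval)
utterance = explained_variance(f0_corr_utterance)
```

Within the single voiced interval Ptr accounts for about 72% of the F0 variance against about 23% for Psb; over all voiced frames the two figures are about 53% and 60%.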

Our results show that one must be very cautious in interpreting the outcomes of experiments on physiological control of F0 (and all other speech parameters, for that matter). Such caution is, of course, the more necessary with respect to single subject studies, like our present study. One must be especially cautious in generalizing the results of experiments to situations other than those under which they were obtained. In fact, only results that can be explained by a fairly comprehensive model may be generalized to situations where a similar model can be assumed, operating in the same regime. We are confident that the conclusions of our investigation are supported by a sufficiently complete model.

3.2. Control of Intensity and Voice Quality

Even if the relation between Psb and F0 has received some attention in the literature, one still must be aware that the effects of Psb and laryngeal configurations are not limited to F0; on the contrary, factors like the acoustic power generated at the glottis and the waveshape of the glottal volume flow pulses are also affected. These relations are much less studied. That may, at least in part, be due to the assumption that voice intensity and voice timbre are of less importance from a linguistic point of view. However, if it comes to a better understanding of the fundamentals of phonation and of para-linguistic phenomena like voice quality and its variations, radiated intensity and details of the glottal volume velocity waveform become of crucial importance. In the present study we contribute some measurement data related to control of IL and voice quality obtained from connected speech and show how these data can fit in with modelling research.

3.2.1. The relation between IL and Pressure

It has long been known that there must be a relation between Psb and IL, if only because Psb is the major source of phonatory energy (cf. Rubin, 1963). Most measurement data on the relation between Psb and IL seem to stem from in vitro experiments, however, or at best from experiments where sustained vowels were produced with intensity and pressure variations spanning a range larger than usually found in speech (e.g. Bouhuys, Mead, Proctor, and Stevens, 1968; Cavagna and Margaria, 1968; Isshiki, 1964; Tanaka and Gould, 1983).

In our own investigation of the best predictor of IL in the production of voiced speech sounds, we found that Ptr outperforms Psb by far (Strik and Boves, 1988). The result is true both on a local (i.e. within words or voiced intervals) and on a global level (i.e. looking over complete utterances). In both situations the correlation between Ptr and IL exceeds 0.92, while the correlation with Psb is at most 0.49 (when measured over a complete sentence, see Tables I and II). So, at least for this subject, it seems that Ptr is more important in the control of IL than Psb.

Bouhuys et al. (1968), Cavagna and Margaria (1968), Isshiki (1964), and Tanaka and Gould (1983) all found high correlations between IL and the logarithm of Psb when subjects produced sustained vowels. For sustained vowel phonation Por is almost constant and close to zero and, as a result, Ptr is almost equal to Psb. In our data Por, Ptr, and IL vary quickly and considerably, while Psb decreases slowly during the course of the utterance (Fig. 1). This explains why in our data the relation between IL and Psb is rather weak.

Besides the correlation, the regression coefficient is also of importance, because it predicts the amount of change in IL due to a given change in Ptr. In order to be able to compare our findings with previous results we calculated the regression equation between IL and the logarithm of Ptr. Based on the 293 voiced frames of the median signals of the current experiment (see Fig. 1) we found the following relation:

IL = 41.6 + 30.3*log(Ptr) (N = 293, R = 0.90)

Or, in other words, the intensity (I) of the radiated speech wave is proportional to Ptr to the power 3.03. Interestingly enough the value of the power in the resulting relation between I and Ptr is quite comparable to results reported in the literature about the relation between I and Psb. For sustained vowel phonation Cavagna and Margaria (1968) found a value of 3.0 +/- 1.0, Isshiki (1964) found a value of 3.3 +/- 0.7, and Tanaka and Gould (1983) found a value of 3.18; while Bouhuys et al. (1968) reported a value of 3.0 for singing.
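The step from the regression slope to the power law can be checked numerically. The sketch assumes that IL is expressed in dB (IL = 10 log10 of intensity relative to some reference) and that "log" in the regression is the base-10 logarithm; both are implied by the exponent of 3.03 but are not stated explicitly. The function names are hypothetical.

```python
import math

INTERCEPT_DB = 41.6   # regression intercept reported in the text
SLOPE_DB = 30.3       # regression slope, in dB per decade of Ptr


def predicted_il(ptr_cm_aq):
    """IL predicted by the regression for a given Ptr (cm aq.)."""
    return INTERCEPT_DB + SLOPE_DB * math.log10(ptr_cm_aq)


def intensity_ratio(ptr_ratio):
    """Intensity change implied by a Ptr change: I is proportional to Ptr**(slope/10)."""
    return ptr_ratio ** (SLOPE_DB / 10.0)


# Effect of doubling Ptr, e.g. from 3 to 6 cm aq.
gain_db = predicted_il(6.0) - predicted_il(3.0)
ratio = intensity_ratio(2.0)
```

Under these assumptions, doubling Ptr raises IL by roughly 9 dB, which corresponds to an intensity ratio of about 8.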

At first glance it seems strange that comparable regression equations are found for different relations (IL and Ptr vs. IL and Psb), obtained for different kinds of speech (normal conversational speech vs. sustained phonation) and different ranges of IL and pressure (2-7 cm aq. vs. 2-60 cm aq.). But closer inspection reveals that the two relations are not really different. For sustained vowel phonation, and singing of constant tones, Por usually is close to zero and Psb and Ptr are almost equal. Therefore, for these modes of phonation, the relations between IL and Ptr and between IL and Psb are very similar. The conclusion is that the relation between IL and Ptr obtained by Bouhuys et al. (1968), Cavagna and Margaria (1968), Isshiki (1964), and Tanaka and Gould (1983) for sustained phonation and large ranges of IL and Psb is comparable to the relation obtained in this experiment for normal conversational speech.

It may still be that Psb is an important factor in the control of IL, certainly if it is varied over ranges that are much wider than normally found in conversational speech but that are not unusual in singing or in very loud speech. But our data suggest that the faster variations of IL related to articulatory manoeuvres are primarily determined by variations in Por that cause similar variations in Ptr, whereas the gradual decrease of IL observed during many (declarative) utterances in a large number of languages is caused by a gradual decrease in Psb. The finding that IL is mainly controlled by Ptr makes it interesting to further investigate the detailed way in which IL is influenced by Ptr via the characteristics of the glottal volume flow.

3.2.2. Flow waveform characteristics and Ptr

From the literature it is known that the parameters Ee and U0 in the LF-model have most effect on IL (Gauffin and Sundberg, 1989). Thus we measured Ee and U0 for all 181 pitch periods of the fifth repetition, for which reliable inverse filter results could be obtained. Most of these periods pertained to vowels, but a substantial part comes from voiced consonants. We wanted to examine the relation of Ee and U0 to Ptr.

The relation between Ee and Ptr is shown in Fig. 2a. This relation appears to show three different regimes. The bulk of the samples (148 out of a total of 181) falls into the category that we call steady phonation. For the data of this category an exponential fit (R = 0.79, see Fig. 2a) is slightly better than a linear fit (R = 0.73). The second category consists of the pulses in V-UV transitions (i.e. both V/UV and UV/V transitions). For this category Ee is often relatively lower, compared to steady phonation, especially at the beginning of voicing. On the other hand, for the vowel /a/ from the very last syllable of the utterance, Ee is relatively higher (the reasons for taking the utterance final syllable apart are more fully explained in section 3.3).

The relation between U0 and Ptr is depicted in Fig. 2b. Again, for the data of the category 'steady phonation', an exponential fit (R = 0.72, see Fig. 2b) is somewhat better than a linear fit (R = 0.65). The data for the vowel /a/ of the last syllable still deviate considerably from the regression line, while the data for V-UV transitions are scattered on both sides of the regression line.
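One common way to compare a linear with an exponential fit is to correlate Ptr with the raw values and with their logarithms, since y = A exp(b Ptr) is a straight line in log(y). Whether the R values quoted for Fig. 2 were obtained in exactly this way is not stated, so the sketch below is only an assumed procedure, run on made-up data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_fits(ptr, y):
    """Return (R for a linear fit, R for an exponential fit via log(y))."""
    r_linear = pearson_r(ptr, y)
    r_exponential = pearson_r(ptr, [math.log(v) for v in y])
    return r_linear, r_exponential


# Made-up data that follow an exact exponential law (not measured values).
toy_ptr = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_ee = [0.05 * math.exp(0.4 * p) for p in toy_ptr]
r_lin, r_exp = compare_fits(toy_ptr, toy_ee)
```

The logarithm requires strictly positive values, which holds for Ee and U0 expressed relative to their maxima as long as no period has a zero value.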

From a look at the spectra of the glottal flow waves it is immediately apparent that the spectral slopes in the three regimes are quite different. In the pitch pulses taken from vowel onsets and from the final stressed syllable the spectral tilt is much steeper than in the central pitch periods of the vowels taken from the beginning and middle of the utterance. The slope difference is more than large enough to have clear perceptual consequences. Thus, the observed effects in voice quality are of sufficient interest to take them into account in the description of speech production and to model them in high quality speech synthesis.

Although Ee and U0 appear as separate parameters in the LF-model, they are not unrelated themselves, since Ee is dUg/dt at the moment of major excitation. Thus, if U0 increases, Ee should also increase, everything else being equal. Therefore, we looked at the relation between Ee and U0, which is shown in Fig. 2c. For steady phonation the correlation between Ee and U0 is very high (viz. 0.80). Apparently the effect of other parameters (like T0, skewness, and duty cycle) on this relation is not large in "normal" speech.

3.3. The utterance final syllable

Towards the end of the utterance F0, IL, Ptr, and Psb decrease substantially, while there is a marked increase in the SH activity during the last syllable (see Fig. 1). This phenomenon, the so-called final fall, has been observed before (Collier, 1975; Maeda, 1976; Strik and Boves, 1989). Presumably, the larynx returns to its rest position, and the lowering of the larynx already starts before phonation has stopped (Maeda, 1976). One would expect that these gross changes in the posture of the larynx should affect the mode of vibration of the vocal folds. This observation motivated a separate study of the glottal flow pulses in the utterance final vowel.

The fact that the characteristics of the vowel in the utterance final syllable deviate from those of the preceding vowels was also observed by Klatt and Klatt (1990). For the last syllable they found increased noise in the F3 region of the spectrum, indicating a greater glottal airflow. But they also found a weaker first harmonic (relative to the amplitude of the second harmonic) in an utterance final syllable, indicative of a pressed voice with a slightly smaller open quotient. Therefore they introduced a novel breathy-laryngealized mode of vibration.

We tried to verify their hypothesis by comparing the data of the (stressed) vowel /a/ of the last syllable, with the data of the first (unstressed) vowel /a/ in the utterance. The spectrum of the utterance final vowel indeed showed increased noise at frequencies above roughly 1.4 kHz. But the amplitude of the first harmonic (relative to the amplitude of the second harmonic) was about 1.5 dB stronger, and the open quotient was approximately 50% in both vowels. Consequently, there is evidence for a breathy mode of phonation at the end of this utterance, but not for laryngealization.

Generally, everything else being equal, a decrease in Ptr would lead to a decrease in U0 (Ishizaka and Flanagan, 1972). Ptr decreases from 5.5 cm aq. for the first vowel /a/, to 4.2 cm aq. for the last vowel /a/, but the amplitude of the AC-component of glottal flow (U0) increases by roughly 6%. Substantial differences in the degree of adduction are not likely, since the open quotient is about 50% in both vowels. Presumably in the utterance final vowel the vocal folds are slackened, either to facilitate the maintenance of voicing with decreased Ptr, or due to a general relaxation of muscular activity and a preparation for breathing at the end of an utterance.

Comparing both vowels /a/ it is observed that there is a decrease in Ee of approximately 14% in the last vowel, even though U0 increases slightly. We found that, generally, the effect of other parameters (like T0, skewness, and duty cycle) on the relation between Ee and U0 is not very large for the data of the present experiment (see Fig. 2c). But for the last vowel /a/ T0 is substantially larger than T0 of the first vowel /a/. After correction for this temporal difference, i.e. when the same number of flow pulses are plotted on the same horizontal scale for both vowels, no major differences in the shape of the glottal volume flow are observed. Consequently, the change in U0 (+6%) combined with the change in F0 (-20%) determines the change in Ee (-14%). The fact that, apart from time-stretching, no major differences were found in duty cycle and shape of the glottal pulses between both vowels /a/, also indicates that the degree of adduction has not changed substantially.
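The arithmetic behind this argument can be written out as follows. It rests on the assumption, suggested by the time-stretching observation above, that the pulse shape is unchanged, so that the peak slope of the flow scales with its amplitude divided by its period; the superscripts "first" and "last" are labels introduced here for the two vowels.

```latex
E_e \propto \frac{U_0}{T_0} = U_0 F_0
\quad\Longrightarrow\quad
\frac{E_e^{\text{last}}}{E_e^{\text{first}}}
\approx \frac{U_0^{\text{last}}}{U_0^{\text{first}}} \cdot \frac{F_0^{\text{last}}}{F_0^{\text{first}}}
= 1.06 \times 0.80 \approx 0.85
```

This predicts a decrease in Ee of about 15%, close to the measured decrease of approximately 14%.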

4. Conclusion

In this paper we have shown that the control of F0, IL and voice quality in normal speech may be somewhat different from what is known from the literature on studies based on sustained vowels or singing. In speech Ptr seems to be more important than Psb, mainly because Por cannot be considered as constant and negligible. Also, it was shown that the relative importance of physiological parameters that affect F0, IL and voice quality depends very much on the nature of the speech from which they are derived. Although the results are based on a single subject study, they fit in very nicely with current models of the physiology of phonation.

Especially from the results on the control of IL and voice quality it became clear that descriptive mathematical models of the glottal flow waveform do not allow one to make the step from description to explanation. High correlations were found for Ee and Ptr, and for U0 and Ptr. But in the LF-model there is no relation between Ee and Ptr, or between U0 and Ptr, for the simple reason that Ptr does not figure in the model. Thus, the LF-model will never allow one to explain these relations, or why several different regimes should exist in the relation between Ptr and basic parameters in the model. One will have to resort to models that have a firm physiological basis, like the ones proposed in Titze (1984) and Cranen (1990).

Acknowledgements

This research was supported by the Foundation for linguistic research, which is funded by the Netherlands Organization for Scientific Research, N.W.O. Special thanks are due to Philip Blok M.D. who inserted the EMG electrodes and the catheter in the present experiment.

References

Atkinson, J.E. (1978) Correlation analysis of the physiological features controlling fundamental voice frequency, Journal of the Acoustical Society of America, 63, 211-222.

Basmajian, J.V. (1967) Muscles Alive, their functions revealed by electromyography (second edition). Baltimore: The Williams & Wilkins company.

Baer, T. (1979) Reflex activation of laryngeal muscles by sudden induced subglottal pressure changes, Journal of the Acoustical Society of America, 65, 1271-1275.

Bouhuys, A.; Mead, J.; Proctor, D.F. and Stevens, K.N. (1968) Pressure-Flow Events during Singing. In Annals of the New York Academy of Sciences (M. Krauss, M. Hammer, & A. Bouhuys, editors), Vol. 155, pp. 165-176.

Boves, L. (1984) The phonetic basis of perceptual ratings of running speech, pp. 73-78. Dordrecht: Foris Publications.

Cavagna, G.A. and Margaria, R. (1968) Airflow rates and efficiency changes during phonation. In Annals of the New York Academy of Sciences (M. Krauss, M. Hammer & A. Bouhuys, editors), Vol. 155, pp. 152-164.

Collier, R. (1975) Physiological correlates of intonation patterns, Journal of the Acoustical Society of America, 58, 249-255.

Cranen, B. (1990) Simultaneous modeling of EGG, PGG and Glottal Flow. To appear in Proceedings of the sixth Vocal Fold Physiology Conference, Stockholm.

Cranen, B. and Boves, L. (1985) Pressure measurements during speech production using semiconductor miniature pressure transducers: Impact on models for speech production, Journal of the Acoustical Society of America, 77, 1543-1551.

Cranen, B. and Boves, L. (1987) The acoustic impedance of the glottis: modeling and measurements. In Laryngeal function in phonation and respiration (Th. Baer, C. Sasaki and K. Harris, editors), pp. 203-218. Boston: College-Hill.

Fant, G. (1986) Glottal flow: Models and interaction, Journal of Phonetics, 14, 393-400.

Fourcin, A.J. (1974) Laryngographic examination of vocal fold vibration. In Ventilatory and phonatory control systems (B. Wyke, editor), pp. 315-326. London: Oxford University Press.

Gauffin, J. and Sundberg, J. (1989) Spectral correlates of glottal voice source waveform characteristics, Journal of Speech and Hearing Research, 32, 556-565.

Hirose, H. (1971) Electromyography of the Articulatory Muscles: Current Instrumentation and Techniques, Haskins Laboratories Status Report on Speech Research, SR-25/26, 73-86.

Isshiki, N. (1964) Regulatory mechanisms of voice intensity variation, Journal of Speech and Hearing Research, 7, 17-29.

Ishizaka, K. and Flanagan, J.L. (1972) Synthesis of voiced sounds from a two-mass model of the vocal cords, Bell System Technical Journal, 51, 1233-1268.

Klatt, D.H. and Klatt, L. (1990) Analysis, synthesis, and perception of voice quality variations among female and male talkers, Journal of the Acoustical Society of America, 87, 820-857.

Maeda, S. (1976) A characterization of American English intonation, Ph.D. thesis, MIT, Cambridge.

Rubin, H.J. (1963) Experimental studies on vocal pitch and intensity in phonation, The Laryngoscope, 8, 973-1015.

Strik, H. and Boves, L. (1988) Data processing of physiological signals related to speech. In Proceedings of the Dept. of Language and Speech, Phonetics Section, Nijmegen University, pp. 41-56.

Strik, H. and Boves, L. (1989) The fundamental frequency - subglottal pressure ratio. In Proceedings of EUROSPEECH-89, Vol. 2, 425-428.

Strik, H. and Boves, L. (1991) A DP algorithm for time-aligning physiological signals related to speech. This issue.

Tanaka, S. and Gould, W.J. (1983) Relationships between vocal intensity and noninvasively obtained aerodynamic parameters in normal subjects, Journal of the Acoustical Society of America, 73, 1316-1321.

Titze, I.R. (1984) Parameterization of glottal area, glottal flow, and vocal fold contact area, Journal of the Acoustical Society of America, 75, 570-580.

Titze, I.R. and Talkin, D.T. (1979) A theoretical study of various laryngeal configurations on the acoustics of phonation, Journal of the Acoustical Society of America, 66, 60-74.

Veth, J. de; Cranen, B.; Strik, H.; and Boves, L. (1990) Extraction of control parameters for the voice source in a text-to-speech system. In Proceedings of ICASSP-90, paper 21.S6a.2

Figure captions

Figure 1. Median physiological signals, obtained by the method of non-linear time-alignment. Plotted are, from top to bottom, F0, IL, Ptr, Por, Psb, Vl, SH, and VOC.

Figure 2. Scatterplots of respectively (a) Ee and Ptr; (b) U0 and Ptr; and (c) Ee and U0. Given are regression lines for exponential or linear fits, and the correlation coefficients for the fits for the data of the category 'steady phonation'. Ee and U0 values are given relative to the maximum observed value for each quantity.

Tables

Table I. Correlation matrix, means and standard deviations of the median physiological signals for a voiced interval (N=66, |R|>0.315 for p<0.01).

          F0      IL     Ptr     Por     Psb    mean      SD
F0     1.000   0.808   0.851  -0.783   0.478  118.58    3.70
IL             1.000   0.960  -0.983   0.111   63.23    3.38
Ptr                    1.000  -0.968   0.274    5.42    0.88
Por                            1.000  -0.054    1.16    0.91
Psb                                    1.000    6.35    0.16

Table II. Correlation matrix, means and standard deviations of the median physiological signals for all voiced frames (N=293, |R|>0.151 for p<0.01).

          F0      IL     Ptr     Por     Psb    mean      SD
F0     1.000   0.667   0.729  -0.153   0.772  115.87    8.59
IL             1.000   0.923  -0.663   0.492   62.20    4.25
Ptr                    1.000  -0.638   0.612    4.95    1.17
Por                            1.000   0.211    0.89    0.95
Psb                                    1.000    5.65    0.90

Last updated on 22-05-2004