
Abstract

The relation between syntax and prosody is evident, even though prosodic structure cannot be mapped directly onto syntactic structure, or vice versa. Syntax-to-prosody mapping is widely used in text-to-speech applications, but prosody-to-syntax mapping is mostly missing from automatic speech recognition and understanding systems. This paper presents an experiment towards filling this gap, evaluating whether an HMM-based automatic prosodic segmentation tool can support the reconstruction of syntactic structure directly from speech. Results show that up to 85% of syntactic clause boundaries and up to about 70% of embedded syntactic phrase boundaries could be identified based on the detection of phonological phrases. Recall rates do not depend further on syntactic layering, that is, on whether the phrase is multiply embedded or not. In read speech, clause boundaries can be reliably assigned to the intonational phrase level and can be separated from lower-level syntactic phrases based on the type of the aligned phonological phrase(s). These findings can be exploited in speech understanding systems, allowing the skeleton of the syntactic structure to be recovered purely from the speech signal.
György Szaszák¹ and András Beke²
1     
       
2   
     

Keywords: prosody, syntax, phonological phrase, boundary detection
          
         
         
       
      
         
        
         
           
          
          
           
         
           
          
          
        
           

Journal of Language Modelling       
György Szaszák, András Beke

        
          
         
          
         
       
          
         
          
         
           
        
             
           
         
        
           
         
           
          
           
         
            
 
         
          
          
           
        
          
          
prosodic structure hypothesis       
            
         
         
         
  
Prosody for Syntactic Boundary Detection
          
            
    
         
         
         
          
       
         
       
          
         
           
          
         
          
         
    
         
        
          
       
            
         
           
          
        
          
        
         
       
         
     segmental domain    
          
         
 suprasegmental domain          
         
          
  
          
       
         
          
       
       
          
       
          
           
        
       
        
   
         
           
         
            
           
        
           
         
       
          
         
          
             
           
          
          
         
           
      
       
          
        
         

  
           
         
         
             
      
       
       
          
          
          
            
           
       
          
        
 
           
          
         
         
         
         
         

 
         
         
         
           
        
          
          
         
        

  
 Specicities of Hungarian syntax
         
        
           
           
          
           
          
           
           
         
          
           
        
         
         
          
            
            
         
         
 
        
          
         
   
   
         
            
           
         
            
             
           
            
            
  
           
            
    
 Syntactic phrasing
        
          
          
         
           
      phrase-structure grammar 
  lexical databases   morphological analyser  
        
        
         
         
        
         
          
          
  
        head-driven
        
           
        
           
          
            
            
          
[[<<Gróf(NP)>Vásárhelyi(NP)> <Görögországban(NP)>
<kötött ki(VV)>(Clause)] és(Conj) [<titkárul(NP)><szerződtette(VV)>
<a(Art) <főkonzul(NP)>lányát(NP)>(Clause)] (Sentence)]
Morphological analysis and disambiguation of syntactic analysis
          
          
  
           
Gróf Vásárhelyi Görögországban kötött ki, és titkárul szerződtette a főkonzul lányát
Count Vásárhelyi docked in Greece, and hired the daughter of the consul as his secretary
           
           
   
         
         
           
          
       
         
           
            
           
           
          
 
  
 
 Prosodic hierarchy model
            
prosodic structure hypothesis       
  
           
Gróf Vásárhelyi Görögországban kötött ki, és titkárul szerződtette a főkonzul lányát
         
    intonational phrase     
 phonological phrase     
         
         
       
            
          
          
          
         
         
         
         
         
        
              
         
Gróf Vásárhelyi Görögországban kötött ki, és titkárul szerződtette a főkonzul lányát (Count Vásárhelyi docked in Greece, and hired the daughter of the consul as a secretary)
[[<Gróf Vásárhelyi> <<Görögországban> <kötött ki és>>][<<titkárul> <szerződtette a>> <főkonzul lányát>]]
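As a minimal sketch of how the bracketed prosodic phrasing of the example sentence could be processed, the following Python snippet lists the word positions at which each bracket type opens. The function name `bracket_onsets` and the variable names are hypothetical and chosen for illustration only. The snippet merely reads the notation as printed and does not itself assign prosodic levels to the square and angle brackets.

```python
import re

# Bracketed prosodic phrasing of the example sentence, copied from the text.
PHRASING = (
    "[[<Gróf Vásárhelyi> <<Görögországban> <kötött ki és>>]"
    "[<<titkárul> <szerződtette a>> <főkonzul lányát>]]"
)


def bracket_onsets(bracketed):
    """Return (words, square, angle): the words in order, plus the sorted
    word indices before which a '[' or a '<' bracket opens.

    Hypothetical helper, not part of the segmentation tool described here.
    """
    # Split into single bracket characters and runs of non-bracket, non-space text.
    tokens = re.findall(r"[\[\]<>]|[^\s\[\]<>]+", bracketed)
    words, square, angle = [], set(), set()
    for tok in tokens:
        if tok == "[":
            square.add(len(words))  # index of the next word to come
        elif tok == "<":
            angle.add(len(words))
        elif tok not in ("]", ">"):
            words.append(tok)       # closing brackets are skipped
    return words, sorted(square), sorted(angle)


words, square, angle = bracket_onsets(PHRASING)
print([words[i] for i in square])  # words that start a square-bracketed group
print([words[i] for i in angle])   # words that start an angle-bracketed group
```

For the example sentence, square-bracketed groups open at Gróf and titkárul, and angle-bracketed groups open at Gróf, Görögországban, kötött, titkárul, szerződtette and főkonzul. Such onset lists are one simple form in which prosodic phrase boundaries could be compared with syntactic boundaries.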
        
    phonological word   
            
          
           
          
  
         
            
          
          
      
          
         
        
        
          
        
           
          
       
          
         
          
          
     
       
        
        
          
          
             
           
         
          
        
         
            
          
         
          
         
        
           
        
  
Prosodic label Description
   
   
   
    
     
  
 
   
  

             
           
        
       
             
        
           
          
     
       
        
 co    ce     
         
  ss     cr  
        
           ls 
        
         
          
          
          
        
        
        
           
         
  
             
  Gróf Vásárhelyi Görögországban kötött ki, és titkárul szerződtette a
főkonzul lányát        
        
 Automatic alignment of phonological phrases
        
        
        
        
         
          
            
      
        
              
        
         
            
          

        
          
        
      
           
  
          
     Gróf Vásárhelyi Görögországban kötött ki,
és titkárul szerződtette a főkonzul lányát
          
         
           
       clause onset (co) 
         
           
    continuation rise (cr)       
low clause ending (ce)
           
        
           
              
           
        
        
         
           
           
         
           
          
         
          
  
           
            
         
          
            
          

       
        
       
         
            
           
         
         
         
         
      co    
      ce     
  cr       
       
ss      cr     
     ss  ms  
       ss    
           
    ls     
       cr  
   sil      
         
            
        
          
 Acoustic-prosodic pre-processing
       
         
         
             
  
              
            
            
             
          
        
            
          
         
        
 Training of the prosodic segmenter
        
           
        
         
       
          
           
          
        
        
         
          
      
 Initial testing of the prosodic segmenter
          
        
          
           
               
             
             
   
        
         
         
  
         
       
       
         

       
$$\mathrm{recall} = \frac{tp}{tp + fn},$$
 𝑡𝑝          
          
   𝑓𝑛         
        

   
$$\mathrm{precision} = \frac{tp}{tp + fp},$$
 𝑓𝑝       
           
        
       
      
    𝜎    
      
𝜎=1
𝑡𝑝

𝑡 𝑡
, 
 𝑡𝑝         
          
𝑡      𝑖    𝑡
          
        𝜎= 50.4 
         
     𝑡𝑝   
    𝑡𝑝
 =𝑡𝑝
𝑡𝑝 .
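The recall and precision measures defined above can be illustrated with a minimal sketch. It assumes that a detected boundary counts as a true positive when it lies within a fixed time tolerance of a reference boundary, and that each detection is paired with at most one reference boundary. The function names, the greedy pairing strategy, and the tolerance value are illustrative assumptions and are not taken from the evaluation procedure used in the experiments.

```python
from typing import Dict, List, Tuple


def match_boundaries(reference: List[float], detected: List[float],
                     tolerance: float) -> List[Tuple[float, float]]:
    """Pair each reference boundary with the closest unused detection.

    Assumption: a greedy one-to-one pairing within +/- tolerance (seconds).
    """
    unused = sorted(detected)
    pairs = []
    for ref in sorted(reference):
        candidates = [d for d in unused if abs(d - ref) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - ref))
            unused.remove(best)  # each detection may be used only once
            pairs.append((ref, best))
    return pairs


def boundary_scores(reference: List[float], detected: List[float],
                    tolerance: float) -> Dict[str, float]:
    """Compute tp, fn, fp, recall and precision for boundary detection."""
    pairs = match_boundaries(reference, detected, tolerance)
    tp = len(pairs)             # reference boundaries that were detected
    fn = len(reference) - tp    # reference boundaries that were missed
    fp = len(detected) - tp     # detections with no reference boundary
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fn": fn, "fp": fp,
            "recall": recall, "precision": precision}


if __name__ == "__main__":
    # Hypothetical boundary times in seconds, for illustration only.
    ref_times = [0.50, 1.20, 2.00, 3.10]
    det_times = [0.52, 1.35, 2.05, 2.60]
    print(boundary_scores(ref_times, det_times, tolerance=0.1))
```

A one-to-one pairing is chosen here so that a single detection cannot be credited to two neighbouring reference boundaries; other matching schemes would give different counts near closely spaced boundaries.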
       
  
 Prosodic segmentation vs. word boundaries
         
        
      
         
         
          
          
        
        
          
         
        
        
         
           ±
        
        
              
         
        
          
        

  
            
       
         
          
          
           
           
          
            
            
  
        
    
 Material and method
          
         
         
         
         
         
            
             
         
       
        
         
          
         
         

        
          
        
           
          
           
       
         
          
           
          
            
          
     
       
        
       
            
  
  −1−2−3−4       
        −1  
    −2       −4 
  −1  −2          
    −3    −4  
            

         
         
         
        
         
         
      
            
     
            
       
          
      
        
           
 
        
          
          
        
         
  
 Recovering syntactic phrase boundaries
         
       
  recall        𝑡𝑝 
       
 𝑓𝑛        
          
  
 
  
  
 
  
  
   
  
   
 
 
 
     
 1B/W  1B/W  
0.85  0.79  
 0.45  0.48  
 0.42  0.48  
 0.44  0.45  
 0.48  0.50  
 0.54  0.55  
           
        
      
          
       
           
          
          
         
        −1  −2  
            
           −1  −2   
            
 
          
          
           
         
         
        
         
 χ² = 6430.606; p < 0.001
        
          
          
    Z = −7.807; p < 0.001 
  
  −1  −2        
       𝑍 = −0.407,𝑝 > 0.1  
𝑍 = −0.016;𝑝 > 0.1       
 −3  −4
 < 0        
            
          
            
        
χ² = 0.224; p > 0.1        
        
          
         
  
 Reliability of the syntactic phrase recovery
           
        sil  
          
              
              
         𝑡𝑝  
        
    𝑓𝑝    
         
          
          
            co  
           
      ce      
0        
      ce     
         
       ce  cr  
          
  0     
          
    
  
          
           
 Towards a reconstruction of syntactic layering
            
          
            
         
            
            
          
          
       
        
          
         
            
          
           
        
         co  
           
           
      −1     
       ssmsce    
 cr    ls      
           
       −2  
  
       
    
 0.86    
  0.78   
  0.83   
  0.80   
 0.22 0.72   
 0.50 0.41   
     
 
  
  
   
  
 
  
  
       
    
  0.74   
  0.68 0.20  
  0.68 0.18  
 0.83    
 0.60 0.28   
  0.64 0.17  
     
 
  
  
   
  
 
  
  
        
          
          
        
           
          
      −1    
  −1          
    
  ce      
            
       cr     
         −1  
         
  
   co    −1    
     ssms  ls   
 −1  −2     −1   
          
    
 Head classication of the syntactic phrase
            
         
          
          
      χ² = 0.349; p > 0.1   
          
          
        
          
          
      
         
          
         
          
        
 Robust intonational phrase – clause recovery
        
       ms  ls 
         
  ss     ms    ls 
         
         
         
          
            
           
        
          
          
  
       
    
     
     
     
     
     
    
 
  
    
  
  
  
   
     
 
       
    
     
     
     
     
     
    
 
  
    
  
  
  
  
    
   
  ms  ls     ss   
    ss     
         
         
             
            
       
           
     

          
          
        
            
  
          
           
     
       
         
        
          
        
         
          
       
            
 
          
            
        
           
         
           
          
          
          
            
           
  −1        
        
         
  
          
          
 
          
       
         
        
          
       
       
          
       
        
       
 

        
          
       
          
   
       
         
        
     

           
     Proc. of the 6th International Symposium on
Computational Intelligence  
              
       
    Proc. Eurospeech 2001, Vol. 4.  
 
         
    Speech prosody    
  
          
        Proceedings
of the 1992 DARPA Speech and Natural Language Workshop  
            
        
 International Conference on Spoken Language Processing
          
         
Journal of Memory and Language    
         
        
   
          Generalized Phrase
Structure Grammar       

          
         
       Proceedings of
ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and
Understanding     
         
   Arfticial Intelligence    
      Rhythmic and interface categories in prosody 
         
         
       
 Proc. 6th European Conference on Speech Communication and
Technology (Eurospeech 99)    
         
 Academic Press  
        Cambridge University Press 
           
       International Journal of Speech
Technology    
            
         Biological
Psychology        
           
           
  IEEE Trans     
  
        
University of Chicago Press
        
      Journal of the Acoustical Society of
America      
          
Proc. of the 4th International Conference on Speech and Language Processing
     
   The Syntax-Phonology Interface     
        
    
          
       
    
         
        
  
          
      Proc. ARPA Workshop on Human Language
Technology  
      
           
NeuroImage      
          
         
  
          
        
     
            
          
       
        
         
 
         
        International Journal
of Speech Technology      
  
György Szaszák, András Beke
          
 Speech Communication      
   Prosody and recursion   
This work is licensed under the Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/
  
... Previous research on speech prosody revealed that prosody can be usefully exploited to perform automatic phrasing (segmentation into phrases) of input speech, where phrases can be prosodic phrases [5], phonological phrases [6], or, even further down in the prosodic hierarchy, prosodic words [7]. The depth of this phrasing in terms of the prosodic hierarchy is to some extent language-specific [6]. In several studies, such prosodic phrasing (or boundary information based on prosody) was successfully used to improve speech recognition [8] or understanding [9]. ...
... Since in the majority of languages prosody does not yield such rich information about word boundaries, and since keywords, even if reflected somewhat by prosody, may occur in different grammatical roles and hence with quite different prosodic realizations, this approach does not seem applicable in our case. The approach proposed in this paper exploits phonological phrasing information, especially as previous research has shown that, at least for fixed-stress languages like Hungarian, (phonological) phrase boundary information can be used effectively to perform partial word-boundary detection [6], [8]. ...
... Halliday and Hasan (1976) were later in line with the Prague School in proposing that one of the uses of intonation in English is to signal which information the speaker considers new and which information the speaker considers given. Nowadays, the mapping of 'given' and 'new' information onto the syntactic structures that realize these categories has become a topic of interest in linguistics (Kashiwadate, Yasuda, Fujita, Kita, & Kobayashi, 2020; Szaszák & Beke, 2012; Chen & Zechner, 2011). Halliday and Hasan (1976) proposed that the speaker tries to signify the essence of the utterance (the foundational unit in his grammatical analysis). ...
Article
This study examines how new and given information, as the information structure of syntactic forms, is revealed in Barack Obama’s remarks in Jakarta. The study focuses on the opening parts of Obama’s remarks, in which he recalled his childhood memories of living in Jakarta, Indonesia, for four years. To investigate the information structures, we collected data from digital documents (scripts and videos) of the remarks; we then analyzed the syntactic forms of the articles “a” (indefinite) and “the” (definite), as well as the rheme and theme, in the script and the video of the remarks using close textual analysis. The results indicate that the uses of these articles construct messages whose tones either distance the speaker from the audience, bring the speaker closer to it, or remain neutral. Furthermore, the information contained in Obama’s speech reflected the context-awareness of both the speaker and the audience. The speech could also open up further study in (political) critical discourse analysis, as it was delivered in the political context between Indonesia and the USA.
Article
The use of mobile technology in learning and teaching English has been on the rise all over the world over the past few decades and has therefore received considerable attention from academics in recent years. As a result, a number of experimental studies have been carried out on the use and effectiveness of mobile phones in the teaching/learning process. However, there have been only a small number of studies on mobile-assisted listening comprehension. This study aims to explore whether Mobile Assisted Language Learning (MALL) is effective in teaching/learning listening for students of university-level English language programs and whether it can better enhance students’ listening ability. It also endeavors to assess why some MALL strategies/techniques are more effective than others. For review purposes, the study exclusively used the secondary data available on the broader topic: the use and efficacy of mobile phones in teaching/learning listening skills. The results indicate that MALL is effective in teaching/learning ESL/EFL listening skills and that using appropriate strategies can contribute to better learning. Besides giving a brief overview of MALL, the study also recommends some practical strategies that ESL/EFL educators can use while designing MALL listening tasks/activities.
... The work will not cover in detail the task of sentence/phrase boundary detection as this was already deeply studied in [54], [55], [56] for Czech and later in [57] for Hungarian as another fixed-stress language. In [58], performance for Czech was compared to English. ...
Thesis
This doctoral thesis covers the use of prosody in automatic recognition of continuous speech. Even though automatic speech recognition (ASR) systems have improved immensely over the last several decades, they still fail to exploit one of the most important carriers of information in speech: prosody. Work on other languages has already shown that prosody benefits ASR, and this thesis investigates the potential of Czech in this respect. The research activities can be divided into three main areas: a) pitch detection algorithms (PDA) as a prerequisite for prosodic feature extraction, b) the Czech lexical stress system as a potential acoustic cue for word boundary detection (and its usage in ASR), and c) classification of sentence/phrase modality in Czech based purely on the acoustic signal. First, the field of pitch detection algorithms and a framework for their evaluation and comparison are presented. Several new evaluation criteria are proposed as an extension to existing ones, together with an evaluation of the metrics over four speech pitch reference databases. Besides pure comparison, a few modifications of existing PDA methods are presented. Namely, a transition probability function in PDA post-processing is investigated in terms of a candidate distance measure, and a new temporal forgetting principle for speech is introduced as a time-domain extension of the method. Czech, as a fixed-stress language with lexical stress on the first syllable, is known to have weak acoustic correlates of lexical stress. Nevertheless, methods for detecting stressed syllables or stress-group boundaries from the speech signal were investigated. A system with sophisticated feature extraction followed by statistical machine learning methods to model these phenomena in Czech is presented.
Detected stress-group boundaries can (in most cases) be mapped to word boundaries, which can be used for prosodic evaluation of ASR hypotheses. A metric for such a prosodic score, which can be directly used in prosodic N-best evaluation or ASR error detection, is proposed. An ASR lattice rescoring algorithm for Czech is also presented. Czech phrase modality detection from the acoustic signal is covered; together with an existing phrase boundary detector, such a system can serve as a punctuation module for a Czech dictation ASR system or support the natural language processing (NLP) part of a Czech dialogue system. Keywords: Prosody, speech technology, ASR, F0, pitch, lexical stress, stress group, modality, melodeme, prosodic hypothesis scoring.
... The remaining 6 models model phonological phrases with different properties regarding the strength of the stress and the following intonation contour. The overall approach is documented in detail in [7]. For the current application, the intonation contour is irrelevant (at this stage); only the strength of the stress is extracted, to derive a 3-level stress labelling schema in which unstressed, stressed and strongly stressed syllables are differentiated. ...
Conference Paper
Stress annotations in the training corpus of speech synthesis systems are usually obtained by applying language rules to the transcripts. However, the actual stress patterns seen in the waveform are not guaranteed to be canonical: they can deviate from the locations defined by language rules. This is driven mostly by speaker-dependent factors. Therefore, stress models based on these corpora can be far from perfect. This paper proposes a waveform-based stress annotation technique. According to the stress classes, four feedforward deep neural networks (DNNs) were trained to model the fundamental frequency (F0) of speech. During synthesis, stress labels are generated from the textual input and an ensemble of the four DNNs predicts the F0 trajectories. Objective and subjective evaluations were carried out. The results show that the proposed method surpasses the quality of vanilla DNN-based F0 models.
Conference Paper
Automatic classification methods are frequently used in early diagnosis of different diseases that affect speech production. These methods can also be applied to identify speech samples from patients affected by Parkinson's disease (PD) or depressive disorder (DD). This paper is interested in applying automatic stress detection and prosodic phrasing approaches on pathological speech samples in order to assess to what extent these tools can be useful either in characterizing in an unsupervised manner the prosodic attributes of pathological samples from individuals affected by PD and DD, or classifying samples as belonging to healthy or non-healthy individuals. We formulated hypotheses in connection with the duration of phonological phrases and the number of words grouped by them. We also briefly analyzed the phrase distributions. Our results show that healthy and pathological samples can be separated from each other by means of these prosodic analysers, and deep neural network or support vector machine based classifiers built on top of them.
Conference Paper
This paper addresses speech summarization of highly spontaneous speech. Speech is converted into text using an ASR system, then segmented into tokens. Human-made and automatic, prosody-based tokenization are compared. The obtained sentence-like units are analysed by a syntactic parser to help automatic sentence selection for the summary. The preprocessed sentences are ranked based on thematic terms and sentence position. The thematic term is expressed in two ways: TF-IDF and Latent Semantic Indexing. The sentence score is calculated as a linear combination of the thematic term score and a sentence position score. To generate the summary, the top 10 candidates for the most informative/best summarizing sentences are selected. The system performance showed comparable results (recall: 0.62, precision: 0.79 and F-measure 0.68) with the prosody-based tokenization approach. A subjective test is also carried out on a Likert scale.
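The ranking step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenizer, the weight `alpha`, and the position score (earlier sentences score higher) are all assumptions made for the example, and only the TF-IDF variant of the thematic term score is shown.

```python
import math
from collections import Counter


def tfidf_scores(sentences):
    """Sum of TF-IDF weights per sentence; each sentence is treated as a document."""
    tokenized = [s.lower().split() for s in sentences]  # assumption: whitespace tokens
    n = len(tokenized)
    # Document frequency: in how many sentences each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        scores.append(sum((count / len(tokens)) * math.log(n / df[word])
                          for word, count in tf.items()))
    return scores


def position_scores(n):
    """Hypothetical position score: earlier sentences receive higher values."""
    return [1.0 - i / n for i in range(n)]


def rank_sentences(sentences, alpha=0.7, top_k=10):
    """Rank by alpha * thematic + (1 - alpha) * position; return (index, score) pairs."""
    thematic = tfidf_scores(sentences)
    peak = max(thematic, default=0.0)
    if peak > 0:
        thematic = [t / peak for t in thematic]  # scale thematic scores to [0, 1]
    position = position_scores(len(sentences))
    combined = [alpha * t + (1 - alpha) * p for t, p in zip(thematic, position)]
    order = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    return [(i, combined[i]) for i in order[:top_k]]
```

Calling `rank_sentences(units)` on a list of sentence-like units returns the indices and scores of up to ten top-ranked candidates. The value `alpha=0.7` is arbitrary; the abstract does not state the weights used.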
Conference Paper
Since the prosody of a spoken utterance carries information about its discourse function, salience, and speaker attitude, prosody models and prosody generation modules have played a crucial part in text-to-speech (TTS) synthesis systems from the beginning, especially those set not only on sounding natural, but also on showing emotion or particular speaker intention. Prosody transfer within speech-to-speech translation is a recent research area with increasing importance, with one of its most important research topics being the detection and treatment of salient events, i.e. instances of prominence or focus which do not result from syntactic constraints, but are rather products of semantic or pragmatic level effects. This paper presents the design and the guidelines for the creation of a multilingual speech corpus containing prosodically rich sentences, ultimately aimed at training statistical prosody models for multilingual prosody transfer in the context of expressive speech synthesis.
Article
Information extraction from written or spoken archives is a challenging infocommunication task, especially if a deep automatic analysis of the information structure is also targeted. The present research investigates focus detection from an automatic analysis point of view for the text (NLP) and speech (prosody) modalities. Deep syntactic analysis is performed with an NLP tool on speech transcripts and optionally combined with prosodic features extracted from speech to automatically detect the focus. Results show that in Hungarian, characterized by free word order and strong topic prominence, the detection of the focus based on NLP can be improved by adding prosodic features. Results also show, however, that for the exploration of focus marking in speech, neither syntax nor prosody is sufficient: it is likely that semantic and pragmatic context also play an essential role in this process.
Conference Paper
Common tasks involving orthographic words include spellchecking, stemming, morphological analysis, and morphological synthesis. To enable significant reuse of the language-specific resources across all such tasks, we have extended the functionality of the open source spellchecker MySpell, yielding a generic word analysis library, the runtime layer of the hunmorph toolkit. We added an offline resource management component, hunlex, which complements the efficiency of our runtime layer with a high-level description language and a configurable precompiler.
Article
We describe three analyses on the effects of spontaneous speech on continuous speech recognition performance. We have found that: (1) spontaneous speech effects significantly degrade recognition performance, (2) fluent spontaneous speech yields word accuracies equivalent to read speech, and (3) using spontaneous speech training data can significantly improve performance for recognizing spontaneous speech. We conclude that word accuracy can be improved by explicitly modeling spontaneous effects in the recognizer, and by using as much spontaneous speech training data as possible. Inclusion of read speech training data, even within the task domain, does not significantly improve performance.
Article
This work assesses the contribution of domain-specific prosodic modelling to synthetic speech quality in a name-and-address information service. A prosodic processor analyzes the textual structure of labelled input strings, and inserts markers which specify the intended prosody for the DECtalk text-to-speech synthesizer. These markers impose discourse-level prosodic organization, annotate the information structure, and adapt the speaking rate to listeners in real time. In a quantitative comparison of this domain-specific modelling with the default rules in DECtalk, the domain-specific prosody was found to reduce the transcription error rate from 14.6% to 6.4%, to reduce the number of repeats requested by listeners from 2.6 to 1.1, and to sound significantly easier to understand and more natural. This result demonstrates the importance of prosodic modelling in synthesis, and implies an even more important role for prosody in more complicated domains and discourse structures.
Article
Clearly written and comprehensive in scope, this is an essential guide to syntax in the Hungarian language. It describes the key grammatical features of the language, focusing on the phenomena that have proved to be theoretically the most relevant and have attracted the most attention. The analysis of Hungarian in the generative framework since the late Seventies has helped to bring phenomena which are non-overt in the English language into the focus of syntactic research. As Kiss shows, its results have been built into the hypotheses that make up universal grammar. The textbook explores issues at the centre of theoretical debates including the syntax and semantics of focus, the analysis of quantifier scope, and negative concord. This useful guide will be welcomed by students and researchers working on syntax and those interested in Finno-Ugric languages.
Article
This paper presents two empirical studies that examine the influence of different linguistic aspects on prosody in German. First, we analysed a German corpus with respect to the effect of syntax and information status on prosody. Second, we conducted a listening test which investigated the prosodic realisation of constituents in the German 'Vorfeld' depending on their information status. The results were used to improve the prosody prediction in the German text-to-speech synthesis system MARY.
Article
A key result of studies in prosodic phonology since the 1970's has been the finding that in language after language phonological processes are localized in the same small set of phonological domains, and do not appear to make use of the vast set of potential domains that are in principle made available by grammatical (syntactic and morphological) structure. Prosodic Hierarchy Theory (Selkirk 1978, Nespor and Vogel 1983, etc.) holds that speech is organized into a set of genuinely phonological domains that form a hierarchy of containment, with each non-terminal constituent made up of a sequence of smaller constituents at the next level down. The guiding idea is that prosodic levels cannot be skipped or repeated (i.e., must be strictly layered, Selkirk 1984). Although this research program has been vastly successful in advancing our understanding of the relation between syntactic/morphological structure and phonological form, many questions, both of detail and of principle, have remained open. Detailed empirical investigations as well as advances in theory have shown that strict layering does not always hold, but rather constitutes a prosodic ideal. Level skipping has been assigned a proper place in the weak layering model of Ito and Mester 1992(2003) (and its optimality-theoretic interpretation by Selkirk 1996). Level repetition is instantiated in the recursive intonational and phonological phrasing demonstrated by Ladd 1986, 1996, Gussenhoven 2005, and others. Given these developments in theory and analysis, it is perhaps time to take stock of the overall model and ask what has been established and what still remains open. First, there are intrinsic—and not just size-related—differences among parts of the hierarchy. 
Broadly speaking, the word-internal units (syllable, foot, and perhaps mora) are intrinsically defined in terms of sonority-related phonetic factors and speech rhythm, whereas the parsing of higher-level units (prosodic word, phonological phrase, intonational phrase, etc.) is regulated by constraints, alignment-based and other, on the correspondence between syntactic/morphological and phonological constituents. We refer here to the former (smaller prosodic units) as rhythmic categories, and the latter (larger prosodic units) as interface categories. The general form of rhythmic categories (word-internal prosody), with syllables grouped into rhythmic feet which are in turn assembled into a prosodic word, is relatively uncontroversial, apart from questions of detail (such as the status of the mora as a genuine prosodic constituent vs. a property of syllables, etc.). The picture is less clear for the interface categories, even as to the exact number and/or content of the levels of the hierarchy. A large number of different prosodic categories have been proposed in order to provide enough separate domains for different processes, including utterance, intonational phrase, phonological phrase, major phrase, intermediate phrase, minor phrase, accentual phrase, tone group, clitic group, prosodic word, minor word. The totality of these categories has never been instantiated in a single language, however, and their crosslinguistic identification has remained a largely unsolved problem. Even within a single language, the doctrine of strict layering has led to a considerable multiplication of categories. Whenever a process is found to operate in a slightly different domain than some other process, the model required setting up two separate categories. Once repetition of levels (adjunction structures) becomes an option, however, "constituent domain" no longer equals "category", raising the suspicion that perhaps some of the categories proposed in the earlier prosodic