PhD Thesis

In my PhD thesis, I developed probabilistic syllable models for German and English. These models estimate how probable particular phonological structures are. In addition, they can be used for syllabification and grapheme-to-phoneme conversion in a speech synthesis system.
My PhD thesis was supervised by Professor Dr. Grzegorz Dogil and PD Dr. Bernd Möbius.
  • Probabilistic Syllable Modeling Using Unsupervised and Supervised Learning Methods
    AIMS 2002, Vol. 8, No. 3. PhD thesis, University of Stuttgart, Institute of Natural Language Processing (IMS)

    BibTeX entry

    @PhdThesis{Mueller:2002,
    author = {Karin M{\"u}ller},
    title = {{Probabilistic Syllable Modeling Using Unsupervised and Supervised Learning Methods}},
    school = {University of Stuttgart},
    year = {2002},
    type = {{PhD thesis}},
    address = {Institute of Natural Language Processing (IMS), Stuttgart}
    }
    title page, table of contents (.ps), (.pdf)
    German introduction (.ps), (.pdf)
    Chapter 1 (.ps), (.pdf) - introduction
    Chapter 2 (.ps), (.pdf) - probabilistic context-free grammars for syllabification and grapheme-to-phoneme conversion
    Chapter 3 (.ps), (.pdf) - inducing probabilistic syllable classes using multivariate clustering
    Chapter 4 (.ps), (.pdf) - automatic detection of syllable boundaries combining treebank training and bracketed corpora training
    Chapter 5 (.ps), (.pdf) - probabilistic context-free grammars for phonology
    Conclusion (.ps), (.pdf)
    References (.ps), (.pdf)
    Appendix A (.ps), (.pdf) - German 5-dimensional syllable clustering model (50 classes)
    Appendix B (.ps), (.pdf) - English 5-dimensional syllable clustering model (50 classes)
    Appendix C (.ps), (.pdf) - Positional syllable structure grammar
    (trained model = PCFG, incl. the frequency and probability of each rule)
Here is some plain frequency data:
  • syllable information from a corpus of words from the BNC that were automatically
    transcribed using CELEX. The data have been checked manually for obvious errors.
    A description of the phoneme set can be found here (English Linguistic User Guide, page 24).
    • list of onsets and their frequency of occurrence
    • list of nuclei and their frequency of occurrence
    • list of codas and their frequency of occurrence