
Probabilistic Grammars and DOP
Unsupervised Statistical Estimation for Natural Language Processing

Tejaswini Deoskar
Institute for Logic, Language and Computation (ILLC)
University of Amsterdam, Amsterdam, The Netherlands
T dot Deoskar at uva dot nl

General information

Lecturer: Tejaswini Deoskar and guest lecturers
Place: Rec-C 2.12
Time: Monday 11:00-13:00

The contents of these pages will be constantly updated throughout the semester.

For many tasks in natural language processing, we need to estimate a probability distribution over a set of linguistic structures. The straightforward way to estimate such a distribution is to count the relative frequency of the relevant structures in a representative sample corpus.
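
As a rough illustration of relative frequency estimation, the sketch below counts how often each structure occurs in a sample and divides by the sample size. The function name `relative_frequency` and the toy list of grammar rules are assumptions made for this example only; they are not taken from the course material.

```python
from collections import Counter


def relative_frequency(observations):
    """Estimate P(x) as count(x) / N over a list of observed structures."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


# Hypothetical toy sample: rules read off an imagined annotated corpus.
sample = ["NP -> Det N", "NP -> Det N", "NP -> Pron", "VP -> V NP"]
probs = relative_frequency(sample)

for rule, p in sorted(probs.items()):
    print(f"{rule}: {p:.2f}")
```

The estimates sum to one by construction, and this only works because every structure in the sample is directly observed.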

However, many linguistic structures (such as syntactic structure) are hidden, so relative frequency estimation for them requires specially annotated corpora. Since such corpora are difficult and expensive to create, it is important to be able to estimate a distribution over structures from corpora that contain only partial information about these structures. This is estimation from incomplete data, which will be the focus of this course.
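
One standard technique for estimation from incomplete data is Expectation-Maximization (EM). The sketch below is a minimal, non-linguistic illustration: each trial is generated by one of two coins, but which coin was used is hidden. It assumes both coins are equally likely a priori, and the function name `em_two_coins`, the toy head counts, and the starting values are all hypothetical choices for this example.

```python
def em_two_coins(head_counts, flips_per_trial, theta_a, theta_b, iterations=50):
    """Estimate the head probabilities of two coins when the coin
    used in each trial is unobserved (assumed equally likely a priori)."""
    for _ in range(iterations):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for heads in head_counts:
            tails = flips_per_trial - heads
            # E-step: posterior probability that each coin produced this trial.
            # The binomial coefficient cancels in the ratio, so it is omitted.
            like_a = theta_a ** heads * (1 - theta_a) ** tails
            like_b = theta_b ** heads * (1 - theta_b) ** tails
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            # Accumulate expected (fractional) counts for each coin.
            heads_a += resp_a * heads
            tails_a += resp_a * tails
            heads_b += resp_b * heads
            tails_b += resp_b * tails
        # M-step: relative frequency estimation on the expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


# Hypothetical data: number of heads in each of five trials of ten flips.
observed_heads = [5, 9, 8, 4, 7]
est_a, est_b = em_two_coins(observed_heads, 10, theta_a=0.6, theta_b=0.5)
print(f"theta_a = {est_a:.2f}, theta_b = {est_b:.2f}")
```

The M-step is the same relative frequency computation as in the fully observed case, except that the counts are expectations under the current parameters rather than direct observations.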

After a general introduction to estimation from incomplete data and the EM algorithm, we will examine the application of these techniques to two main problems: parsing and translation. The first part of the course will discuss various methods for unsupervised parsing, including U-DOP, a DOP-based approach to this problem. The second part of the course will give an introduction to statistical machine translation and its estimation problems.

The course will consist of lectures (some given by guest lecturers) and student presentations of research papers chosen from a list provided by the lecturer (see below).

Note: This course extends the course Language and Speech Processing (L&SP - http://staff.science.uva.nl/%7Esimaan/D-LangAndSpeech0809/LSPhomepage/), which is a prerequisite for participation in the course.
