Next: Lecture schedule 2008/9
Up: Language and Speech Processing
Previous: Description of course
Reading material
- Primary books (first book is the main source, second is a reference to basic notions):
- Chris Manning and Hinrich Schütze. "Foundations of Statistical Natural Language Processing",
MIT Press. Cambridge, MA: May 1999. (see http://nlp.stanford.edu/fsnlp/)
- Daniel Jurafsky and James H. Martin. "Speech and Language Processing:
An Introduction to Natural Language Processing, Computational Linguistics, and Speech
Recognition". Prentice-Hall, 2000.
- Eugene Charniak. "Statistical Language Learning", Cambridge, Mass, MIT Press, 1993.
- Other useful references:
- Tom Mitchell. "Machine Learning", McGraw-Hill Series in Computer Science, 1997.
- Simple Good-Turing and Good-Turing Smoothing Without Tears (William Gale paper)
http://www.grsampson.net/AGtf.html
- Joshua Goodman and Stanley Chen. "An empirical study of smoothing techniques for language
modeling". Technical report TR-10-98, Harvard University, August 1998.
See http://research.microsoft.com/~joshuago/. A correction of a small error in the
statement of Katz formula in this report can be found here.
- Why Probabilistic Models in NLP and Computational Linguistics?
- Fernando Pereira. Formal grammar and information theory: Together again?
Philosophical Transactions of the Royal Society, 358(1769):1239-1253, April 2000.
http://www.cis.upenn.edu/
- K. Sima'an. Empirical validity and technological viability: Probabilistic models of Natural
Language Processing. In R. Bernardi and M. Moortgat (eds.), Linguistic Corpora and Logic
Based Grammar Formalisms, CoLogNET Area 6, 2003.
http://staff.science.uva.nl/~simaan/D-Papers/colognet02.ps
- Data Oriented Parsing:
- Remko Scha: "Virtuele Grammatica's en Creatieve Algoritmes." Gramma/TTT 1, 1 (1992), pp. 57-77.
[Translated into English as: "Virtual Grammars and Creative Algorithms."]
http://cf.hum.uva.nl/computerlinguistiek/scha/IAAA/rs/inaugureE.html.
- Remko Scha: "Taaltheorie en taaltechnologie; competence en performance."
In: R. de Kort and G.L.J. Leerdam (eds.): Computertoepassingen in de Neerlandistiek.
Almere: LVVN, 1990, pp. 7-22. [Translated into English as:
"Language Theory and Language Technology; Competence and Performance."]
http://cf.hum.uva.nl/computerlinguistiek/scha/IAAA/rs/Leerdam.html
- R. Bod, R. Scha and K. Sima'an (editors), Data Oriented Parsing, CSLI Publications, 2003 (book).
- Other readings on Statistical Parsing:
- Statistical Machine Translation:
- Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jelinek,
John D. Lafferty, Robert L. Mercer and Paul S. Roossin. "A statistical approach to machine translation".
Computational Linguistics, 16(2):79-85, June 1990.
http://portal.acm.org/ft_gateway.cfm?id=92860&type=pdf&coll=GUIDE&dl=GUIDE&CFID=53991625&CFTOKEN=77629815
- Furthermore: The NLTK-Lite toolkit provides an excellent suite for explorations:
http://nltk.sourceforge.net/lite/doc/en/
Khalil Sima'an
2008-10-02