Tejaswini Deoskar



I am a post-doc and lecturer at the Institute for Logic, Language & Computation (ILLC), University of Amsterdam, in the Language and Computation (LaCo) Group.
I recently finished my dissertation at Cornell University, Ithaca, NY.


Contact

Email : t DOT deoskar   AT   uva   DOT   nl
Tel : +31 (0)20 525 8251
Fax : +31 (0)20 525 5206

Postal Address : P.O. Box 94242
1090 GE Amsterdam
The Netherlands

Visiting Address: ILLC,
Room C3-123
Science Park 904
1098 XH Amsterdam
The Netherlands

Research Interests

Computational Linguistics: statistical parsing, unsupervised grammar induction, richer computational models of syntax

Linguistics, Typology: syntax of SOV languages, syntax and typology of South Asian languages in general (esp. Hindi, Marathi)

I am interested in statistical models of natural language. In particular, I work on statistical parsing. I am interested in building statistical models of syntax with rich representations, particularly lexical representations, and in estimating accurate statistics for these from annotated and unannotated data. Currently I am experimenting with using a modified version of the inside-outside algorithm along with a treebank-trained PCFG to learn fine-grained lexical information from large sources of data.
I am also interested in the syntax and typology of languages with Subject-Object-Verb word order, in particular languages spoken in the South Asian subcontinent.

CV

Dissertation :
INDUCTION OF FINE-GRAINED LEXICAL PARAMETERS OF TREEBANK PCFGS WITH INSIDE-OUTSIDE ESTIMATION AND LEXICAL TRANSFORMATIONS
[pdf]

Advisor: Mats Rooth web
Other committee members: Lillian Lee web , John Whitman web

Minor: Cognitive Science Cognitive Science at Cornell

Current Projects

Under VIDI project (PI: Khalil Sima'an web )
  • Domain adaptation of statistical parsers (ILLC, UvA)
  • Supertagging for complex lexical categories (ILLC, UvA)

Older Projects

  • Building enhanced and fine-grained Treebank-based unlexicalized PCFGs (Cornell)
  • Semi-supervised learning of fine-grained lexical categories using Inside-Outside (Cornell/ILLC)

    Papers

    Tejaswini Deoskar, Mats Rooth and Khalil Sima'an. 2009. Smoothing PCFG Lexicons. Proceedings of the 11th International Workshop on Parsing Technologies (IWPT), Paris, France. [pdf]

    Deoskar, Tejaswini. 2008. Re-estimation of Lexical Parameters for Treebank PCFGs. Proceedings of COLING 2008, Manchester, UK. [pdf]

    Deoskar, Tejaswini and Rooth, Mats. 2008. Induction of Treebank-Aligned Lexical Resources. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco. [pdf]

    Deoskar, Tejaswini and Rooth, Mats. 2007. Corpus Induction of Lexicons for Treebank PCFGs by Inside-Outside Estimation and Frequency Transformations. Ms. [pdf]

    Deoskar, Tejaswini. 2006. Marathi Light Verbs. Proceedings of the 42nd Annual Meeting of the Chicago Linguistic Society. [pdf]

    An empirical study on the phonological adaptation of speakers of Indian English when exposed to American English. [pdf]

    A paper on Serial Verbs in Khoekhoe [pdf]




    Teaching

    2009-2010

    Semester I (Fall 2009, Sept-Dec):
    Elements of Language Processing and Learning (Master of AI, Master of Logic) course website

    Semester II (Spring 2010, Feb-May):
    Statistical Structure in Language Processing (Core course, Master of AI, Natural Language Processing track) course website

    Natuurlijke Taalverwerking (Bachelor Informatica) A Bachelor's course on Language modelling. UvA Blackboard website

    2008-2009

    Semester I (Fall 2008, Sept-Dec):
    Language and Speech Processing (co-taught with Khalil Sima'an) course website

    Semester II (Spring 2009, Feb-May):
    Probabilistic Grammars and Data-Oriented Parsing course website

    Taalmodellen - A Bachelor level course on language modelling. UvA Blackboard website

    Cornell
    Spring 2008 and 2006: Teaching Assistant, Introduction to Semantics and Pragmatics

    Fall and Spring 2005: Instructor. I taught a course called Biological Foundations of Language (Freshman Writing Seminar) which introduced freshmen to basic issues in the study of human language from the opposing points of view of linguistics (theoretical) and cognitive science.

    Spring 2004: Teaching Assistant, Introduction to Hindi, Intermediate Hindi. Developed a lot of material for teaching Hindi, a language that does not have many classroom resources for teaching as a second language.

    Fall 2005: Teaching Assistant, Introduction to Hindi


    Presentations

    "Smoothing fine-grained PCFG lexicons" Talk at IWPT 2009, Paris.
    "Identifying fine-grained lexical categories: Supertaggers versus Parsers" August 2009, Talk at Microsoft Research, Bangalore.
    "Impact of lexical probabilities on adapting a PCFG to a new domain" at The 19th Meeting of Computational Linguistics in The Netherlands (CLIN), Groningen.
    "Induction of Treebank-Aligned Lexical Resources" LREC 2008
    "Estimation of lexical probabilities for Treebank PCFGs" May 2008. Talk at Institute for Logic, Language and Computation (ILLC), University of Amsterdam.
    "Re-estimation of Lexical Parameters for Treebank PCFGs" COLING 2008
    "EM-based clustering of Local Syntactic Contexts of words" Sept. 2007. Talk at the NLP Seminar, Cornell University.
    "Marathi Light Verbs" Talk at Chicago Linguistic Society, CLS 42.
    "Disambiguation of Small Clause versus Ditransitive Verbs in a Treebank PCFG" Talk at Department of Linguistics, Cornell University.

    Non-academic Work Experience

    2001 to 2002: Worked on creating "meta-directories" that integrated information from telecom devices and telecom databases, using LDAP (Lightweight Directory Access Protocol). Also worked as a consultant to design and deploy metadirectory products (Meta-Connect).

    1997 to 2001: Worked as an embedded systems designer to build a CCD camera controller for the IUCAA telescope at Giravali (Maharashtra, India) using the Analog Devices (ADSP) DSP microprocessor family. Also interfaced the CCD camera controller to a Linux network for programmable camera control and image acquisition (very cool; I wrote Linux device drivers for this).

    1996 to 1997: Network management of Sun/Silicon Graphics/PC Unix/Windows networks.



    Links

    Cornell NLP Page
    IUCAA Observatory, Giravali, India