ILLC, Universiteit van Amsterdam

Khalil Sima'an



Highlighted publications

Markos Mylonakis and Khalil Sima'an. Learning Hierarchical Translation Structure with Linguistic Annotations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011). Feb 2011. Syntactic statistical Machine Translation; Estimation; Learning reordering
pdf file
Hany Hassan, Khalil Sima'an and Andy Way. A Morpho-Syntactically Enriched Direct Translation Model with Efficient Decoding. Machine Translation Journal, Springer, 2011. Syntactic statistical Machine Translation; Incremental decoding and parsing
pdf file
Markos Mylonakis and Khalil Sima'an. Learning Probabilistic Synchronous CFGs for Phrase Translation Models. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL 2010), Uppsala, Sweden, July 2010.
Statistical Machine Translation; Estimation; Learning reordering pdf
Reut Tsarfaty, Khalil Sima'an and Remko Scha. Evaluating an Alternative to Head-Driven Approaches to Parsing a (Relatively) Free Word-Order Language. In Proceedings of the Conference on Empirical Methods in NLP (EMNLP'09), Singapore. Statistical Parsing; Morphology-syntax interface
pdf
Hany Hassan, Khalil Sima'an and Andy Way. A Syntactified Direct Translation Model with Linear-Time Decoding. In Proceedings of the Conference on Empirical Methods in NLP (EMNLP'09), Singapore. Syntactic Machine Translation; Incremental decoding and parsing pdf
Hany Hassan, Khalil Sima'an and Andy Way. Syntactically Lexicalized Phrase-Based Statistical Translation. In IEEE Transactions on Audio, Speech and Language Processing, Volume 16, Number 7. September 2008.
Syntactic Machine Translation; Incremental decoding and parsing pdf
Barbara Plank and Khalil Sima'an. Parsing with Subdomain Instance Weighting from Raw Corpora. In Proceedings of Interspeech 2008, Australia, September 2008. Subdomain; Domain Adaptation; Inference from unannotated data.
pdf
Markos Mylonakis and Khalil Sima'an. Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective. In Proceedings of the Conference on Empirical Methods in NLP (EMNLP'08), 2008.
Statistical Machine Translation; Estimation; Learning pdf
Reut Tsarfaty and Khalil Sima'an. Relational Realizational Parsing. In Proceedings of COLING 2008, Manchester, UK, August 2008.
Statistical Parsing; Morphology-syntax interface pdf
Roy Bar-Haim, Khalil Sima'an and Yoad Winter. Part-of-Speech Tagging of Modern Hebrew Text. In Journal of Natural Language Engineering (J-NLE), 14(2):223-251, 2008. HMM-based morphological disambiguation for Hebrew and Arabic
pdf
Markos Mylonakis, Khalil Sima'an and Rebecca Hwa. Unsupervised Estimation for Noisy-Channel Models. In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007).
Learning lexicon probabilities from non-parallel data
pdf
Hany Hassan, Khalil Sima'an and Andy Way. Supertagged Phrase-Based Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), Prague, 2007.
Syntactic Machine Translation; Incremental decoding and parsing pdf
Andreas Zollmann and Khalil Sima'an. A Consistent and Efficient Estimator for Data-Oriented Parsing. Journal of Automata, Languages and Combinatorics (JALC), Vol. 10 (2005), Number 2/3, pages 367-388.
Presents a consistent estimator for DOP with a proof of consistency.
Consistent estimators for DOP
pdf
Khalil Sima'an. Robust Data-Oriented Understanding of Spoken Utterances. In H. Bunt, J. Carroll and G. Satta (eds.), New Developments in Parsing Technologies, pages 323-338, Kluwer (2004). Speech understanding; Update Semantics; Statistical Parsing
pdf
Khalil Sima'an and Luciano Buratto. Backoff Parameter Estimation for the DOP Model. In Proceedings of the European Conference on Machine Learning (ECML'03), N. Lavrac, D. Gamberger, H. Blockeel and L. Todorovski (eds.). Lecture Notes in Artificial Intelligence (LNAI 2837), pages 373-384, Springer, 2003.
Consistent estimators for DOP pdf
Khalil Sima'an. On Maximizing Metrics for Syntactic Disambiguation. In Proceedings of the International Workshop on Parsing Technologies (IWPT'03). Nancy, France, April 2003.
Presents, among other things, a Minimum Bayes-Risk decoding algorithm that others refer to as MAX-RULE-SUM (see, e.g., Bansal and Klein 2010, ..., Petrov et al. 2007, Matsuzaki et al. 2005) or as the Maximum Expected CFG Rule Count algorithm (see, e.g., Cohn et al. 2008).
Minimum Bayes-Risk decoding for statistical parsing models
pdf
Rens Bod, Remko Scha and Khalil Sima'an (editors). Data-Oriented Parsing. Studies in Computational Linguistics, CSLI Publications, University of Chicago Press, 2003. Data-Oriented Parsing

Khalil Sima'an. Computational Complexity of Probabilistic Disambiguation: NP-Completeness Results for Language and Speech Processing. In Grammars, Volume 5(2), Kluwer Publishers, 2002. Computational Complexity of statistical disambiguation
pdf
Khalil Sima'an, A. Itai, Y. Winter, A. Altman and N. Nativ. Building a Tree-Bank of Modern Hebrew Text. In Beatrice Daille and Laurent Romary (eds.), Journal Traitement Automatique des Langues (T.A.L), 2001. Special Issue on Natural Language Processing and Corpus Linguistics. Hebrew treebanking; Statistical Parsing
pdf
Khalil Sima'an. Tree-gram Parsing: Lexical Dependencies and Structural Relations. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL'00), Hong Kong, China, 2000.
Content: Presents a novel parsing model that combines the strengths of DOP with those of bilexical-dependency models (Charniak 1999; Collins 1997), including head-binarized "subtrees" (Tree-grams) with label splitting by parent encoding (Johnson 1999) and head pre-terminals. The implementation employs a coarse-to-fine parser (PCFG first, then Tree-gram). See Mohit Bansal and Dan Klein 2010 for an exploration that extends this model with compact Goodman representations and optimized parameter settings, reporting state-of-the-art parsing results for different languages.
Treegrams; Statistical Parsing; Horizontal-Markov DOP; Lexicalized DOP
pdf
Khalil Sima'an. Efficient Disambiguation by means of Stochastic Tree Substitution Grammars. In New Methods in Language Processing. D. Jones and H. Somers (editors), UCL Press, UK, 1997. Statistical Parsing
pdf
Remko Scha, Rens Bod and Khalil Sima'an. A Memory-Based Model of Syntactic Analysis: Data-Oriented Parsing. In Special Issue on Memory-Based Processing, W. Daelemans (ed.), Journal of Experimental and Theoretical Artificial Intelligence (JETAI), 11(3), 1999. Data-Oriented Parsing; Memory-based models
pdf
Khalil Sima'an. Explanation-Based Learning of Data-Oriented Parsing. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL), jointly with ACL/EACL-97, Madrid, Spain, July 1997. The first publication aiming at learning the set of fragments for a DOP model from a treebank. Statistical learning of compact DOP models
pdf
Khalil Sima'an. Computational Complexity of Probabilistic Disambiguation by means of Tree Grammars. In Proceedings of the International Conference on Computational Linguistics (COLING '96), vol. 2, pp. 1175-1180, Copenhagen, Denmark, August 1996.
Content: Presents NP-completeness proofs for a set of related problems, including computing the most probable parse of an input sentence under a probabilistic tree-substitution grammar (PTSG), and computing the most probable string in an input lattice under a probabilistic context-free grammar (PCFG) or under a PTSG. The latter two problems are the "decoding" problems of speech recognition and machine translation when the (target) language model is a PCFG or PTSG.
Computational complexity of statistical disambiguation
pdf