Language data without trees is as arid as a desert. Linguistic grammars without data are decorative plants. Data is the soil on which we should grow our trees. But if we can't see the wood for the trees, might it not be better to stay in the desert?

Postal address
P.O. Box 94242
1090 GE
The Netherlands

Visiting address
Room F2.06
Science Park 107
Building F
+31 20 525 6573

+31 20 525 5206
Khalil Sima'an
خليل سمعان
Computational Linguistics
Vici Laureate 2013

Khalil in Turkey
"I see (bi-)trees everywhere"


Professor of Computational Linguistics, Vici Laureate 2013. Statistical language processing and learning lab, Institute for Logic, Language and Computation.

I am a Computational Linguist. My work is on statistical learning for natural language processing including Machine Translation, Syntactic Parsing, Morphology and Semantics. I lead the statistical language processing and learning lab.
I have a PhD degree in Computational Linguistics from Utrecht University and was a postdoctoral fellow of the Royal Netherlands Academy of Arts and Sciences (KNAW) before joining the ILLC-UvA as Assistant Professor in 2003. I have been a visiting researcher at the Technion (2000), the University of Maryland (2002), Johns Hopkins University (summer workshops 2005) and CNGL at Dublin City University (frequently, 2003-2011). I am a Vidi 2006 laureate and a Vici 2013 laureate, both awarded by the Netherlands Organization for Scientific Research (NWO).

I serve as an Editorial Board member of Machine Translation and the Journal of Natural Language Engineering, as PC Chair for MT Summit XIV (2013) and the International Conference on Parsing Technologies (2013), and as an Advisory Board member for John Benjamins Publishers' book series on Natural Language Processing. I also served as Area Chair for Syntax and Parsing at ACL 2010.


  • Paper with Miloš Stanojević about learning and analysis aspects of the BEER metric at EMNLP 2014.
  • Paper with Hoang Cuong on latent domain phrase-based models at EMNLP 2014.
  • Paper with Hoang Cuong on latent domain data selection for SMT at COLING 2014.
  • BEER@ILLC-UvA metric shows excellent performance at WMT 2014 (with Miloš Stanojević).

Former PhD students

Research Topics

  • Statistical Machine Translation
  • Statistical Parsing and Probabilistic Grammars
  • Statistical Learning for structured models
  • Computational and Cognitive Models of Language Processing
  • Natural Language Engineering and Technology
More than formal linguistics and logic: The case for Prediction and Learning

Software and Data Packages I contributed to creating