projects

Commercial Profiles of Bloggers
I'm currently working on extracting commercially-oriented profiles of bloggers from their blogs, in order to match them with products and services. This work is at a relatively early stage, but you can see some results on matching profiles for book recommendations and for contextual advertising in blogs.

Current moods in the (livejournal) blogosphere
What moods are most common in recent Livejournal posts? Which are the least common? How have they changed over the last few days, and why? Can these changes be predicted and explained by analyzing the language used by the bloggers? Check Moodviews for answers.

Blog Retrieval at TREC
I'm participating in the coordination of a new task at TREC, focusing on searching for information in blogs. More details are in the organizers' wiki. Here are some blog retrieval myths, at least for opinion retrieval in blogs.

Lucene and Language Modeling
For the research done in our group, I have extended Lucene to support language modeling. A version of the result (based on Lucene 1.4.2) can be obtained here, under the same license as the one Lucene is distributed with. Please contact me with any suggestions/comments about it.
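For readers unfamiliar with language modeling in retrieval, here is a minimal sketch of one common formulation: query likelihood with Jelinek-Mercer smoothing. It is an illustration only, written in Python rather than against the Lucene API, and it is not the code of the extension itself; the function names `score` and `rank` and the default `lam=0.5` are assumptions made for this example.

```python
import math
from collections import Counter

def score(query, doc, collection_counts, collection_len, lam=0.5):
    """Log-likelihood of the query under a smoothed unigram model of doc.

    Each term probability is a mix of the document model and the
    collection model (Jelinek-Mercer smoothing); lam is the weight
    given to the collection model.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    total = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term never occurs anywhere in the collection.
            return float("-inf")
        total += math.log(p)
    return total

def rank(query, docs, lam=0.5):
    """Return document indices ordered from best to worst score."""
    collection_counts = Counter(t for d in docs for t in d)
    collection_len = sum(collection_counts.values())
    scored = [
        (score(query, d, collection_counts, collection_len, lam), i)
        for i, d in enumerate(docs)
    ]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]

# Toy collection of already-tokenized documents (made up for this example).
docs = [
    ["blog", "spam", "spam"],
    ["lucene", "search", "index"],
    ["blog", "search", "mood"],
]
ranking = rank(["blog", "search"], docs)
```

Documents are ranked by how likely their smoothed model is to generate the query, so a document containing all query terms tends to come first.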

Comment Spam in Blogs
This is a small corpus of comment spam in blogs I gathered. It was used for experiments with a language-modeling approach to classifying blog spam.

Question Answering
Before looking into blogs, I was involved in various Question Answering projects; many of them revolved around Quartz - the University of Amsterdam's question answering system. A limited online version of Quartz, containing only a small subset of its actual functionality, is available here; for descriptions of how it works, see the publications page.

Source Code Retrieval
The datasets used for experiments with conceptual retrieval of source code.

The C++ Standard Template Library
Anyone who has debugged STL C++ structures using GDB shares the same nightmarish memories: long, cryptic lines; casting again and again to reach the actual object, etc. I've written some GDB scripts to ease this pain - have a look.
Also, here are some insights into STL I've had in the past.

Reinforcement Learning
PathLearner is a demo of the "robot-on-a-grid" problem, mostly for educational purposes.

ESSLLI Archive
I have set up and maintain the European Summer School in Logic, Language and Information Archive, a FoLLI project.

Weka and LocBoost
Years ago, I extended the Weka machine learning toolkit to include a new boosting algorithm, LocBoost, and to enable graphical comparison between learning methods. The results of this project, carried out under the supervision of Ran El-Yaniv, can be viewed here.

A Comparison of Female and Male CS Articles.

... finally, some SEO: what Oren has to say about Israir.