Friday, May 8, 2009

Chapter 2 (Part 3), Senellart & Blondel - Automatic Discovery of Similar Words

In Section 2.3, we get to the meat of Senellart & Blondel's work: a graph-based method for finding similar words, using a dictionary as the source. Their method uses a v × v matrix, with one row and one column for each of the v words in the dictionary. They compare their method and results with those of Kleinberg, who proposes a method for determining good Web hubs and authorities, and with the ArcRank and WordNet methods. They test the four methods on four words: disappear, parallelogram, sugar, and science. By and large, their method performs second best, after WordNet. They propose improving their results by taking a larger subgraph.
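For readers who have not seen Kleinberg's hubs-and-authorities idea, here is a minimal sketch of the iteration on a toy directed graph. This is Kleinberg's scheme as I understand it, not Senellart & Blondel's own method, and the function names, the node labels, and the fixed iteration count are all my own assumptions for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {node: value / norm for node, value in scores.items()}


def hits(nodes, edges, iterations=50):
    """Hub/authority iteration on a directed graph given as (source, target) pairs.

    Hypothetical helper: a good authority is pointed to by good hubs,
    and a good hub points to good authorities.
    """
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the nodes pointing in.
        new_auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_auth[target] += hub[source]
        # Hub score: sum of the (updated) authority scores of the nodes pointed to.
        new_hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_hub[source] += new_auth[target]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Toy graph (made up): "a" and "b" both point to "c"; "a" also points to "d".
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub_scores, auth_scores = hits(toy_nodes, toy_edges)
best_hub = max(hub_scores, key=hub_scores.get)
best_authority = max(auth_scores, key=auth_scores.get)
```

In this toy graph, "c" collects the most incoming links from hubs and "a" points to the most authorities, so those are the nodes the iteration should favor.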

The most interesting result is not so much the specific method as the fact that their approach makes it possible to see a dictionary, and the resulting vector space model, as a (possibly weighted) directed graph. This is significant because graph theory offers many powerful theoretical tools that can be brought to bear on this class of problems.
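To make the dictionary-as-graph view concrete, here is a rough sketch that turns a tiny dictionary into a v × v adjacency matrix. I am assuming an edge runs from a headword to every other dictionary word that appears in its definition; the four definitions and the helper name build_adjacency are invented for illustration and are not taken from the chapter.

```python
def build_adjacency(definitions):
    """Build a v x v 0/1 adjacency matrix from {headword: definition text}.

    Assumption: row = headword, column = word used in its definition.
    Words that are not headwords themselves are ignored.
    """
    words = sorted(definitions)
    index = {word: position for position, word in enumerate(words)}
    size = len(words)
    matrix = [[0] * size for _ in range(size)]
    for headword, text in definitions.items():
        for token in text.lower().split():
            if token in index and token != headword:
                matrix[index[headword]][index[token]] = 1
    return words, index, matrix


# Made-up toy dictionary; real definitions would need proper tokenizing
# and stemming before this kind of lookup works well.
toy_dictionary = {
    "sugar": "sweet crystalline substance",
    "sweet": "tasting of sugar",
    "substance": "a kind of matter",
    "matter": "physical substance",
}

words, index, adjacency = build_adjacency(toy_dictionary)

# Out-neighbours of "sugar": the dictionary words its definition uses.
sugar_row = adjacency[index["sugar"]]
sugar_neighbours = [words[j] for j, flag in enumerate(sugar_row) if flag]
```

Replacing the 0/1 entries with counts or other weights would give the weighted variant of the same graph.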

Overall, a good read, and some food for thought.

The Kleinberg reference is:
Kleinberg, J.M. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46 (5): 604-632 (1999).
