Getting the Most Out of Social Annotations for Web Page Classification
Our paper “Getting the Most Out of Social Annotations for Web Page Classification” has been accepted for publication and presentation at DocEng 2009, the 9th ACM Symposium on Document Engineering, to be held in Munich, Germany, from September 15 to 18, 2009.
User-generated annotations on social bookmarking sites can provide interesting and promising metadata for web document management tasks like web page classification. These user-generated annotations include diverse types of information, such as tags and comments. However, each kind of annotation has a different nature and popularity level. In this work, we analyze and evaluate the usefulness of each of these social annotations for classifying web pages over a taxonomy like that proposed by the Open Directory Project. We compare each of them separately to content-based classification, and also combine the different types of data to improve performance. Our experiments show encouraging results for the use of social annotations for this purpose, and we find that combining these metadata with web page content further improves the classifier’s performance.
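To make the setup concrete, here is a minimal sketch of the kind of comparison the abstract describes: one classifier per data source (content, tags, comments), plus a combined classifier. Everything in it is an assumption made for illustration. The abstract does not name the classifier or the combination scheme, so the sketch substitutes a simple bag-of-words nearest-centroid classifier and combines sources by summing their similarity scores. The pages, categories, and function names are hypothetical toy data, not the paper's dataset or method.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase whitespace tokenisation into term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(pages, source):
    """Sum term counts per category for one data source (e.g. 'tags')."""
    centroids = defaultdict(Counter)
    for page in pages:
        centroids[page["category"]].update(bag_of_words(page[source]))
    return dict(centroids)


def score(centroids, text):
    """Similarity of a text to each category centroid."""
    vec = bag_of_words(text)
    return {cat: cosine(vec, c) for cat, c in centroids.items()}


def classify(models, page, sources):
    """Sum per-source scores over the chosen sources; return the best category."""
    total = Counter()
    for s in sources:
        for cat, val in score(models[s], page[s]).items():
            total[cat] += val
    return max(total, key=total.get)


# Hypothetical toy training pages (assumption: not taken from the paper).
TRAIN = [
    {"category": "Computers", "content": "python tutorial for writing software",
     "tags": "programming python code", "comments": "great coding reference"},
    {"category": "Computers", "content": "linux kernel release notes",
     "tags": "linux opensource code", "comments": "useful for sysadmins"},
    {"category": "Sports", "content": "football league results and fixtures",
     "tags": "football soccer scores", "comments": "best place for match results"},
    {"category": "Sports", "content": "marathon training plan",
     "tags": "running fitness marathon", "comments": "helped my training"},
]

SOURCES = ["content", "tags", "comments"]
MODELS = {s: train_centroids(TRAIN, s) for s in SOURCES}

PAGE = {"content": "latest release notes", "tags": "python code",
        "comments": "nice reference"}

if __name__ == "__main__":
    # Each source on its own, then all sources combined.
    for s in SOURCES:
        print(s, "->", classify(MODELS, PAGE, [s]))
    print("combined ->", classify(MODELS, PAGE, SOURCES))
```

Summing scores is only one way to combine sources. Concatenating the feature vectors or weighting each source are equally plausible alternatives, and nothing here indicates which of them, if any, the paper uses.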