Semantic Transforms Using Collaborative Knowledge Bases
5 Pages. Posted: 30 Sep 2012. Last revised: 26 Nov 2013
Date Written: November 7, 2013
Abstract
Topic models classify documents by designating sets of keywords that describe ideas, leaving the interpretation of those keywords to humans. In this paper, we create variants of topic models in which each document is classified by the Wikipedia page that it matches best; in this way we generate human-understandable topic names: Wikipedia page titles. We tested our method on a dataset of ACM abstracts that had been manually classified into topics by the papers' authors. Our results often matched the authors' classifications. Moreover, the topics identified are clearer than those produced by LDA topic modeling. Our technique may apply to many other types of text, including social media.
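As an illustration of labeling a document with the title of its best-matching Wikipedia page, the following is a minimal sketch and not the method used in the paper. The abstract does not say how a match is scored, so the sketch assumes TF-IDF weighting with cosine similarity. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `best_matching_title`) are hypothetical, and the page texts are invented toy stand-ins, not real Wikipedia content.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n_texts = len(token_lists)
    # Document frequency: in how many texts each term appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency (an assumed weighting choice).
    idf = {
        term: math.log((1 + n_texts) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }
    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in term_freq.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_title(document, pages):
    """Return (title, scores) for the page whose text best matches the document."""
    titles = list(pages)
    # The document is vectorized together with the pages so they share IDF weights.
    vectors = tfidf_vectors([pages[t] for t in titles] + [document])
    doc_vector = vectors[-1]
    scores = {title: cosine(doc_vector, vec) for title, vec in zip(titles, vectors)}
    return max(scores, key=scores.get), scores


# Toy stand-ins for Wikipedia pages (invented text, for illustration only).
pages = {
    "Information retrieval": (
        "information retrieval is finding relevant documents in a collection "
        "in response to a user query search engines ranking"
    ),
    "Computer graphics": (
        "computer graphics is rendering images and three dimensional scenes "
        "using shading and geometry"
    ),
    "Operating system": (
        "an operating system manages hardware memory processes and scheduling "
        "for computer programs"
    ),
}

abstract = (
    "We propose a ranking method for search engines that improves retrieval "
    "of relevant documents for user queries"
)

title, scores = best_matching_title(abstract, pages)
print(title)
```

The returned title serves directly as the topic name, which is the property the abstract emphasizes: the label is a page title that a human can read, not a keyword list that still needs interpreting.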