Affiliations: [a] School of Computer and Information Science, Southwest University, Chongqing, China | [b] Economy and Technology Developing District, Henan, China
Abstract: Social media content is difficult to analyze because it is informal and unstructured. Fortunately, some social media data, such as tweets, carry rich hashtag information, which can help identify meaningful content and topics. More importantly, hashtags are usually the best expression of a tweet's context. To this end, this paper introduces a context-aware topic model that detects and tracks the evolution of content by integrating hashtag and time information in text-based social media. Specifically, we develop two methods to handle two different functions of hashtags separately. The first, hashtag-generated Topic over Time (hgToT), generates a document jointly from its words and hashtags. To strengthen the effect of hashtags through the topic variables, we further develop a second model, hashtag-supervised Topic over Time (hsToT), in which hashtags are treated as topic indicators of the tweet. Time information is modeled similarly in both hgToT and hsToT. Both methods capture the hashtag distribution over topics and the change of topics over time simultaneously. Experiments on a dataset collected from Twitter show that both hgToT and hsToT can detect important information and track meaningful content and topics.
Keywords: Topic model, content evolution, topic over time, hashtags, social media