Probabilistic topic models for sentiment analysis on the Web

Lin, Chenghua

dc.contributor.author	Lin, Chenghua	en_GB
dc.date.accessioned	2011-12-12T08:47:58Z	en_GB
dc.date.accessioned	2013-03-21T11:56:54Z
dc.date.issued	2011-09-26	en_GB
dc.description.abstract	Sentiment analysis aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text, and has attracted rapidly growing interest in natural language processing in recent years. Probabilistic topic models, on the other hand, are capable of discovering hidden thematic structure in large archives of documents, and have been an active research area in the field of information retrieval. The work in this thesis focuses on developing topic models for automatic sentiment analysis of web data, by combining ideas from both research domains. One notable issue with most previous work in sentiment analysis is that the trained classifier is domain dependent, and the labelled corpora required for training can be difficult to acquire in real-world applications. Another issue is that the dependencies between sentiment/subjectivity and topics are not taken into consideration. The main contribution of this thesis is therefore the introduction of three probabilistic topic models, which address the above concerns by modelling sentiment/subjectivity and topic simultaneously. The first model, the joint sentiment-topic (JST) model, is based on latent Dirichlet allocation (LDA) and detects sentiment and topic simultaneously from text. Unlike supervised approaches to sentiment classification, which often fail to produce satisfactory performance when applied to new domains, the weakly-supervised nature of JST makes it highly portable to other domains, where the only supervision information required is a domain-independent sentiment lexicon. Apart from document-level sentiment classification results, JST can also extract sentiment-bearing topics automatically, which is a distinct feature compared to existing sentiment analysis approaches. The second model is a dynamic version of JST called the dynamic joint sentiment-topic (dJST) model.
dJST respects the ordering of documents, and allows the analysis of topic and sentiment evolution in document archives that are collected over a long time span. By accounting for the historical dependencies of documents from past epochs in the generative process, dJST gives a richer posterior topical structure than JST, and can better respond to the permutations of topic prominence. We also derive online inference procedures based on a stochastic EM algorithm for efficiently updating the model parameters. The third model is the subjectivity detection LDA (subjLDA) model for sentence-level subjectivity detection. Two sets of latent variables are introduced in subjLDA: one is the subjectivity label for each sentence; the other is the sentiment label for each word token. By viewing the subjectivity detection problem as weakly-supervised generative model learning, subjLDA significantly outperforms the baseline and is comparable to the supervised approach, which relies on much larger amounts of data for training. These models have been evaluated on real-world datasets, demonstrating that joint sentiment topic modelling is indeed an important and useful research area with much to offer in the way of good results.	en_GB
dc.identifier.citation	Lin, C., He, Y., Everson, R. and Rüger, S. Weakly-supervised Joint Sentiment-Topic Detection from Text, IEEE Transactions on Knowledge and Data Engineering (TKDE), to appear.	en_GB
dc.identifier.citation	Lin, C., He, Y., and Everson, R. A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection, In Proceedings of the 14th Conference on Computational Natural Language Learning (CoNLL), Uppsala, Sweden, 2010.	en_GB
dc.identifier.citation	Lin, C. and He, Y. Joint Sentiment/Topic Model for Sentiment Analysis, In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China, 2009.	en_GB
dc.identifier.citation	Lin, C., He, Y. and Everson, R. Sentence Subjectivity Detection with Weakly-Supervised Learning, In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand, 2011.	en_GB
dc.identifier.uri	http://hdl.handle.net/10036/3307	en_GB
dc.language.iso	en	en_GB
dc.publisher	University of Exeter	en_GB
dc.subject	sentiment analysis	en_GB
dc.subject	opinion mining	en_GB
dc.subject	subjectivity detection	en_GB
dc.subject	joint sentiment-topic model	en_GB
dc.subject	latent Dirichlet allocation	en_GB
dc.subject	topic model	en_GB
dc.title	Probabilistic topic models for sentiment analysis on the Web	en_GB
dc.type	Thesis or dissertation	en_GB
dc.date.available	2011-12-12T08:47:58Z	en_GB
dc.date.available	2013-03-21T11:56:54Z
dc.contributor.advisor	Everson, Richard	en_GB
dc.contributor.advisor	He, Yulan	en_GB
dc.publisher.department	Computer Science	en_GB
dc.type.degreetitle	PhD in Computer Science	en_GB
dc.type.qualificationlevel	Doctoral	en_GB
dc.type.qualificationname	PhD	en_GB

Files in this item

Name: LinC_fm.pdf
Size: 95.83Kb
Format: PDF
Description: Front matter pages

Name: LinC.pdf
Size: 4.416Mb
Format: PDF
Description: Full thesis

This item appears in the following Collection(s)

Doctoral Theses
