Short Text Tagging Using Nested Stochastic Block Model: A Yelp Case Study

Bowllan, J; Cozart, K; Seyednezhad, SMM; Smith, A; Menezes, R

dc.contributor.author	Bowllan, J
dc.contributor.author	Cozart, K
dc.contributor.author	Seyednezhad, SMM
dc.contributor.author	Smith, A
dc.contributor.author	Menezes, R
dc.date.accessioned	2020-03-25T13:29:30Z
dc.date.issued	2019-11-26
dc.description.abstract	From online reviews and product descriptions to tweets and chats, many modern applications revolve around understanding both semantic structure and topics of short texts. Due to significant reliance on word co-occurrence, traditional topic modeling algorithms such as LDA perform poorly on sparse short texts. In this paper, we propose an unsupervised short text tagging algorithm that generates latent topics, or clusters of semantically similar words, from a corpus of short texts, and labels these short texts by stable predominant topics. The algorithm defines a weighted undirected network, namely the one mode projection of the bipartite network between words and users. Nodes represent all unique words from the corpus of short texts, edges mutual presence of pairs of words in a short text, and weights the number of short texts in which pairs of words appear. We generate the latent topics using nested stochastic block models (NSBM), dividing the network of words into communities of similar words. The algorithm is versatile—it automatically detects the appropriate number of topics. Many applications stem from the proposed algorithm, such as using the short text topic representations as the basis of a short text similarity metric. We validate the results using inter-semantic similarity and normalized mutual information, which show the method is competitive with industry short text topic modeling algorithms.	en_GB
dc.description.sponsorship	NSF	en_GB
dc.identifier.citation	Vol. 881, pp. 822 - 833	en_GB
dc.identifier.doi	10.1007/978-3-030-36687-2_68
dc.identifier.grantnumber	1560345	en_GB
dc.identifier.uri	http://hdl.handle.net/10871/120397
dc.language.iso	en	en_GB
dc.publisher	Springer Verlag	en_GB
dc.rights.embargoreason	Under embargo until 26 November 2020 in compliance with publisher policy	en_GB
dc.rights	© Springer Nature Switzerland AG 2020	en_GB
dc.subject	Network science	en_GB
dc.subject	nested stochastic block model	en_GB
dc.subject	topic modeling	en_GB
dc.subject	machine learning	en_GB
dc.subject	short text tagging	en_GB
dc.title	Short Text Tagging Using Nested Stochastic Block Model: A Yelp Case Study	en_GB
dc.type	Article	en_GB
dc.date.available	2020-03-25T13:29:30Z
dc.identifier.isbn	9783030366865
dc.identifier.issn	1860-949X
dc.description	This is the author accepted manuscript. The final version is available from Springer Verlag via the DOI in this record	en_GB
dc.description	International Conference on Complex Networks and Their Applications - COMPLEX NETWORKS 2019: Complex Networks and Their Applications VIII	en_GB
dc.identifier.journal	Studies in Computational Intelligence	en_GB
dc.rights.uri	http://www.rioxx.net/licenses/all-rights-reserved	en_GB
rioxxterms.version	AM	en_GB
rioxxterms.licenseref.startdate	2019-11-26
rioxxterms.type	Journal Article/Review	en_GB
refterms.dateFCD	2020-03-25T13:26:56Z
refterms.versionFCD	AM
refterms.dateFOA	2020-11-26T00:00:00Z
refterms.panel	B	en_GB
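
The network construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cooccurrence_network`, the whitespace tokenizer, and the sample reviews are assumptions made for illustration. Nodes are unique words, an edge joins two words that appear in the same short text, and the edge weight counts the short texts in which the pair appears.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(texts):
    """Build a weighted, undirected word co-occurrence network.

    Returns a Counter mapping an alphabetically ordered word pair
    (word_a, word_b) to the number of short texts containing both words.
    Tokenization is a plain lowercase whitespace split (an assumption;
    the abstract does not specify preprocessing).
    """
    weights = Counter()
    for text in texts:
        # Use a set so a word repeated within one text is counted once.
        words = sorted(set(text.lower().split()))
        for word_a, word_b in combinations(words, 2):
            weights[(word_a, word_b)] += 1
    return weights


# Hypothetical short texts standing in for Yelp reviews.
reviews = [
    "great pizza great service",
    "pizza was cold",
    "friendly service and great pizza",
]
network = cooccurrence_network(reviews)
```

The resulting edge list could then be passed to a nested stochastic block model fit (for example, graph-tool's `minimize_nested_blockmodel_dl`) to partition the words into communities; that step is omitted here because the record does not describe the authors' settings.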

Files in this item

Name: complex_Network_2019-4.pdf
Size: 2.705Mb
Format: PDF

This item appears in the following Collection(s)

Computer Science