dc.contributor.author: Bowllan, J
dc.contributor.author: Cozart, K
dc.contributor.author: Seyednezhad, SMM
dc.contributor.author: Smith, A
dc.contributor.author: Menezes, R
dc.date.accessioned: 2020-03-25T13:29:30Z
dc.date.issued: 2019-11-26
dc.description.abstract: From online reviews and product descriptions to tweets and chats, many modern applications revolve around understanding both semantic structure and topics of short texts. Due to significant reliance on word co-occurrence, traditional topic modeling algorithms such as LDA perform poorly on sparse short texts. In this paper, we propose an unsupervised short text tagging algorithm that generates latent topics, or clusters of semantically similar words, from a corpus of short texts, and labels these short texts by stable predominant topics. The algorithm defines a weighted undirected network, namely the one mode projection of the bipartite network between words and users. Nodes represent all unique words from the corpus of short texts, edges mutual presence of pairs of words in a short text, and weights the number of short texts in which pairs of words appear. We generate the latent topics using nested stochastic block models (NSBM), dividing the network of words into communities of similar words. The algorithm is versatile: it automatically detects the appropriate number of topics. Many applications stem from the proposed algorithm, such as using the short text topic representations as the basis of a short text similarity metric. We validate the results using inter-semantic similarity and normalized mutual information, which show the method is competitive with industry short text topic modeling algorithms. [en_GB]
dc.description.sponsorship: NSF [en_GB]
dc.identifier.citation: Vol. 881, pp. 822 - 833 [en_GB]
dc.identifier.doi: 10.1007/978-3-030-36687-2_68
dc.identifier.grantnumber: 1560345 [en_GB]
dc.identifier.uri: http://hdl.handle.net/10871/120397
dc.language.iso: en [en_GB]
dc.publisher: Springer Verlag [en_GB]
dc.rights.embargoreason: Under embargo until 26 November 2020 in compliance with publisher policy [en_GB]
dc.rights: © Springer Nature Switzerland AG 2020 [en_GB]
dc.subject: Network science [en_GB]
dc.subject: nested stochastic block model [en_GB]
dc.subject: topic modeling [en_GB]
dc.subject: machine learning [en_GB]
dc.subject: short text tagging [en_GB]
dc.title: Short Text Tagging Using Nested Stochastic Block Model: A Yelp Case Study [en_GB]
dc.type: Article [en_GB]
dc.date.available: 2020-03-25T13:29:30Z
dc.identifier.isbn: 9783030366865
dc.identifier.issn: 1860-949X
dc.description: This is the author accepted manuscript. The final version is available from Springer Verlag via the DOI in this record [en_GB]
dc.description: International Conference on Complex Networks and Their Applications - COMPLEX NETWORKS 2019: Complex Networks and Their Applications VIII [en_GB]
dc.identifier.journal: Studies in Computational Intelligence [en_GB]
dc.rights.uri: http://www.rioxx.net/licenses/all-rights-reserved [en_GB]
rioxxterms.version: AM [en_GB]
rioxxterms.licenseref.startdate: 2019-11-26
rioxxterms.type: Journal Article/Review [en_GB]
refterms.dateFCD: 2020-03-25T13:26:56Z
refterms.versionFCD: AM
refterms.panel: B [en_GB]

