Statistical Features-Based Real-Time Detection of Drifted Twitter Spam

Chen, C; Wang, Y; Zhang, J; Xiang, Y; Zhou, W; Min, G

dc.contributor.author	Chen, C
dc.contributor.author	Wang, Y
dc.contributor.author	Zhang, J
dc.contributor.author	Xiang, Y
dc.contributor.author	Zhou, W
dc.contributor.author	Min, G
dc.date.accessioned	2017-02-14T11:14:28Z
dc.date.issued	2016-10-26
dc.description.abstract	Twitter spam has become a critical problem nowadays. Recent works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. In our labeled tweets data set, however, we observe that the statistical properties of spam tweets vary over time, and thus, the performance of existing machine learning-based classifiers decreases. This issue is referred to as “Twitter Spam Drift”. In order to tackle this problem, we first carry out a deep analysis of the statistical features of one million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The proposed scheme can discover “changed” spam tweets from unlabeled tweets and incorporate them into the classifier’s training process. A number of experiments are performed to evaluate the proposed scheme. The results show that our proposed Lfun scheme can significantly improve the spam detection accuracy in real-world scenarios.	en_GB
dc.description.sponsorship	This work was supported by the ARC Linkage Project under Grant LP120200266. The work of J. Zhang was supported by the National Natural Science Foundation of China under Grant 61401371.	en_GB
dc.identifier.citation	Vol. 12, Iss. 4, April 2017, pp. 914 - 925	en_GB
dc.identifier.doi	10.1109/TIFS.2016.2621888
dc.identifier.uri	http://hdl.handle.net/10871/25838
dc.language.iso	en	en_GB
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_GB
dc.rights	(c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.	en_GB
dc.title	Statistical Features-Based Real-Time Detection of Drifted Twitter Spam	en_GB
dc.type	Article	en_GB
dc.date.available	2017-02-14T11:14:28Z
dc.identifier.issn	1556-6013
dc.description	Accepted	en_GB
dc.description	This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record.	en_GB
dc.identifier.journal	IEEE Transactions on Information Forensics and Security	en_GB

Files in this item

Name: TIFS2014_r1.pdf
Size: 353.0Kb
Format: PDF

This item appears in the following Collection(s)

Computer Science
