Classification Approaches to Identify Informative Tweets
by Piush Aggarwal
Abstract:
Social media platforms have become prime forums for reporting news, with users sharing what they saw, heard or read on social media. News from social media is potentially useful for various stakeholders including aid organizations, news agencies, and individuals. However, social media also contains a vast amount of non-news content. For users to be able to draw on benefits from news reported on social media it is necessary to reliably identify news content and differentiate it from non-news. In this paper, we tackle the challenge of classifying a social post as news or not. To this end, we provide a new manually annotated dataset containing 2,992 tweets from 5 different topical categories. Unlike earlier datasets, it includes postings posted by personal users who do not promote a business or a product and are not affiliated with any organization. We also investigate various baseline systems and evaluate their performance on the newly generated dataset. Our results show that the best classifiers are the SVM and BERT models.
Reference:
Piush Aggarwal, Classification Approaches to Identify Informative Tweets, In Proceedings of the Student Research Workshop Associated with RANLP 2019, INCOMA Ltd., 2019.
Bibtex Entry:
@inproceedings{aggarwal-2019-classification,
    title = "Classification Approaches to Identify Informative Tweets",
    author = "Aggarwal, Piush",
    booktitle = "Proceedings of the Student Research Workshop Associated with RANLP 2019",
    month = sep,
    year = "2019",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd.",
    pages = "7--15",
    url = "https://acl-bg.org/proceedings/2019/RANLPStud%202019/pdf/RANLPStud002.pdf",
    abstract = "Social media platforms have become prime forums for reporting news, with users sharing what they saw, heard or read on social media. News from social media is potentially useful for various stakeholders including aid organizations, news agencies, and individuals. However, social media also contains a vast amount of non-news content. For users to be able to draw on benefits from news reported on social media it is necessary to reliably identify news content and differentiate it from non-news. In this paper, we tackle the challenge of classifying a social post as news or not. To this end, we provide a new manually annotated dataset containing 2,992 tweets from 5 different topical categories. Unlike earlier datasets, it includes postings posted by personal users who do not promote a business or a product and are not affiliated with any organization. We also investigate various baseline systems and evaluate their performance on the newly generated dataset. Our results show that the best classifiers are the SVM and BERT models.",
}