Title: Using Twitter data for Population Estimates
Authors: Jo Munson, Dilek Yildiz, Agnese Vitali, Ramine Tinati and Jennifer A. Holland
Abstract: Social media data are a promising source for social science research, offering insights into attitudes, behaviour, discourse, and the social linkages and interactions between individuals (Savage and Burrows 2007). However, social scientists face the challenge of evaluating the selection bias in these non-representative samples (Zagheni and Weber 2015), and hence of understanding whether these data can be analysed from a population perspective, with findings mapped onto populations. This paper asks:
To what extent are findings obtained with Twitter data generalizable to broader populations?
A key feature of our project is the estimation of Twitter users' demographic characteristics, which Twitter does not make available. We therefore use demographic enhancement data to identify users' sex, age, and location, predicting these characteristics from users' self-reported names and the type of language they use in their Tweets with traditional machine learning and feature selection techniques. With these characteristics, we estimate the sex-specific age structure of the Twitter population and match it to Census estimates for the geographical regions of England and Wales.
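As an illustration only, one simple way to summarise the match between the Twitter population and the Census is a representation ratio: each sex-age group's share among Twitter users in a region divided by its share in that region's Census population. The paper does not specify this metric. The function names, the record fields (`sex`, `age_group`, `region`) and the toy counts below are hypothetical, and the predicted attributes are assumed to come from an upstream classifier.

```python
from collections import Counter


def age_sex_structure(users):
    """Share of users in each (sex, age_group) cell (hypothetical record format)."""
    counts = Counter((u["sex"], u["age_group"]) for u in users)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def representation_ratios(users, census_counts, region):
    """Twitter share / Census share for each (sex, age_group) cell in one region.

    `census_counts` is assumed to map (sex, age_group) -> population count
    for the same region. A ratio above 1 means the group is over-represented
    among Twitter users relative to the Census.
    """
    regional_users = [u for u in users if u["region"] == region]
    if not regional_users:
        raise ValueError(f"no users with predicted region {region!r}")
    twitter_share = age_sex_structure(regional_users)
    census_total = sum(census_counts.values())
    return {
        cell: twitter_share.get(cell, 0.0) / (n / census_total)
        for cell, n in census_counts.items()
        if n > 0  # skip empty Census cells to avoid division by zero
    }


if __name__ == "__main__":
    # Toy, hypothetical inputs: not results from the study.
    users = [
        {"sex": "F", "age_group": "18-29", "region": "London"},
        {"sex": "F", "age_group": "18-29", "region": "London"},
        {"sex": "M", "age_group": "30-49", "region": "London"},
        {"sex": "M", "age_group": "18-29", "region": "London"},
    ]
    census = {("F", "18-29"): 100, ("M", "18-29"): 100,
              ("F", "30-49"): 100, ("M", "30-49"): 100}
    print(representation_ratios(users, census, "London"))
```

In practice the predicted sex, age and location carry classification error, which this sketch ignores; a fuller comparison would need to account for that uncertainty before mapping findings onto the Census population.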
IUSSP Scientific Panel on Big Data and Population Processes workshop on Social Media and Demographic Research: Applications and Implications, Cologne, DE.
Support for the research was provided by the University of Southampton Faculty of Human and Social Sciences Strategic Interdisciplinary Research Development Fund (SIRDF).