Assessing the Bias in Samples of Large Online Networks

Forthcoming in Social Networks

45 pages. Posted: 4 Dec 2012. Last revised: 10 Dec 2014.

Sandra González-Bailón

University of Pennsylvania - Annenberg School for Communication

Ning Wang

University of Oxford - Oxford Internet Institute; University of Oxford - Mathematical Institute

Alejandro Rivero

University of Zaragoza

Javier Borge-Holthoefer

Universitat Oberta de Catalunya; Qatar Computing Research Institute

Yamir Moreno

University of Zaragoza

Date Written: December 4, 2012

Abstract

We consider the sampling bias introduced in the study of online networks when collecting data through publicly available APIs (application programming interfaces). We assess differences between three samples of Twitter activity; the empirical context is given by political protests taking place in May 2012. We track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the search and the streaming APIs, and to different filtering parameters. We find that smaller samples do not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions, partly because of the higher influence of snowballing in identifying relevant nodes. We discuss the implications of this bias for the study of diffusion dynamics and political communication through social media, and advocate the need for more uniform sampling procedures to study online communication.

Keywords: social media, Twitter, political communication, social protests, social networking sites, measurement error, graph comparison

Suggested Citation

González-Bailón, Sandra and Wang, Ning and Rivero, Alejandro and Borge-Holthoefer, Javier and Moreno, Yamir, Assessing the Bias in Samples of Large Online Networks (December 4, 2012). Forthcoming in Social Networks, Available at SSRN: https://ssrn.com/abstract=2185134 or http://dx.doi.org/10.2139/ssrn.2185134

Sandra González-Bailón (Contact Author)

University of Pennsylvania - Annenberg School for Communication

Philadelphia, PA
United States

HOME PAGE: https://dimenet.asc.upenn.edu/people/sgonzalezbailon/

Ning Wang

University of Oxford - Oxford Internet Institute

1 St. Giles
University of Oxford
Oxford, Oxfordshire OX1 3JS
United Kingdom

University of Oxford - Mathematical Institute

Mathematical Institute
Radcliffe Observatory Quarter
Oxford, Oxfordshire OX2 6GG
United Kingdom

Alejandro Rivero

University of Zaragoza

Gran Via 2
Zaragoza, 50005
Spain

Javier Borge-Holthoefer

Universitat Oberta de Catalunya

Barcelona, Barcelona
Spain

HOME PAGE: http://cosin3.rdi.uoc.edu/

Qatar Computing Research Institute

Tornado Tower
13th Floor
Doha, 5825
Qatar

Yamir Moreno

University of Zaragoza

Gran Via 2
Zaragoza, 50005
Spain

Paper statistics

Downloads: 1,108
Abstract Views: 21,251
Rank: 36,105