This page links to the Twitter data used in the paper “Determinants of Meme Popularity” by James P. Gleeson, Kevin P. O’Sullivan, Raquel A. Baños and Yamir Moreno; please cite this paper if you use the data. All data processing was performed by Raquel A. Baños at Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza.
The zip file twitter15M_data.zip (16 MB) contains the following files:
Text file, with one line per tweet. Tweets were gathered from March 2011 to March 2012. Each line has the following structure: the first column gives NF-1, where NF is the number of fields in that line; the second column gives the time at which the tweet was posted, in units of days; and columns 3 to NF list the hashtags used in the tweet.
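The line structure described above can be read with a few lines of Python. This is only a minimal sketch: it assumes the columns are separated by whitespace and that the posting time may be fractional, neither of which is stated on this page. The names TweetRecord and parse_tweet_line are hypothetical and are not part of the data set.

```python
from typing import List, NamedTuple


class TweetRecord(NamedTuple):
    day: float            # posting time, in days (column 2)
    hashtags: List[str]   # hashtags used in the tweet (columns 3 to NF)


def parse_tweet_line(line: str) -> TweetRecord:
    """Parse one line of the form: <NF-1> <day> <hashtag> ... <hashtag>.

    Assumption: fields are whitespace-separated. Hashtags are kept as
    opaque strings, since their exact encoding is not documented here.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 fields, got {len(fields)}")

    declared = int(fields[0])  # column 1 holds NF-1
    if declared != len(fields) - 1:
        raise ValueError(
            f"column 1 says NF-1 = {declared}, "
            f"but the line has NF = {len(fields)} fields"
        )

    return TweetRecord(day=float(fields[1]), hashtags=fields[2:])


if __name__ == "__main__":
    # Made-up example line, not taken from the data set.
    print(parse_tweet_line("4 12.5 tagA tagB tagC"))
```

Checking the first column against the actual field count is a cheap way to detect truncated or mis-split lines before any further analysis.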
Folder containing multiple text files. Each text file ccdf_axx.txt gives the complementary cumulative distribution function (CCDF) for hashtags of age xx days, as used in Figure 3 of the paper.
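For readers unfamiliar with the term, the sketch below shows how an empirical CCDF can be computed from a list of integer values. It is an illustration only, not the procedure used to produce the ccdf_axx.txt files: the function name empirical_ccdf is hypothetical, and the convention P(X >= x) is an assumption (some authors use P(X > x) instead).

```python
from collections import Counter
from typing import Dict, Iterable


def empirical_ccdf(values: Iterable[int]) -> Dict[int, float]:
    """Return {x: fraction of values that are >= x} for each observed x.

    Assumption: the CCDF is defined as P(X >= x).
    """
    counts = Counter(values)
    total = sum(counts.values())

    ccdf: Dict[int, float] = {}
    remaining = total  # number of values >= the current x
    for x in sorted(counts):
        ccdf[x] = remaining / total
        remaining -= counts[x]
    return ccdf


if __name__ == "__main__":
    # Made-up values, not taken from the data set.
    print(empirical_ccdf([1, 1, 2, 4]))
```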
Text file obtained by sampling 8.2E5 random Twitter users in October 2013. The file structure is as follows:
first column: number of followers (out-degree)
second column: number of friends (following, or in-degree)
last column: frequency
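The column layout above can be loaded into a dictionary keyed by the (followers, friends) pair, as in the following minimal sketch. It assumes whitespace-separated columns and treats the frequency as a non-negative weight, which works whether the file stores raw counts or normalized values (this page does not say which). The function names read_degree_counts and mean_degrees are hypothetical.

```python
from typing import Dict, Iterable, Tuple

DegreePair = Tuple[int, int]  # (followers, friends)


def read_degree_counts(lines: Iterable[str]) -> Dict[DegreePair, float]:
    """Map each (followers, friends) pair to its frequency.

    Assumption: columns are whitespace-separated; blank lines are skipped.
    """
    counts: Dict[DegreePair, float] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        followers = int(fields[0])   # first column
        friends = int(fields[1])     # second column
        frequency = float(fields[-1])  # last column
        key = (followers, friends)
        counts[key] = counts.get(key, 0.0) + frequency
    return counts


def mean_degrees(counts: Dict[DegreePair, float]) -> Tuple[float, float]:
    """Frequency-weighted mean number of followers and of friends."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no data")
    mean_followers = sum(k[0] * w for k, w in counts.items()) / total
    mean_friends = sum(k[1] * w for k, w in counts.items()) / total
    return mean_followers, mean_friends


if __name__ == "__main__":
    # Made-up rows, not taken from the data set.
    sample = ["10 5 2", "0 1 2"]
    print(mean_degrees(read_degree_counts(sample)))
```

To use it on the real file, pass an open file object in place of the sample list, since iterating over a text file yields its lines.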