Partisan communities on social networks: a study of the 2010 and 2012 Catalan elections

Esteban Moro, in Cotarelo, R. & Olmeda, J.A. (Eds.) (forthcoming). La democracia del siglo XXI. Política, medios de comunicación, internet y redes sociales. Proceedings of the II Jornadas españolas de ciberpolítica (2nd Spanish Conference on Cyberpolitics), 28 May 2013. Madrid: Centro de Estudios Políticos y Constitucionales. [pdf]

In recent years we have witnessed a notable increase in political and/or social events that have been promoted (if not originated) through electronic media. In particular, people increasingly use electronic social platforms such as Facebook or Twitter to share opinions or information about current affairs, politics, or social issues. This has made these platforms indispensable tools for political communication. They are also the new channels through which ideas are shared, information is sought, and campaigns are organized within the political context.

Elections have been one of the most studied settings. In an election, political discussion is bounded in both scope and time, and as a result users, parties, associations, etc. take clearer political positions on social networks. The goal of this contribution is to study this phenomenon of partisan group formation in the flow of information on Twitter in a more varied political context, in which a greater diversity of political options compete (see video). We also aim to characterize these communities and study their stability. In particular, we examine the correlation between their properties and vote estimates. Having data from two elections held two years apart also allows us to check whether the social network analysis methodology is suitable and consistent across two different situations.

Using Friends as Sensors to Detect Global-Scale Contagious Outbreaks

Manuel Garcia-Herranz, Esteban Moro, Manuel Cebrian, Nicholas A. Christakis and James H. Fowler, PLoS ONE 9(4): e92413 (2014) [link]



Recent research has focused on the monitoring of global-scale online data for improved detection of epidemics, mood patterns, movements in the stock market, political revolutions, box-office revenues, consumer behaviour and many other important phenomena. However, privacy considerations and the sheer scale of data available online are quickly making global monitoring infeasible, and existing methods do not take full advantage of local network structure to identify key nodes for monitoring. Here, we develop a model of the contagious spread of information in a global-scale, publicly-articulated social network and show that a simple method can yield not just early detection, but advance warning of contagious outbreaks. In this method, we randomly choose a small fraction of nodes in the network and then randomly choose a friend of each node to include in a group for local monitoring. Using six months of data from most of the full Twittersphere, we show that this friend group is more central in the network and helps us detect viral outbreaks of the use of novel hashtags about 7 days earlier than we could with an equal-sized randomly chosen group. Moreover, the method actually works better than expected from network structure alone, because highly central actors are both more active and exhibit increased diversity in the information they transmit to others. These results suggest that local monitoring is not just more efficient, but also more effective, and it may be applied to monitor contagious processes in global-scale networks.
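The sampling step described in the abstract can be sketched in a few lines. This is a minimal illustration that assumes the network is an undirected adjacency dictionary. The function names and the toy star graph are hypothetical. The outbreak-detection part of the method, tracking when each group adopts a hashtag, is not shown.

```python
import random
from statistics import mean

def sample_groups(adjacency, fraction, seed=0):
    """Pick a random fraction of nodes, then one random friend of each.

    adjacency: dict mapping node -> set of neighbours (assumed format).
    Returns (random_group, friend_group); a friend may appear more than once."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    k = max(1, int(fraction * len(nodes)))
    random_group = rng.sample(nodes, k)
    friend_group = [rng.choice(sorted(adjacency[v]))
                    for v in random_group if adjacency[v]]
    return random_group, friend_group

def mean_degree(adjacency, group):
    """Average number of neighbours of the nodes in `group`."""
    return mean(len(adjacency[v]) for v in group)

# Toy example: a star graph where node 0 is the hub and 1..9 are leaves.
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}
random_group, friend_group = sample_groups(star, fraction=0.5, seed=1)
print(mean_degree(star, random_group), mean_degree(star, friend_group))
```

On the star graph the only friend of every leaf is the hub, so the friend group is far more central than the random group. This is the "friendship paradox" that the method relies on.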



Performance of Social Network Sensors During Hurricane Sandy

Yury Kryvasheyeu, Haohui Chen, Esteban Moro, Pascal Van Hentenryck, Manuel Cebrian (2013) [arxiv]

Information flow during catastrophic events is a critical aspect of disaster management. Modern communication platforms, in particular online social networks, provide an opportunity to study such flow and a means to derive early-warning sensors, improving emergency preparedness and response. The performance of the social network sensor method, based on topological and behavioural properties derived from the “friendship paradox”, is studied here for over 50 million Twitter messages posted before, during, and after Hurricane Sandy. We find that differences in users’ network centrality effectively translate into a moderate awareness advantage (up to 26 hours). Whether users are geo-located inside or outside the hurricane-affected area plays a significant role in determining the scale of this advantage. Emotional response appears to be universal regardless of position in the network topology, and displays characteristic, easily detectable patterns, opening the possibility of implementing a simple “sentiment sensing” technique to detect and locate disasters.
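The abstract does not spell out how the awareness advantage is computed. One simple way to quantify it, sketched here under that assumption, is to find when a fixed fraction of each group has first mentioned the event. The groups are a sensor group (for example, friends of randomly chosen users) and an equal-sized control group, and the advantage is the gap between their two times. The function names, the 50% threshold, and the toy timestamps are hypothetical. The paper may well use a different estimator.

```python
import math

def time_to_fraction(first_mention, group, fraction):
    """Time at which `fraction` of `group` has mentioned the event at least once.

    Returns None if too few members ever mention it."""
    times = sorted(first_mention[u] for u in group if u in first_mention)
    needed = math.ceil(fraction * len(group))
    if needed == 0 or len(times) < needed:
        return None
    return times[needed - 1]

def awareness_lead(first_mention, sensors, controls, fraction=0.5):
    """Hours by which the sensor group reaches `fraction` before the control group."""
    t_sensor = time_to_fraction(first_mention, sensors, fraction)
    t_control = time_to_fraction(first_mention, controls, fraction)
    if t_sensor is None or t_control is None:
        return None
    return t_control - t_sensor

# Hypothetical first-mention times, in hours since some reference instant.
first_mention = {"a": 2, "b": 5, "c": 9, "d": 30,
                 "e": 10, "f": 20, "g": 31, "h": 50}
sensors, controls = ["a", "b", "c", "d"], ["e", "f", "g", "h"]
print(awareness_lead(first_mention, sensors, controls))  # prints 15 (20 - 5)
```

A positive value means the sensor group became aware of the event earlier than the control group.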

Best papers of 2013. What is privacy?

I have read around 200 papers this year. A large fraction of them were very technical, some were reviews, and others were very fashionable. Among them, I would like to highlight the ones that, for me, are the best. This is a personal selection, based not only on technical merit but, more importantly, on impact. Specifically, these two papers change the way we look at privacy. They show how our actions reveal important information about us that was not obvious in the first place. These are the two papers I consider to be the best of 2013.

The first paper shows how Facebook “likes” can reveal important information about people, such as where we live, our sexual orientation, ethnicity, religious or political views, intelligence, and happiness. More worrisome still is the potential to predict use of addictive substances, parental separation, personality traits, etc. This has important implications for online personalization and privacy. Commercial companies can access information that individuals may not have intended to share. Beyond that, one can imagine situations in which such predictions could threaten individual freedom. You can check what your likes and friends reveal about you in the web app that the authors built as a demonstration.

The second paper addresses an important question for Big Data applications and for scientific research: how much data is needed to identify a particular individual? In other words, how anonymous is the data we leave behind in our everyday lives? Most people (including me) think that a large volume of anonymous data would be needed to identify us, so our privacy is secure as long as we do not reveal a lot of information about ourselves. But the researchers found that just 4 geolocalized phone calls are enough to uniquely identify the vast majority of us. Just 4 calls! The reason is that our mobility is highly predictable, so just 4 points in the dataset unveil our personal mobility pattern. A great deal of geolocalized data can be accessed from social networks, mobile phone records, etc. Given this, the results raise a growing concern: very little information is needed to identify a targeted individual, even in a completely anonymous dataset.
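The question of how many points it takes can be made concrete with a Monte Carlo estimate: how often do p points drawn at random from a person's trace match nobody else's trace? This is a minimal sketch that assumes each trace is a set of hypothetical (antenna, hour) points. The function name, parameters, and toy traces are illustrative and not taken from the paper.

```python
import random

def unicity(traces, p, trials=1000, seed=0):
    """Monte Carlo estimate of the fraction of users singled out by p points.

    traces: dict mapping user -> set of (location, hour) points (assumed format).
    Each trial picks a user with at least p points, draws p of their points,
    and checks whether any other trace also contains all of them."""
    rng = random.Random(seed)
    eligible = sorted(u for u, pts in traces.items() if len(pts) >= p)
    if not eligible:
        raise ValueError("no user has at least p points")
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        known = set(rng.sample(sorted(traces[user]), p))
        # Every user whose trace contains all the known points is a candidate.
        matches = [v for v, pts in traces.items() if known <= pts]
        if len(matches) == 1:
            unique += 1
    return unique / trials

# Hypothetical traces: (antenna, hour of day) points for three users.
traces = {
    "u1": {("A", 8), ("B", 9), ("C", 18)},
    "u2": {("A", 8), ("B", 9), ("D", 18)},
    "u3": {("A", 8), ("E", 12), ("C", 18)},
}
for p in (1, 2, 3):
    print(p, unicity(traces, p))
```

With these toy traces, a single point rarely singles out a user, while all three points always do. The same loop run with p = 4 on real mobility traces is the kind of experiment behind the "just 4 calls" finding.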

Most probably 2014 will reveal more privacy limits and breaches in our online social life. For now, 2013 has shown us that privacy is not what is written in the Terms of Use, nor what we assumed from our everyday experience.

Via Catalana from the Twittersphere

Joint work by
Manuel García-Herranz, Department of Computer Science, Universidad Autónoma de Madrid
Manuel Cebrián, National Information and Communications Technology Australia, University of Melbourne
Esteban Moro, Department of Mathematics, Universidad Carlos III de Madrid

Public demonstrations are among Social Science’s most important and most studied phenomena, yet they are also among the most mysterious and poorly understood. Demonstrations trigger new social movements, change countries’ attitudes, and have the potential to overthrow governments. Although these social expressions are among people’s most powerful forces, very little is known about how they form, why they form, and, most importantly, who forms them.

Via Catalana, Catalonia’s 250-mile human chain, is no exception to this mystery. As the human chain formed between the south of France and Valencia, many of the old questions resurfaced in debates among journalists, among politicians, and in pretty much every café around Spain. Was Via Catalana a grass-roots movement, or was it organized centrally by a political party? Did it represent the whole of Catalonian society, or just a niche? Is there a “silent majority” of Catalans who disagree with the chain, or is the silence a way of showing support? And the central question regarding the ultimate fate of the region: is there a fundamental social divide between members of the human chain and the rest of Spain? Political analysts at the time left all of these questions unanswered. Official estimates of participation, which varied by several orders of magnitude, clearly illustrate this.

Yet Via Catalana is different, especially in the age of Big Data. Via Catalana was a physical human chain positioned across Catalonia at a concrete time and date. Data scientists can therefore study it through the hundreds of thousands of digital traces it left on Twitter, or, as we like to call it, the Twittersphere.

For this study, we collected 97,405 geo-located Twitter messages in Spain on the day of Via Catalana, September 11th 2013, 21,237 of which were located in Catalonia. We collected a similar number of messages on each of the two previous days, September 9th and 10th. These days serve as a reference for studying how message content, social connections, language use, sentiment, and human mobility changed as a result of this powerful social demonstration. Without these “normal day” references, we would run the risk of underestimating or overestimating the impact of Via Catalana.

The best way to gain insight into the change in human mobility is to visualize the mobility dynamics of the three days overlaid on top of each other: the 9th (light blue), the 10th (dark blue), and the 11th (red). The visualization covers the full day, from midnight to midnight, hour by hour (see video).

The difference between the previous two days and Via Catalana day is striking. The peak in divergence between mobility patterns emerges around 14:00, more than three hours before the programmed “all-arms-linked” time. The chain is most evident well outside the Barcelona urban center. There, mobility normally shows no clear pattern beyond the commuting of people traveling in and out of Barcelona, so the chain is easily identifiable. Interestingly, we observe that not all fragments of the chain form at the same speed. Two fragments in particular required an additional two hours to form: 1) the one from El Vendrell to the outskirts of Barcelona, and 2) the one comprising Blanes and Perpinyà. We conjecture that these fragments are special because they are the only fragments of Via Catalana that do not follow the coastline. The absence of a clear geographical reference may induce additional coordination costs for the mobilization. Practitioners should take this into account when planning future demonstrations.
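The post does not state how the divergence between days was measured. One simple way to quantify it, sketched here as an assumption rather than as the authors' method, is to bin each hour's geo-located tweets into a grid. Via Catalana day can then be compared with a reference day (for example, the pooled tweets of September 9th and 10th) using the Jensen-Shannon divergence. The 0.1-degree grid, the function names, and the input format (lists of (lat, lon) pairs keyed by hour) are hypothetical.

```python
import math
from collections import Counter

def to_cell(lat, lon, size=0.1):
    """Map a coordinate to a square grid cell of `size` degrees (assumed resolution)."""
    return (math.floor(lat / size), math.floor(lon / size))

def spatial_distribution(points, size=0.1):
    """Fraction of the given (lat, lon) points falling in each grid cell."""
    counts = Counter(to_cell(lat, lon, size) for lat, lon in points)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in set(p) | set(q)}
    def kl_to_m(d):
        return sum(v * math.log2(v / m[c]) for c, v in d.items() if v > 0)
    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

def divergence_by_hour(event, reference, size=0.1):
    """event/reference: dict hour -> list of (lat, lon) of geo-located tweets."""
    return {h: jensen_shannon(spatial_distribution(event[h], size),
                              spatial_distribution(reference[h], size))
            for h in event if event[h] and reference.get(h)}

# Hypothetical toy data: same cell at 10:00, completely different cells at 14:00.
event = {10: [(41.38, 2.17)], 14: [(41.05, 1.05), (42.05, 3.05)]}
reference = {10: [(41.38, 2.17), (41.38, 2.17)], 14: [(41.38, 2.17)]}
scores = divergence_by_hour(event, reference)
peak_hour = max(scores, key=scores.get)
print(scores, peak_hour)
```

Normalizing each hour to a distribution compares where people tweet rather than how much. The overall rise in activity on the holiday then does not dominate the divergence.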

We also see an abnormal influx of individuals towards the pre-arranged location of the chain around 10:00am, much earlier than the peak of the chain and seven hours before the programmed “all-arms-linked” time. This suggests that many individuals took advantage of the day-long holiday in Catalonia (the National Day of Catalonia) to travel from distant regions of Catalonia. Without the holiday, that distance would have kept them from participating. This early influx is an excellent way to quantify the effort that the movement marshaled, as a proxy for interest.

The chain remains stable until approximately 22:00, when it starts to disintegrate uniformly along its whole length. Twitter activity diminishes significantly after that time, so more work is needed to reconstruct travel patterns after Via Catalana dissolved.

Our findings illustrate the importance of the clear geographical pattern of this social mobilization. Other types of demonstrations lack such a pattern: even with a pre-arranged plan (e.g. marching through the downtown area of a city), their human mobility patterns are difficult to distinguish from those of a normal day. The clear geographical pattern unleashes the potential of Twitter (and other publicly available digital traces) to distinguish participants from non-participants, and opens avenues for the behavioral analysis of these groups. Without such a strong geographical pattern, the possibilities of Big Data cannot be fully exploited.

The next step of our investigation is to study how Via Catalana shaped participants’ behavior, as measured by what Catalonians wrote in their Twitter messages. A clear avenue is the use of Catalan vs. Spanish in the messages, given that Catalonia is a bilingual society. A significant proportion of foreigners also live in Catalonia, and Barcelona is a major tourist attraction, so messages in other languages appear as well.

A snapshot of September 10th at 17:00 vs. the same hour on Via Catalana day (picture below) allows us to study how the demonstration shaped language use. Dots indicate geo-located Twitter messages in Spanish (red), Catalan (blue), and other/unidentified (green) languages.


We can see a striking difference between September 10th and September 11th. September 10th displays a well-mixed use of Catalan, Spanish, and other languages. September 11th, by contrast, shows a dominant use of Catalan along the human chain, as well as a marked increase in the use of Catalan across the whole Catalan region.

Simple counting further illustrates this phenomenon. On September 10th there were 14,711 geo-located tweets in Catalonia: 5,654 (38%) in Spanish and 3,356 (23%) in Catalan. These proportions are dramatically reversed on Via Catalana day. That day 21,237 geo-located Twitter messages were logged: 6,074 (29%) in Spanish and 8,599 (40%) in Catalan.
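The percentages follow from dividing each language count by the day's total and rounding to the nearest integer. This short Python snippet (the variable and function names are ours) reproduces that arithmetic from the counts quoted in the text, which makes the figures easy to re-check if the counts are updated.

```python
# Counts of geo-located tweets quoted in the text.
counts = {
    "September 10th": {"total": 14711, "Spanish": 5654, "Catalan": 3356},
    "September 11th": {"total": 21237, "Spanish": 6074, "Catalan": 8599},
}

def shares(day):
    """Percentage of the day's geo-located tweets in each language, rounded."""
    c = counts[day]
    return {lang: round(100 * c[lang] / c["total"]) for lang in ("Spanish", "Catalan")}

for day in counts:
    print(day, shares(day))
# September 10th {'Spanish': 38, 'Catalan': 23}
# September 11th {'Spanish': 29, 'Catalan': 40}
```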

It is particularly interesting that language mixing is more pronounced at both extremes of the chain, the one touching Valencia and the one touching the south of France. Between these extremes and downtown Barcelona, by contrast, use is almost exclusively Catalan. We also observe two hotspots of Spanish use along the chain, in Mataro and Villafranca del Penedes. Further investigation into the demographics of these two areas is required to uncover why they stand out.

Barcelona also emerges as an aggregator of languages and cultures, with high levels of language mixing even during Via Catalana. The urban planning literature has long recognized the capacity of urban centers to be exceptional engines of diversity. However, it is remarkable that this diversity is resilient to a social event such as Via Catalana, and this is, to our knowledge, a novel insight in Social Science.

Two insights emerge: the dynamics of human mobility during the formation of Via Catalana, and its dramatic effect on language use. They are just the “low-hanging fruit” that can be uncovered by looking at Via Catalana from the Twittersphere. Many other investigations follow, namely 1) how sentiment, as inferred from the text of the Twitter messages, changed as a result of Via Catalana, and 2) how Via Catalana affected sentiment in the rest of Catalonia and Spain.

Inferring the social connections between Twitter users, both participants and non-participants in Via Catalana, will likely yield the most interesting insights. Algorithmic detection of communities from these social connections will uncover, perhaps for the first time ever at this scale, whether the participants in Via Catalana form a differentiated social group or, on the contrary, are embedded in the larger Catalonian society. This question has haunted local and state politicians for over a century.
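The post does not say which community-detection algorithm will be applied. One possibility, sketched here under that assumption, is a naive greedy modularity maximization: the agglomerative idea behind the Clauset-Newman-Moore method, without its efficient data structures. A hypothetical `participant_share` helper then measures what fraction of each detected community took part in Via Catalana. The follower graph is treated as undirected with integer node IDs, and the toy graph is invented for illustration.

```python
from collections import Counter

def greedy_modularity(adjacency):
    """Naive agglomerative modularity maximization on an undirected graph.

    adjacency: dict mapping an integer node ID to the set of its neighbours.
    Starts with one community per node and repeatedly merges the pair of
    connected communities with the largest modularity gain, stopping when
    no merge increases modularity. Returns a dict node -> community ID."""
    m = sum(len(nbrs) for nbrs in adjacency.values()) / 2   # number of edges
    community = {v: v for v in adjacency}
    members = {v: {v} for v in adjacency}
    degree = {v: len(adjacency[v]) for v in adjacency}       # summed per community
    while True:
        # Count edges running between each pair of distinct communities.
        between = Counter()
        for v, nbrs in adjacency.items():
            for u in nbrs:
                a, b = community[v], community[u]
                if a < b:
                    between[(a, b)] += 1
        # Modularity gain of merging a and b: e_ab/m - d_a*d_b/(2 m^2).
        best_gain, best_pair = 0.0, None
        for (a, b), e in sorted(between.items()):
            gain = e / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return community
        a, b = best_pair
        for v in members[b]:
            community[v] = a
        members[a] |= members.pop(b)
        degree[a] += degree.pop(b)

def participant_share(community, participants):
    """Fraction of each community's members who took part in the event."""
    sizes = Counter(community.values())
    inside = Counter(community[v] for v in participants if v in community)
    return {c: inside[c] / n for c, n in sizes.items()}

# Hypothetical follower graph: two 4-cliques joined by the single edge 3-4.
adjacency = {v: set() for v in range(8)}
for group in (range(0, 4), range(4, 8)):
    for v in group:
        adjacency[v] |= {u for u in group if u != v}
adjacency[3].add(4)
adjacency[4].add(3)

community = greedy_modularity(adjacency)
print(participant_share(community, participants={0, 1, 2, 3}))
```

Shares close to 1 in some communities and close to 0 in others would point to a differentiated group. Similar shares across communities would instead point to participants embedded in the wider society. For millions of users, an optimized implementation such as NetworkX's `greedy_modularity_communities` would be the practical choice.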

We will look not only at how Via Catalana differed from the immediate past but also, most importantly, at how the movement will change the future social structure of the region. We have been logging the daily dynamics of all Twitter users since the movement began. This will let us understand whether Via Catalana pushed social ties inside and outside Catalonia further apart or drew them closer, and at what pace these social changes are happening.

We truly believe that the digital traces emerging from Via Catalana provide the most exciting social laboratory for understanding how public demonstrations at very fast time scales shape social structure over the long term. It is an exhilarating time for us data scientists, and we can barely pause to write these words before diving back deep into the data. Every new line of code we type uncovers a hidden social reality that we want to share with the larger society.