Joint work by
Manuel García-Herranz, Department of Computer Science, Universidad Autónoma de Madrid
Manuel Cebrián, National Information and Communications Technology Australia, University of Melbourne
Esteban Moro, Department of Mathematics, Universidad Carlos III de Madrid
While public demonstrations are among Social Science’s most important and most studied phenomena, they are also among the most mysterious and poorly understood. Demonstrations trigger new social movements, change countries’ attitudes, and have the potential to overthrow governments. Despite these social expressions being people’s most powerful force, very little is known about how they form, why they form, and, most importantly, who forms them.
Via Catalana, Catalonia’s 250-mile human chain, is no exception to this mystery. As the human chain formed between the south of France and Valencia, many of the old questions popped up again in debates among journalists, politicians, and pretty much every café around Spain: was Via Catalana a grass-roots movement, or was it organized by a central political party? Did it represent the whole of Catalan society, or just a niche of it? Is there a “silent majority” of Catalans that disagree with the chain, or is the silence a way of showing support? And the central question regarding the ultimate fate of the region: is there a fundamental social divide between members of the human chain and the rest of Spain? All of these questions were left unanswered by political analysts at the time, a fact clearly illustrated by official estimates of participation, which varied by several orders of magnitude.
Yet Via Catalana is different, especially in the age of Big Data. Because Via Catalana is a physical human chain positioned across Catalonia at a concrete time and date, data scientists can study it through the hundreds of thousands of digital traces left on Twitter, or as we like to call it, the Twittersphere.
For this study, we collected 97,405 geo-located Twitter messages in Spain during the day of Via Catalana, September 11th, 2013, of which 21,237 were located in Catalonia. We collected a similar number of messages on the previous two days, September 9th and 10th. These days serve as a reference to study how message information, social connections, language use, sentiment, and human mobility changed as a result of this powerful social demonstration. Without these “normal day” references, we run the risk of underestimating or overestimating the impact of Via Catalana.
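To make the role of the “normal day” references concrete, here is a minimal sketch of how geo-located messages could be restricted to Catalonia and counted per day. It is an illustration, not the pipeline used in the study: the bounding box is a rough assumption (a real analysis would use the region's boundary polygon), and the record layout (`time`, `lat`, `lon`) and the toy records are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Rough bounding box for Catalonia. This is an assumption for illustration
# only; it is not taken from the study.
CATALONIA_BBOX = {"lat_min": 40.5, "lat_max": 42.9, "lon_min": 0.15, "lon_max": 3.35}


def in_bbox(lat, lon, bbox=CATALONIA_BBOX):
    """Return True if the point falls inside the bounding box."""
    return (bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["lon_min"] <= lon <= bbox["lon_max"])


def daily_counts(messages):
    """Count messages located inside the bounding box, per calendar day."""
    counts = Counter()
    for msg in messages:
        if in_bbox(msg["lat"], msg["lon"]):
            counts[msg["time"].date().isoformat()] += 1
    return counts


# Hypothetical toy records: one reference-day message and three event-day ones.
sample = [
    {"time": datetime(2013, 9, 10, 17, 0), "lat": 41.39, "lon": 2.17},   # Barcelona
    {"time": datetime(2013, 9, 11, 17, 0), "lat": 41.39, "lon": 2.17},   # Barcelona
    {"time": datetime(2013, 9, 11, 17, 5), "lat": 41.98, "lon": 2.82},   # Girona
    {"time": datetime(2013, 9, 11, 17, 10), "lat": 40.42, "lon": -3.70}, # Madrid
]
counts = daily_counts(sample)
for day in sorted(counts):
    print(day, counts[day])
```

Comparing the per-day counts from the event day against those of the reference days is what lets a change be attributed to the demonstration rather than to ordinary daily variation.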
The best way to gain insight into the change in human mobility is to visualize the mobility dynamics of the three days, the 9th (light blue), 10th (dark blue), and 11th (red), overlaid on top of each other. The visualization covers the full day, from midnight to midnight, stepping hour by hour (see video).
The difference between the previous two days and the Via Catalana day is striking. The peak in divergence in mobility patterns emerges around 14:00 hours, more than three hours before the “all-arms-linked” programmed time. The clearest manifestation of the chain happens well outside the Barcelona urban center, where normally there are no clear mobility patterns (just the commuting patterns of people traveling in and out of Barcelona) and the chain is easily identifiable. Interestingly, we observe that not all the fragments of the chain form at the same speed. Two fragments in particular, 1) the one from El Vendrell to the outskirts of Barcelona, and 2) the one between Blanes and Perpinyà, required an additional two hours to form. We conjecture that this may be because these fragments are special: they are the only fragments of Via Catalana that do not follow the coastline. The absence of a clear geographical reference may induce additional coordination costs for the mobilization; this is something that practitioners should take into account when planning future demonstrations.
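One way to put a number on the hour-by-hour divergence between a reference day and the event day is sketched below. The measure used here, total variation distance between the spatial distributions of messages over a coarse grid, is our assumption for illustration; the text above does not specify how divergence was quantified. The grid size and the toy points are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def cell(lat, lon, size=0.1):
    """Map a coordinate to a grid cell of `size` degrees (assumed resolution)."""
    return (math.floor(lat / size), math.floor(lon / size))


def hourly_distributions(points):
    """points: iterable of (hour, lat, lon). Returns {hour: Counter(cell -> count)}."""
    by_hour = defaultdict(Counter)
    for hour, lat, lon in points:
        by_hour[hour][cell(lat, lon)] += 1
    return by_hour


def total_variation(p, q):
    """Total variation distance between two count histograms.

    0 means identical spatial distributions, 1 means no overlap.
    Returns None when either histogram is empty (distance undefined).
    """
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        return None
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in cells)


def divergence_by_hour(baseline, event):
    """Compare two days hour by hour, from hour 0 to hour 23."""
    b, e = hourly_distributions(baseline), hourly_distributions(event)
    return {h: total_variation(b.get(h, Counter()), e.get(h, Counter()))
            for h in range(24)}


# Hypothetical toy data: identical at 09:00, partly displaced at 14:00.
baseline = [(9, 41.39, 2.17), (14, 41.39, 2.17), (14, 41.39, 2.17)]
event = [(9, 41.39, 2.17), (14, 41.39, 2.17), (14, 41.22, 1.53)]
divergence = divergence_by_hour(baseline, event)
peak_hour = max((h for h in divergence if divergence[h] is not None),
                key=lambda h: divergence[h])
print(peak_hour, divergence[peak_hour])
```

Normalizing each hour's counts before comparing them keeps the measure from simply reflecting the higher message volume on the event day; it responds to where people are, not how many are tweeting.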
We also see that, well before the peak of the chain, an abnormal influx of individuals towards the pre-arranged location of the chain occurs around 10:00, seven hours prior to the “all-arms-linked” programmed time. This indicates that most of the individuals took advantage of the day-long holiday in Catalonia (National Day of Catalonia) to travel from distant regions of Catalonia in order to participate. This early travel serves as an excellent way to quantify the effort, as a proxy for interest, that this movement marshaled.
The chain remains stable until approximately 22:00, when it starts to disintegrate uniformly along its whole length. Twitter activity diminishes significantly after that time, and therefore more work is needed to reconstruct the traveling patterns after Via Catalana dissolved.
Our findings illustrate the importance of the clear geographical pattern of this social mobilization. Such a pattern is absent from other types of demonstrations, where, despite there being a pre-arranged plan (e.g. marching along the downtown area of a city), the human mobility patterns are difficult to distinguish from those of a normal day. The clear geographical pattern unleashes the potential of Twitter (and other types of publicly available digital traces) to distinguish participants from non-participants, and opens avenues for the behavioral analysis of these groups. Without this strong geographical pattern, the possibilities of Big Data cannot be fully exploited.
The next step of our investigation is studying how Via Catalana shaped the participants’ behavior, as measured by the information Catalans put in their Twitter messages. A clear avenue to look at is the use of Catalan vs. Spanish in the messages, given that Catalonia is a bilingual society. In addition, a significant proportion of foreigners live in Catalonia, and Barcelona is a major tourist attraction.
A snapshot of September 10th at 17:00 vs. the same hour on Via Catalana day (picture below) allows us to study how the demonstration shaped language use. Dots indicate geo-located Twitter messages in Spanish (red), Catalan (blue), and Other/Unidentified (green) languages.
We can see a striking difference between September 10th and September 11th. September 10th displays a well-mixed use of Catalan, Spanish, and other languages, whereas September 11th shows a unified use of Catalan along the human chain, as well as a marked increase in the use of Catalan across the whole Catalan region.
Simple counting further illustrates this phenomenon. The number of geo-located tweets on September 10th is 14,711, with 5,654 (38%) in Spanish and 3,356 (23%) in Catalan. These proportions are dramatically reversed on the Via Catalana day, with 21,237 geo-located Twitter messages logged, 6,074 (29%) in Spanish and 8,599 (40%) in Catalan.
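The counting above can be reproduced in a few lines. The sketch below uses only the totals quoted in the text; the dictionary layout and function name are ours, chosen for illustration.

```python
# Message counts as quoted in the text; the data layout is an illustrative choice.
counts = {
    "2013-09-10": {"total": 14711, "spanish": 5654, "catalan": 3356},
    "2013-09-11": {"total": 21237, "spanish": 6074, "catalan": 8599},
}


def language_shares(day_counts):
    """Return each language's share of the day's geo-located messages, in percent."""
    total = day_counts["total"]
    return {lang: 100.0 * n / total
            for lang, n in day_counts.items() if lang != "total"}


shares = {day: language_shares(c) for day, c in counts.items()}
for day in sorted(shares):
    line = ", ".join(f"{lang}: {pct:.1f}%" for lang, pct in sorted(shares[day].items()))
    print(day, line)
```

Expressing each language as a share of the day's total, rather than as a raw count, matters here because the event day has far more messages overall; the shares show that the increase in Catalan is not just a side effect of higher volume.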
It is particularly interesting to see that at both extremes of the chain, the one touching Valencia and the one touching the south of France, the use of mixed languages is more pronounced, whereas almost purely Catalan use is evident between these extremes and downtown Barcelona. We also observe two hotspots of Spanish use along the chain, in Mataro and Villafranca del Penedes. Further investigation into the demographics of these two towns is required to uncover why they are outliers.
Barcelona also emerges as an aggregator of languages and cultures, with high levels of language mixing even during Via Catalana. The capacity of urban centers to be exceptional engines for diversity has been known for quite some time in the urban planning literature; however, the fact that this diversity is resilient to ongoing social events such as Via Catalana is remarkable and a novel insight in Social Science.
These two insights, the dynamics of human mobility during the formation of Via Catalana and how dramatically it affected the use of language, are just the “low-hanging fruit” that can be uncovered by looking at Via Catalana from the Twittersphere. Many other investigations will follow, namely 1) how sentiment, as inferred from the text contained in the Twitter messages, changed as a result of Via Catalana, and 2) how sentiment in the rest of Catalonia and Spain was affected by Via Catalana.
Inference of the social connections between Twitter users, both participants and non-participants in Via Catalana, will likely yield the most interesting insights. Algorithmic detection of communities from the social connections will uncover, perhaps for the first time ever at this scale, whether the participants in Via Catalana form one differentiated social group or, on the other hand, are embedded in the larger Catalan society. This is a matter of debate that has haunted local and state politicians for over a century.
We will look not only at how Via Catalana differed from the immediate past but also, most importantly, at how the movement is going to change the future social structure of the region. We have been logging the daily dynamics of all Twitter users since the movement began, which will enable us to understand whether Via Catalana brought social ties inside and outside Catalonia closer together or pushed them farther apart, and at what pace these social changes are happening.
We truly believe that the digital traces emerging from Via Catalana provide the most exciting social laboratory for understanding how public demonstrations that unfold on very fast time scales shape social structure over the long term. It is an exhilarating time for us data scientists, and we can barely stop to write these words before going back deep into the data. Every new line of code we type uncovers a hidden social reality that we want to share with the larger society.