I started using Twitter more than 10 years ago (!). I opened an account on this social network in 2008 and, although I did not use it much during the first year, I became a frequent user after that. It has helped me get news and information for both my personal and professional interests. But not only that: Twitter has also been the data source for our research, helping us investigate the relationship between human behavior on the platform and paramount problems in our society such as information propagation, unemployment, disaster damage and political opinion.
What are the properties of a long-lasting relationship? This important question has intrigued social scientists over the last decades and has triggered numerous publications, surveys and experiments to detect what patterns are behind social relationships that persist. Probably the most famous finding is that of Granovetter, who proposed that strong relationships are the ones most likely to persist in the future. And what is a strong relationship? According to Granovetter, a strong relationship is one with high intensity (a lot of interactions), intimacy (mutual confiding) and large structural redundancy (lots of common friends).
One of my favorite activities is to teach my field of research (network science) to high-schoolers. We (together with my colleague Cristina Brändle) have been doing that from our university for the local high schools in Madrid. Since they know concepts like equations, probability or geometry, it is fairly easy to show them concepts like what a network is, the small world, the friendship paradox or centrality. We usually use slides and let them work in Excel to perform some calculations, which works well for understanding the basic concepts of networks.
From the “Small World Experiment” to the “Red Balloon Challenge,” and beyond
We live in a small world, right? But the cost and fragility of navigating it could harm any potential strategy to leverage the power of social networks. Read this fascinating story of the research, experiments, and failures in the quest for using social networks to search information/people:
[Excerpt of the article] Our ability to search social networks for people and information is fundamental to our success.
This is a recent article published on Medium.com by David Martín-Corral, Manuel Cebrián and myself in which we analyze the super-linear scaling of tourist attraction (number of events) with population. Amazingly, we found that the number of events (music/theater/sports/etc.) scales super-linearly with the population of the city.
So yes! More people means more fun!
Read the article in Medium
2015 has been an amazing year on both professional and personal grounds. Here are some of the main things that happened this year:
I helped to build up the “Master in Data Science and Big Data in Finance” at the Afi School of Finance. In this first edition we had 18 incredible students and the support of most of the big financial companies in Spain and the best researchers, practitioners and data scientists in the financial industry.
A while ago, I wrote a post about how to create animations of temporal networks using R and the amazing igraph package. The post was written in 2012 and the code does not work with the most recent versions (1.0) of igraph. Here I revisit that post, improving its performance and making it consistent with the new versions of the package and R.
First of all, let me remind you the basic idea: we want to get an animated evolution of a network in which nodes/edges appear (and/or disappear) dynamically.
I had the pleasure of organizing the last edition of Netmob at the MIT Media Lab (together with Sandy Pentland, Vincent Blondel and Yves-Alexandre de Montjoye). Netmob is the primary conference on the analysis of mobile phone datasets in social, urban, societal and industrial problems. Netmob 2015 also hosted the final part of the D4D Challenge by Orange. They were 3 amazing days of applications and analysis of mobile phone datasets, preceded by a one-day school and a 3-day hackathon.
I have read around 200 papers this year. A large fraction of them were very technical, some were reviews and others were very fashionable. But among them, I would like to highlight the ones that for me are the best. This is a personal selection and it is based not only on the technical aspects, but more on their impact. Specifically, these two papers changed the way we look at privacy and how our actions reveal important information about us which was not obvious in the first place.
Joint work by
Manuel García-Herranz, Department of Computer Science, Universidad Autónoma de Madrid
Manuel Cebrián, National Information and Communications Technology Australia, University of Melbourne
Esteban Moro, Department of Mathematics, Universidad Carlos III de Madrid

While public demonstrations are Social Science’s most important and studied phenomena, they are also the most mysterious and poorly understood ones. Demonstrations trigger new social movements, change countries’ attitudes, and have the potential to overthrow governments.
In our last (just accepted) paper “Limited communication capacity unveils strategies for human interaction” [link] we have found that we humans have different social strategies when we communicate/interact with people. Specifically, the sociability of a person (the total number of contacts in a time interval) which is usually taken as the connectivity in the social network is actually the result of two different human features:
* Social capacity: the number of relationships a person can keep open, which is limited
* Social activity: the number of relationships people form and destroy as a consequence of their daily tasks, family, events, etc.
Happy World Book Day
Knight-errantry (...) is a science, said Don Quixote (...) that comprises in itself all or most of the sciences of the world (...) he who professes it must be a jurist, and know the laws of distributive and commutative justice (...) he must be a theologian, to be able to give a reason for the Christian law he professes (...); he must be a physician, above all a herbalist, to know (...) the herbs that have the virtue of healing wounds (.
To me, the main question when modeling a process in the social sciences is "when does emulation count as explanation?"
John Myles White (@johnmyleswhite), March 25, 2013

Interesting question. I guess the problem is that sometimes in science a model does not pretend to make any prediction, nor is it supported by any relevant data. It simply emulates reality.
UPDATE: the version of the R code in this post does not work with newer versions of the igraph package (> 1.0). I have posted an updated version of this post here: Temporal networks with R and igraph (updated). Please visit the new post to use the new code and follow the discussion there. In my last post about how a conversation unfolds in time on Twitter, the dynamical nature of information diffusion on Twitter was illustrated with a video of the temporal network of interactions (RTs) between accounts.
Preferential attachment is a key process governing the dynamics of many economic, social and biological processes. It is the “rich get richer” mechanism by which a quantity is distributed among individuals according to how much they already have. It also happens in social networks, where the nodes with more social connectivity (the “hubs”) receive more new connections than the poorly connected ones. In a famous paper, Albert-László Barabási and Réka Albert encoded this mechanism in the so-called Barabási-Albert model to generate random scale-free networks.
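To make the “rich get richer” rule concrete, here is a minimal sketch of a growth process in the spirit of the Barabási-Albert model. It is an illustration rather than the code from the original post: the function name `barabasi_albert`, the starting graph (a small complete graph of `m + 1` nodes) and the parameter values are assumptions chosen for the example.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; every new node attaches to m distinct
    existing nodes picked with probability proportional to their degree."""
    rng = random.Random(seed)
    # Assumption: start from a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(n=2000, m=2, seed=1)
degree = Counter(v for e in edges for v in e)
# The few best-connected nodes (the "hubs") and their degrees.
hubs = degree.most_common(5)
```

Keeping one list entry per edge end is a common trick: it avoids recomputing the degree distribution at every step, at the cost of storing two entries per edge.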
Millions of tweets, retweets and mentions are exchanged on Twitter every day about very different subjects, events, opinions, etc. While aggregating this data over a time window might help to understand some properties of those processes in online social networks, the speed of information diffusion around particular time-bound events requires a temporal analysis of them. To show that (and with the help of the Text & Opinion Mining Group at IIC) we collected all tweets (750k) of the vibrant conversation around the disputed subject of the general strike of March 29th in Spain.
Yesterday I gave a talk at the 6th IIC Technology Conference about how Social Contagion can be leveraged for marketing purposes. The motto of the conference was the need to use algorithms in today’s business processes. With the availability of more and more complex data, the use of algorithms that can detect and reduce complexity is of paramount importance. Big data is not only about volume (terabytes of data); it is about huge complex data, and reducing that complexity can only be achieved by modeling, simulating and analyzing the patterns we observe in the data.
I have just read an amazing book, “Shibumi” by Trevanian (a.k.a. Rodney William Whitaker), probably the best spy novel I have read so far. In the book, a big data computer (called Fat Boy) is operated by a “data scientist” (although he is not called that). I very much enjoyed the following paragraph, an analogy of the emptiness of big data without insight and also a musing about how difficult it is to find relationships from activity data (the kind of research we do!
In most of my talks I present quantitative evidence of patterns, data exploration or results. But which is the right way to show that evidence? Worry no more: the Extreme Presentation Method helps you to decide with this chart chooser (click here to download the pdf)
My impression is that this chart chooser is good for small data. For big data some of the charts become useless. For example, read the insightful post Don’t use Scatterplots by Chris Stucchio on why it is not a good idea to use scatterplots to show the relationship between bivariate data when you have large amounts of data.
We (together with Kimmo Kaski, Aalto University) are organizing the ECCS’11 Satellite conference “Complex Dynamics of Human Interactions”, to be held in Vienna on September 14th.
You can find more info at http://www.complexdynamics.org
“It’s not enough to have a map of the structure. It is crucial to understand the dynamics of a process”, A.-L. Barabási
Scope: The nature of human interaction has undergone a substantial change in the past years and the change does not seem to be over.
Each day trillions of emails, phone calls, comments on blogs, Twitter messages, exchanges in online social networks, etc. are made. Not only has the number of communications increased, but each of these transactions also leaves a digital trace that can be recorded to reconstruct our high-frequency human activity. It is not only the amount and variety of recorded data that is important. Its high-frequency character and its comprehensive nature have also allowed researchers, companies and agencies to investigate individual and group dynamics at an unprecedented level of detail and to apply them to client modeling, organizational analysis or epidemic spreading.
The crisis and its effect on the 2010 budget have put to the test the government’s commitment to change the productive model and increase spending on R&D&i. In particular, cuts hit the spending of the Ministry of Science and Innovation (up to 17%), chapter 7 (grants to researchers), down 17%, and the budgets of some OPIs that depend on the Ministry, with 15% less on average.
We have just published an experimental/theoretical work on the speed of information diffusion in social networks in Physical Review Letters. Specifically, we have studied the impact of the heterogeneity of human activity on the propagation of emails, rumors, hoaxes, etc. Tracking email marketing campaigns, executed by IBM Corporation in 11 European countries, we were able to compare their viral propagation with our theory (see the campaign details below).
The results are very simple.
Giving a good talk is not an easy task, but with time and practice you get to learn how to communicate (hopefully I’ve learned too!!). There are a number of places on the web with advice on giving a good talk. But I like Paul N. Edwards’s short manual about how to give an academic talk. My experience as an audience member in many talks tells me that the most important things are (quoting Paul’s manual):
Our research group is looking for Ph.D. candidates. Here is the announcement
![mosaico](http://estebanmoro.org/wp-content/uploads/2009/06/mosaico.jpg) We offer contracts to work for a Ph.D. within the project MOSAICO (Modelling, Analysis and Simulations of Complex Systems). Candidates must have a degree in physics, math or related disciplines with outstanding marks. Info on the research lines is available from [http://www.gisc.es](http://www.gisc.es) and work will be carried out at Universities Complutense or Carlos III de Madrid. Work will begin on October 1st, 2009.
We’ve heard it: people who invest in the stock market or who gamble in lotteries, casinos, etc. usually say “I’m going through a bad patch” (or bad spell). That is, they have been losing money for a while, but hey! better times are ahead and there’s no reason to quit. Are they sure? Are better times ahead? How close is “ahead” to today? Let’s work through a specific example to see how far “ahead” is.
I got wonderful news today. Our paper “Specialization and herding behavior of trading firms in a financial market” (pdf) has been selected by the Editorial Board of New Journal of Physics as part of the Journal’s Best of 2008. According to their site, “Best of 2008” is a compilation of articles selected by the Editorial Board and staff team on the basis of criteria including referee endorsements, readership and citation levels and simple broad appeal.
Mark Twain (1924) probably had politicians in mind when he reiterated Disraeli’s famous remark ("There are three kinds of lies: lies, damned lies and statistics"). Scientists, we hope, would never use data in such a selective manner to suit their own ends. But, alas, the analysis of data is often the source of some exasperation even in an academic context. On hearing comments like ‘the result of this experiment was inconclusive, so we had to use statistics’, we are frequently left wondering as to what strange tricks have been played on the data.
One of the areas of my research is stochastic differential equations (SDEs). I have posted about them several times before. One of the things students and collaborators keep asking me about SDEs is the weird stochastic Itô calculus. Itô calculus is different from what you learn in Calculus 101. In particular, the chain rule is no longer valid. Let me explain it with an example. Suppose you have the following equation
The Eigenfactor Project and Moritz Stefaner join efforts to visualize the citation network between different journals belonging to different research fields. It is amazing and a wonderful way to explore patterns in citation networks
Found via FlowingData
Although the public transportation system in Madrid is very good, I don’t usually take the bus or the train to move around. But sometimes my car decides I should take public transportation (bus or train). One of the things I always found intriguing is that I always wait way longer for the bus than expected from the frequency quoted by the transportation companies at the bus or train stop.
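One standard explanation for this observation is the so-called waiting time (or inspection) paradox: a passenger arriving at a random moment is more likely to land in a long gap between buses than in a short one. The excerpt above does not give the argument, so the following is only a sketch under stated assumptions: the 10-minute average headway, the exponentially distributed gaps and the helper name `mean_wait` are all illustrative choices, not data from any transport company.

```python
import bisect
import random

def mean_wait(headways, n_passengers=50_000, seed=0):
    """Average wait of passengers who show up at uniformly random times,
    given the list of gaps (headways) between consecutive buses."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for h in headways:
        t += h
        arrivals.append(t)
    total = 0.0
    for _ in range(n_passengers):
        p = rng.uniform(0.0, arrivals[-1])
        # First bus arriving at or after the passenger.
        nxt = arrivals[bisect.bisect_left(arrivals, p)]
        total += nxt - p
    return total / n_passengers

gap_rng = random.Random(123)
regular = [10.0] * 10_000                                        # a bus exactly every 10 minutes
irregular = [gap_rng.expovariate(1 / 10.0) for _ in range(10_000)]  # same mean gap, random spacing

wait_regular = mean_wait(regular)      # close to half the headway
wait_irregular = mean_wait(irregular)  # noticeably longer
```

Both timetables have the same quoted frequency, yet the irregular one produces a mean wait close to the full headway rather than half of it, because the mean wait is E[h²] / (2 E[h]) and grows with the variability of the gaps.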
I found an interesting presentation by Sami Mitra, associate editor of Physical Review Letters, about the editorial office and the editorial process at PRL. Among some interesting figures about the journal and also about the procedure of selecting potential referees, I enjoyed very much the quotes from some of the referee communications with the editorial office. Here they are:
I cannot review this paper as it is wrong and I did it first.
In the old days, research quality was measured by the number of papers you published. Publishing was a hard process and only a few scientists were able to publish several papers per year. However, with the bloom of new journals, the appearance of electronic editorial processes, and the specialization of research fields, the number of publications per year has grown exponentially during the last decades. Thus publishing is no longer a good measure of the quality of research.
Good news to fans of the “language of Nature”, mathematics. Wall Street Journal is running an article on what are the best jobs in 2008. The ranking is done “evaluating 200 professions to determine the best and worst according to five criteria inherent to every job: environment, income, employment outlook, physical demands and stress.” And which one do you think ranks first? Here is the list of the first ten:
In 1828, Robert Brown published the manuscript entitled “A brief account of microscopical observations made in the months of June, July and August, 1827, on the particles contained in the pollen of plants; and on the general existence of active molecules in organic and inorganic bodies” in the Edinburgh New Philosophical Journal [download it in pdf format here]. He suspended some of the pollen grains of the species Clarkia pulchella in water and examined them closely, only to see them “filled with particles” of around 5 µm diameter that were “very evidently in motion”.
Science is the only news. When you scan through a newspaper or magazine, all the human interest stuff is the same old he-said-she-said, the politics and economics the same sorry cyclic dramas, the fashions a pathetic illusion of newness, and even the technology is predictable if you know the science. Human nature doesn’t change much; science does, and the change accrues, altering the world irreversibly.
A famous quote by Stewart Brand that appears in John Brockman’s essay (and book) The Third Culture
News from the arXiv: a new section has been created to host preprints about Quantitative Finance. The section (as stated in the press release) intends to fix a problem with existing pre-print repositories. On one hand, social sciences repositories like SSRN, RepEC/IDEAS and others are too academic for practitioners, while on the other hand sites like defaultrisk.com or wilmott.com have not attracted many academic contributors. The new category in the arXiv would be a gathering point for both practitioners and academic people working in this important research field.
Stochastic differential equations (SDEs) are basically inhomogeneous ordinary differential equations that depend on an external stochastic process.
Typically, that stochastic process is white noise, which is the mathematical idealization of the noise found in nature. This idealization is handy, because it simplifies the mathematical description. However, it comes at some cost: traditional calculus is no longer valid and you have to use the so-called Itô calculus. This introduces some non-intuitive changes.
The percentage of active users in the Internet 2.0 is tiny. Some of the fractions:
* only 1% of Wikipedia's users contribute to making it better
* only 0.1% of users upload their own videos to YouTube
* only 3% of people with weblogs post on a daily basis
* only 1% of Amazon.com customers contribute with reviews

The numbers are tiny. But not uncommon. Typical return rates of marketing campaigns or surveys are around 2-5% (see report by the Direct Marketing Association).
A wonderful quote about the nature of Game Theory
Game theory is no doubt wonderful for telling stories. However, it flunks the main test of any scientific theory: The ability to make empirically testable predictions. In most real-life situations, many different outcomes -- from full cooperation to near-disastrous conflict -- are consistent with the game-theory version of rationality.
by Michael Mandel, on Kahneman and Smith’s 2002 Nobel Prize in Economics.
November was a rather sad month in the world of stochastic differential equations. On the 26th we were supposed to be celebrating the birth of one of the best mathematicians in history, Norbert Wiener, who gives his name to the Wiener process, usually denoted W(t). However, on the 10th, Kiyoshi Itô, the father of stochastic differential equations, passed away. Interestingly, both are present in a simple stochastic differential equation like this
While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician Sherlock Holmes in “The Sign of Four”. Via Gaussianos
A picture is worth a thousand lines of equations
![neatproofs](http://estebanmoro.org/wp-content/uploads/2008/11/neatproofs.jpg) By [James Robert Brown](http://www.chass.utoronto.ca/~jrbrown/)
Hello everyone.
This is my first post in Spanish on this blog (although I include the English translation further down). I am writing it to announce that, starting with the December issue, I am the editor of the “Física y Computación” section of the journal of the Real Sociedad Española de Física. To follow the news from that section I have set up a blog (separate from this one) where you can see the section’s new articles, as well as the archive of the articles already published.
It is always amazing to browse through YouTube, especially if you are looking for science material. Here is an example of superconductivity: take a superconductor and a magnet at room temperature. Nothing happens. Now cool down the superconductor using liquid nitrogen. The superconductor starts “superconducting” and boom! here comes the Meissner effect. As always, a picture (movie) is better than a page of equations to show how wonderful physics is.
Two New York publications (The New York Times and the New Yorker) are running stories about whether string theory is a theory of anything or not. Specifically, both articles are reviews of a couple of very critical books on string theory:
* NOT EVEN WRONG: The Failure of String Theory and the Search for Unity in Physical Law. By Peter Woit.
* THE TROUBLE WITH PHYSICS: The Rise of String Theory, the Fall of a Science, and What Comes Next.
Apparently, a company in Ireland named Steorn has found the killer marketing campaign for their products:
1. Get a law of physics: the first law of thermodynamics, for example, and claim you have a technology that can break it. Cool!
2. Get a good flashy marketing campaign by publishing in The Economist a "show us wrong" announcement to the scientific community.
3. Hide the details of your technology and delay its public announcement by creating a "challenge" to the scientific community.
Kiyoshi Itô (90), professor emeritus at Kyoto University, has become the first winner of the Gauss Prize. This prize honors scientists whose mathematical research has had an impact outside mathematics. Itô’s work, mainly in establishing a well-defined calculus (named Itô calculus) to treat highly irregular noise functions, has found widespread application in describing stochastic processes across fields like economics, biology, chemistry, physics, etc. Itô calculus is behind the pricing of options introduced by Black, Scholes and Merton (which earned a Nobel Prize).
We are all aware of recent cases of fraud in science. The case of cloning in South Korea is the most recent one, but not the first or the last to happen. Identifying those cases is hard, since most of the time the verification of the claims is a long, time-consuming process. Very recently, Robert L. Park has identified some warning signs about a scientific discovery that can make us doubt its scientific soundness, since they indicate that a scientific claim lies well outside the bounds of rational scientific discourse:
* The discoverer pitches the claim directly to the media
* The discoverer says that a powerful establishment is trying to suppress his or her work
* The scientific effect involved is always at the very limit of detection
* Evidence for a discovery is anecdotal
* The discoverer says a belief is credible because it has endured for centuries
* The discoverer has worked in isolation
* The discoverer must propose new laws of nature to explain an observation

Several examples with all or several of these red lights come to my mind.
The Edge has a summary article on Kevin Kelly’s talk on The Next 100 Years of Science: Long-term Trends in the Scientific Method. Kevin Kelly helped launch Wired magazine in 1993 and has published several books and articles in publications such as The Economist, The New York Times, Time, etc. He raises some interesting points about what’s next in science for this century. Specifically:
* There will be more change in the next 50 years of science than in the last 400 years.
Take a coin and toss it a number \(N\) of times in a time interval of duration \(T\). Suppose that every time you get heads you win \(a\) euros and that you lose the same amount of money when you get tails. Then your capital is a random process with ups and downs like this:
This process is a stochastic process usually called “Random Walk” and its properties depend on the parameters \(N\), \(a\) and \(T\).
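The coin-tossing game can be simulated in a few lines. This is a minimal sketch, assuming a fair coin and tosses evenly spaced in time (one every \(T/N\)); the function name `random_walk` and the parameter values are illustrative and not taken from the original post.

```python
import random

def random_walk(n_tosses, a, duration, seed=None):
    """Return (times, capital) for n_tosses fair coin tosses spread
    evenly over `duration`: +a on heads, -a on tails."""
    rng = random.Random(seed)
    dt = duration / n_tosses          # assumption: equally spaced tosses
    times = [0.0]
    capital = [0.0]                   # start with zero capital
    for k in range(1, n_tosses + 1):
        step = a if rng.random() < 0.5 else -a
        times.append(k * dt)
        capital.append(capital[-1] + step)
    return times, capital

times, capital = random_walk(n_tosses=1000, a=1.0, duration=10.0, seed=42)
```

For a fair coin the final capital has zero mean and variance \(N a^2\), which is one way the parameters enter the properties of the walk.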
When tea is poured in a cup of hot water, we observe a phenomenon called diffusion: in the end the particles of tea spread evenly throughout the mass of water and we enjoy our cup of tea. Diffusion occurs as a result of the second law of thermodynamics (increase of entropy) and can be modeled quantitatively using the diffusion equation (or heat equation). This is a funny equation, since it establishes that the velocity of spreading is infinite while the root mean square fluctuations of the position of the particles grow in time as
Or… Correlation implies causality. This is a logical fallacy. Keep this in mind when analyzing data. There are numerous cases of how people use this logical fallacy nowadays (most typically in newspapers). For example, the following graph
shows a correlation between global warming and the number of remaining pirates. But this does not imply (of course), any causality between them.
By the way, this graph appears in Bobby Henderson’s clever parody of the type of arguments in Intelligent Design.
I have been on holidays during the last two weeks visiting Argentina. The picture on the left was taken on top of the Perito Moreno glacier, which is amazing. Most glaciers we found were bluish, including the icebergs found floating in the rivers. The reason for that is that the thicker the ice or snow layer is, the better red colors are absorbed by the layer and only the blue colors are reflected (see a more detailed explanation here).
You can’t fold a paper more than seven or eight times. Don’t believe me? Then try it. Thinner paper? Longer paper? It doesn’t matter; you just can’t do it. I used to play this game with my friends, who were always amazed and asked for an explanation. Is there any physical or mathematical constraint that prevents it? Nope: it is simply a matter of scale. If you have a paper of length _L_ and you fold it, the length now is _L_/2.
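The scale argument can be checked with a little arithmetic: every fold halves the length and doubles the thickness of the stack. The sketch below is only a rough illustration; the sheet size (297 mm, an A4 sheet), the thickness (0.1 mm) and the stopping rule (folding becomes hopeless once the stack is thicker than it is long) are assumptions made for the example, not a physical model of folding.

```python
def folds_until_too_thick(length_mm, thickness_mm):
    """Count folds until the folded stack is at least as thick as it is long."""
    folds = 0
    while thickness_mm < length_mm:
        length_mm /= 2       # each fold halves the length
        thickness_mm *= 2    # and doubles the thickness
        folds += 1
    return folds

# Assumed values: an A4 sheet folded along its long side.
a4_folds = folds_until_too_thick(length_mm=297.0, thickness_mm=0.1)
# A sheet 100 times longer gains only a few extra folds.
long_folds = folds_until_too_thick(length_mm=29700.0, thickness_mm=0.1)
```

Because the ratio of thickness to length grows by a factor of 4 with every fold, making the paper much longer or thinner buys only a handful of additional folds.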
In a recent Nature article, Albert-László Barabási and João Gama Oliveira have found the perfect excuse for lazy people not answering some emails in their inbox: they analyzed the response time of emails and found that it follows a power law probability distribution of the form \(P(t) \sim t^{-1}\). In particular, this implies that not even the mean response time is finite. Hey! Why should you then expect me to answer your emails within my lifetime?
UCSD physicist Jorge E. Hirsch has proposed a quick-and-dirty way to measure the quality of an academic scientist’s output. His method is explained and studied in a paper to be published in the November 15 issue of PNAS. The idea is very simple and it is called the h-index. This number relies on the number of citations our papers have. In particular, the h-index is the maximum number _h_ that verifies the following: at least _h_ of an author’s papers have at least _h_ citations each.
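The definition translates directly into code: sort the papers by citations and find the last rank at which the paper still has at least as many citations as its rank. This is a minimal sketch; the function name `h_index` and the citation counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author with five papers.
example = h_index([10, 8, 5, 4, 3])
```

In this made-up example four papers have at least 4 citations each, but there are not five papers with at least 5, so the h-index is 4.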