Temporal networks with igraph and R (with 20 lines of code!)

In my last post about how a twitter conversation unfolds in time on Twitter, the dynamical nature of information diffusion in twitter was illustrated with a video of the temporal network of interactions (RTs) between accounts. The temporal evolution of the network yields to another perspective of social structure and, in some cases, aggregating the data in a time window might blur out important temporal structures on information diffusion, community or opinion formation, etc. Although many of the commercial and free Social Network Analysis software have tools to visualize static networks, there are no so many options out there for dynamical networks. And in some cases they have very limited options for their “dynamical layout”. A notable exception is SoNIA, the Java-based package, which unfortunately is not updated frequently. Another possibility is to work with the Timeline plugin in Gephi. However there is no video recording possibility for the animations. In this post I will show you how to render the network at each time step and how to encode all snapshots into a video file using the igraph package in R and ffmpeg. The idea is very simple

  1. generate a number of snapshots of the network at different times using R and igraph, and
  2. then put them together in a video file using ffmpeg.

For 1. we need to draw the temporal network at each snapshot. Given the set of nodes and edges present at a given time, we have to find a layout for that instantaneous graph. The layout is a two-dimensional visualization of the nodes and edges in the plane and there are many algorithms to produce it. The package igraph contains mainly Force based algorithms like for example the Kamada-Kawai or Fruchterman-Reingold ones. Your millage may vary from one algorithm to another since visualizations depend on the number of nodes, clustering and/or community structure of the network. Sounds easy, but there two big problems with this approach:

  • Force based layout algorithms consist on performing a number of iterations aimed to minimize the energy of physical forces between nodes, and starting from an initial configuration which is typically a random initial condition. This means that even if your network does not evolve in time, successive calls to the layout algorithm will produce different results. In our temporal network case it means that the layout from one snapshot to the next one will be very different producing a swarm-of-bees kind of motion. For example, if you run this script you will see that the four layouts are very different:
    library(igraph)
    par(mfrow=c(2,2),mar=c(0,0,0,0), oma=c(0,0,0,0))
    g <- watts.strogatz.game(1,20,3,0.4)
    for(i in 1:4) plot(g,layout=layout.fruchterman.reingold,margin=0)
    

    Luckily, in igraph 0.6 we can specify the initial position of the nodes for some layout functions: layout.graphopt, layout.kamada.kawai and layout.fruchterman.reingold. My personal experience is that layout.graphopt crashes in this 0.6 version (although it works on 0.5), so we are left with the other two algorithms. The plan (taken from this original idea of Tamás Nepusz, one of the developers of igraph) is to use the layout of the previous snapshot as the initial condition for the next snapshot layout so we have a smooth transtion from one to the other. In the example above, the implementation will be the following using the start parameter:

    library(igraph)
    par(mfrow=c(2,2),mar=c(0,0,0,0), oma=c(0,0,0,0))
    g <- watts.strogatz.game(1,20,3,0.4)
    layout.old <- layout.fruchterman.reingold(g)
    for(i in 1:4){
         layout.new <- layout.fruchterman.reingold(g,params=list(niter=10,maxdelta=2,start=layout.old))
         plot(g,layout=layout.new)
         layout.old <- layout.new
         }
    

    Now you can see that the layouts are similar. There are two new parameters passed to the layout function: niter = 10 specify the number of iterations (10) of the minimization of energy procedure in the force based algorithm. This number should be small, otherwise the final result will be very different from the initial condition. The same happens for the other parameter maxdelta=2 which controls the maximum change in the position of the nodes allowed in the minimization procedure.

  • The other problem is that in a temporal network nodes and/or edges appear and disappear dynamically. Thus the time dependent graph might have different number of nodes and/or edges from one snapshot to the next one. This means that the layout at a given snapshot cannot be used as the initial condition to generate next time layout, since the number of nodes can be different. My solution to this problem is to consider all (past/present/future) nodes/edges when calculating the layout but to display only present nodes/edges in the plot by making past and future nodes/edges transparent. This trick allows the reutilization of the layouts between steps, but it will produce a more or less steady visualization in which the layout at any given time is not related to the instantaneous structure of the temporal graph. To overcome this problem we take advantage of another property of force based algorithms: nodes which are connected attract each other along the edge. At a given instant, we could then modify the attraction between nodes along edges depending on whether the the edge is not present. In igraph 0.6, only the layout.fruchterman.reingold has this possibility through the parameter weights, a vector giving edge weights which are use to multiply the attraction along the edge. For example we could set weight equal to one if the edge is present and use zero weight for the rest. This will produce a layout in which present nodes are tightly connected while the past/future nodes are repelled from them. This effect dramatically highlights the appearance and disappearance of nodes, but could create too much confusion if there are many of those events.

To test this ideas, we will work an important example in the theory of complex networks: the preferential attachment mechanism to generate scale-free networks, i.e. the Barabási-Albert model. In our implementation, we keep the mechanism very simple: starting from an initial core of nodes, at each time step we add a single node that connects to m existing nodes which are selected proportionally to the number of links that the existing nodes already have. This mechanism leads to heavily linked nodes ("hubs") together with a large fraction of poorly connected nodes. A particular realization of this model can be found in the file edges.csv below. The structure of the file is simple each line of the form id1 | id2 | time indicates that a link between id1 and id2 appears at a particular time. Depending on the context this might represent that the tie was activated at that particular instant (for example if it is a RT between two twitter accounts) or that it was the time in which the edge appeared first (like in our Barabási-Albert model). Here is the code to generate the snapshots and producing a PNG picture for each of them: As you can see the edges present before time ti are colored in "gray" and weighted 1 while the rest are transparent rgb(0,0,0,0) and weighted 0. For the nodes we have used the function graph.strength that calculate the sum of weights of adjacent edges of a node: note that if at a given instant a node has no active adjacent edges, its graph strength is zero and thus the node is transparent. Otherwise it is colored as in the vcolor vector. Final step is to encode this images into a video format. To that end I have used ffmpeg which can be install in linux, windows or mac. The following command line in a terminal shell produces a video file output.mp4 in the mpeg format:

ffmpeg -r 10 -b 20M -i example%03d.png output.mp4

The first -r 10 flag controls the rate of frames per second (fps), 10 in this case, while the -b 20M sets the bitrate in the output (set to a large value here, 20M). The result is the following video Done with 20 lines in R! I'm sure you can beat me with some other R tricks and many ways to improve this visualization. I am eager to know your comments. Please!

38 thoughts on “Temporal networks with igraph and R (with 20 lines of code!)

  1. Thank you for you comment! I have changed the file in git accordingly. Please, reload the page to see the new code which includes a new variable: total_time.

  2. Hi, great post. I’m interested in one aspect. Your network is basically accumulating edges over time. Have you thought of a way to represent in the data structure a way to register the `disconnect’ action between two nodes? I think you could just use the presence of an edge at time t1 as a toggle signal, but I don’t find it practical.

    regards,
    David

  3. David: that could be easily done if edges have time_initial and time_final properties. Then, the only thing to change inside the loop will be the “ifelse”‘s: in this case they should be

    E(g)$weight < - ifelse( ti > E(g)$time_initial & E(g)$time_final > ti, 1,0)

    for example.

    Actually, this is the way I did the visualization of the conversation in twitter you can find in this post: Every link there is the (almost) instantaneous RT between two accounts. Let me know if you have a simulation/dataset in mind. I might be able to help you.

  4. Great visualization and another demonstration of the power of R. Now I’m not in social network analysis, but I’m very curious how this visualization is used to improve decision making . In general I find this type of animated visualization attractive, but I find it difficult to see this applied in a business type operationalization. Hope you can enlighten me with some implementation examples?

  5. Thanks Peter for your comment. Visualization can aid the analysis. In my other post about Twitter and politics, you can clearly see the dynamics of the conversation. With that information you can gain insight into what is important in analyzing information diffusion in social networks: in that specific example, the group formation, the cyrcadian patterns and the clash between different opinions.

    On a more general note, visualization is sometimes, by itself, another kind of analysis, since most of the visualization algorithms are based on clustering, fitting and/or modeling of the data.

  6. Pingback: MOOCs, SNA and taking sides in WoW | Warping Data

  7. Hi Esteban, these videos are really great! I’ve been in the process of developing a library named ‘ndtv’ for doing this sort of thing. It uses the data structures in the ‘networkDynamic’ package (which is built on ‘network’, which supplied much of the original code for igraph) and the ‘animation’ library. Its not yet on CRAN, but it can be download as part of the statnet preview release: http://statnet.csde.washington.edu/preview/

    I’d be curious to hear what you think. Its not currently using the sort of direct coupling to past positions you’ve implemented, I’ll have to try that out…

  8. Thanks Skye for your comment. Nice library! If I got it right, your compute.animation function is basically implementing the same idea I used here: take the previous layout as a seed to produce the new one. In that idea the key point is the algorithm used: since the algorithm advances the dynamics between network events, it should be as “physical” as possible. This is why forced-based algorithms work very well. The problem is that to accelerate convergence, in most forced-based algorithms there are some non-physical movements (most of the times at the first iterations). What I found is that in igraph, the fruchterman.reingold algorithm is the best, specially if you limit the node displacement by max.delta. In my opinion, a clean implementation of a force-based layout algorithm will be the best to use.

  9. hi esteban,
    looks great and I would like to play around with my own data. would it also work with real unix timestamps instead of your consecutive numbering of t ?

  10. I guess it will be straightforward: they only thing I can imagines is that you have to change the seq command at the beginning of the loop. In any case, let me know if you find any problem!

  11. Pingback: Redes Temporais usando igraph e R. | Laboratório de Estudos sobre Internet e Cultura | Depcom - Ufes

  12. Dear Esteban!
    This is fascinating. I have a question, what would be the structure of the time_final relationship? For example assuming that a and b are not connected at time 2, what would i add in the time2 column?
    alter ego time1 time2
    a b 1 ?
    c d 2 5
    a d 3 6

    Thank you a lot in advance

  13. Thanks Manuel!
    Generally, there are many ways to describe tie dynamics into the visualizations.
    a) You can define (in time_initial and time_final) the times where each tie appears and disappears and assume that the tie is present between those instants. Thus, the weight of the tie in the visualization is given by

    E(g)$weight < - ifelse( ti > E(g)$time_initial & E(g)$time_final > ti, 1,0)

    Thus you need a list of ties and their initial and final times to produce the simulation.
    b) Or, you can define the list of times in which a given interaction within the tie happens and show the tie in the visualization for each interaction time. But you might get better results if you assume that the interaction is not instantaneous and stays there for a while. For example, you may say that the interaction has a duration of dt units of time. Then the weights of the ties should be

    E(g)$weight < - ifelse( ti > E(g)$time & (E(g)$time +dt) > ti, 1,0)

    I guess your question is on the a) part and then what I would use is a time_final for the a < -> b link which is below 2. That is
    a b 1 1.99
    c d 2 5
    a d 3 6

  14. Pingback: Más sobre identificación de influyentes y líderes de opinión | Netnografía – Reputación y Clima de Opinión Online, Consultoria de Comunicación Online – Miguel del Fresno

  15. Pingback: Are you a social keeper or a social explorer? | Implicit None

  16. Pingback: Análisis de redes sociales para comunicación, marketing y relaciones públicas | Netnografía – Reputación y Clima de Opinión Online, Consultoria de Comunicación Online – Miguel del Fresno

  17. Hi Esteban!

    This is awesome! I’m new to the data world, and am loving it. I’ve tried to recreate this process with my own set of data but I am incredibly unfamiliar with ffmpeg and that’s where my snag is.

    I’ve opened a stack overflow issue with it but have yet to get any feedback. Any ideas? (not even sure where the PNG files need to be!)

    http://stackoverflow.com/questions/16555246/ffmpeg-png-mp4-error-opening-input-files-invalid-argument

    Thank!

    Max

  18. Thanks Max

    If you have run the script, in the R working directory you should have a number of files named example001.png, example002.png, etc.

    Go to the terminal and navigate to that working directory and run the ffmpeg script where your example$$$.png files are.

    Two things:
    a) In your stackflow question you have “example%01d.png” while it should be “example%03d.png”
    b) You can get your working directory in R by typing “getwd()”

    Hope it helps

  19. Got it! Thanks so much for everything you are providing for the community. Your blog is an inspiration!

  20. Hi Steban!
    You are a genious! I made a couple of changes to the code based on your suggestion. I added a new colums called dt1
    In this case ab have a relationship that lasts for 2 times
    c and d end till the end of the observation, same for e and f.
    Within the loop i have
    E(g)$weight E(g)$time & (E(g)$time +dt1) > ti, 1,0)
    E(g)$color E(g)$time & (E(g)$time +dt1) > ti, “gray”,rgb(0,0,0,0))
    The last one has to be there if not, whenever one of the actors apprears again, the link between a and b reapears in the network!
    E A T dt1
    a b 1 2
    c d 2 2
    e f 3 1

    Muchas gracias de nuevo, Esteban!

  21. I did it!

    Here is my visualization: https://vimeo.com/66442839

    I’m still learning, but I tried to take a relatively simple dataset: a twitter account over the course of a week, with “id1″ in sequence from 1:100 to identify the tweets, “id2″ the number of retweets and “time” the days of the week.

    I started my “time” at one and made more frames for a longer video. Fascinated with emergent systems, and this is a great way for me to get my mind around it!

    thanks a billion!

    max kramer
    New School University, MA Media Studies

  22. Hi Esteban,

    This is superb!
    Quick question: How are you producing the input file “edges.csv” ? And do you have samples of it?
    Also – would it work if instead of numerical time “t”, you have Unix timestamp (or some other version of time)?

    Thanks!

  23. Hi Atul
    Thanks for you heads up!
    The edges.csv was generating using the preferential attachment mechanism.
    Regarding the unix time, yes! it will work. Look above for my answer to another person regarding this!
    Best

  24. Pingback: Hisse Senedinde Bir Gün | Erisken's Blog

  25. Thank you very much, Esteban!
    really impressive your twitter graph!
    Would it be possible to sum up the edges, that is while one link exists between the two nodes, another link appears between the same nodes and they sum up to a thicker link?
    Thanks a lot!
    Dilyara

  26. Hi Esteban,

    Congratulations for your work. It’s really amazing.

    I run the code using my own dataset but I have some problems to produce a video file output.mp4 in the mpeg format.

    Here’s the error:

    “Optio b (video bitrate(please us -b:v)) cannot be applied to input file example%03d.png — you are trying to applyan input option to an output file or vice versa. Move this option before the file it belongs to.
    Error parsing options for input file example%03d.png.
    Error opening input files: Error number -22 occurred.”

    Can you help me? How I solve this problem.

    Thanks in advance.

    João Carvalho

  27. Hello Steban,

    Congratulations for this great job.
    I would like some help with ffmpeg. I placed all the files of ffmpeg folder in the folder that i have my png files. I am running the ff -prompt and i put the line that you suggest.
    I get this error back:
    Error parsing options for input file example%03d.png
    Eroor opening input files: Error number -22 occured

    Could you please help me with this or give some analytical info for the part about exporting.

    Thanks in advance

  28. Hi professor Steban,

    Great work, congratulations.

    I am studying how vertices degree evolve along the time, so i am using a list of contacts such as below: (time and duration are express in seconds). I am also trying to figure how to fit this evolution in a power law distribution. Do you have any glue of how to do it in R ?

    time – node1 – node2 – duration
    0 0 1 100
    100 0 2 100
    400 1 3 100
    500 1 4 100
    810 2 6 100
    900 1 5 100
    920 5 2 100
    950 3 8 100

  29. Thank you so much Esteban. I have been looking a lot of places for a thorough introduction to dynamic network visualization and finally here it is.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>