Using Data, Networks and Complexity to Study Trade, Aid, Economics

When you go into a career in research, you imagine that you’ll spend all your time thinking, reading and coming to know. In fact, whenever anyone talks to you about your job, it’s clear that that’s exactly what they think you do all day: think, learn, gain wisdom.

My experience of my PhD so far has been pretty different to that imagined paradise of dressing-gown-sporting chin scratching. In actual fact, the general pattern has been something like this:

Step 1

Have a meeting with a colleague for an hour about how your super-interesting model should work. Bash through the details, make the regrettable but inevitable simplifications, write down some maths.

Step 2

Spend 22 months configuring UNIX servers; building web interfaces; learning to code in Python, R, LaTeX and JavaScript; setting up a Postgres database and sweating over the various command-line tools that come with it; configuring LaTeX installations; picking integrated development environments, network visualisation software and text editors; learning the intricacies and idiosyncrasies of web frameworks, LaTeX packages, Python packages, testing frameworks, documentation tools and even blogging sites (you know who you are), all in a desperate bid to get the results of step 1 working, interactive and documented.

Step 3

Write down the maths you discussed 22 months ago, once in Python and once again in LaTeX. This is a five-minute job, depending on how thoroughly you completed step 2.

Step 4

Realise that, although step 2 is where for all practical purposes all the chin scratching and coming-to-know took place, there's to be no credit given for it whatsoever. The PhD is earned (or otherwise) by the one hour of step 1, and you did that so long ago that you can't remember what any of it means anyway.

No, there is nothing gained from those 22 months of toil but the skills you learned in the process.

This is not quite what people have in mind when they talk about learning for its own sake. They’re presumably talking about coming to know a subject for the pure joy, or adding to the sum of human knowledge because that’s a noble aim in itself.

What’s happened to me feels much more like a self-study tutorial in becoming a whizz coder, engineer and software developer. Which is great in itself. But it’s not quite what I signed up for somehow.

I’ve been modelling the interconnected nature of the global economy by simulating a reduction in demand for various sectors in various countries. It’s a very simple little piece of analysis:

What would happen if the demand for a given sector in a given country was reduced by a single US dollar?

In answering this question for every sector in every country in the model, you can get a sense of which sectors have the biggest impact on the global economy. Basically you reduce the demand for each sector by a dollar and watch what happens to the rest of the world.
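The post doesn't show the maths, so here is a minimal sketch of the same experiment, assuming a standard Leontief input-output model (the usual way tables like the World Input-Output Database are used). The three sector names and the coefficient matrix `A` are invented for illustration, so the numbers won't reproduce anything like the $98 figure below.

```python
import numpy as np

# Hypothetical technical coefficients: A[i, j] is the dollar value of sector i's
# output used up in producing one dollar of sector j's output (invented numbers).
sectors = ["metals", "machinery", "vehicles"]
A = np.array([
    [0.10, 0.30, 0.05],
    [0.20, 0.10, 0.40],
    [0.05, 0.25, 0.10],
])

# Leontief inverse: total (direct plus indirect) output needed per dollar of final demand.
leontief_inverse = np.linalg.inv(np.eye(len(sectors)) - A)


def output_loss(sector_index):
    """Total output lost across all sectors when final demand for one sector falls by $1."""
    demand_change = np.zeros(len(sectors))
    demand_change[sector_index] = -1.0
    output_change = leontief_inverse @ demand_change
    return -output_change.sum()


# Repeat the one-dollar experiment for every sector and rank them.
losses = {name: output_loss(i) for i, name in enumerate(sectors)}
most_important = max(losses, key=losses.get)
print(losses, most_important)
```

Each sector's loss is just the column total of the Leontief inverse, so in a real multi-country table the whole ranking comes out of one matrix inversion.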

Unexpectedly, perhaps, the most important sector is the vehicles sector in China. If demand for vehicles dropped by a single dollar, an unbelievable $98 would be lost in terms of global production. This is a truly astonishing conclusion.

So where does this $98 come from? The interconnectedness of the global economy is behind the magnitude of the number. In short, not only do the sectors which feed the Chinese vehicle sector suffer, but so do all the sectors which feed those sectors, and so on through the network that is the global economy. This image gives a hint of how complex the picture is:

Each circle is a sector in a certain country. The lines between the sectors represent changes in trade between them due to the $1 reduction in demand for Chinese vehicles. The sectors are sized according to how affected they are by the change. (Note for technical types only: they are sized proportional to their eigenvector centrality.)
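For the technical types: eigenvector centrality scores a node highly when it is linked to by other high-scoring nodes. The sketch below is a plain power iteration on a weighted adjacency matrix; it is not necessarily exactly what Gephi computes, and the four-node star is a made-up toy graph.

```python
import numpy as np


def eigenvector_centrality(W, tol=1e-10, max_iter=1000):
    """Power iteration on a weighted adjacency matrix W (W[i, j] = weight of link i -> j).

    Iterating x <- x + W.T @ x (i.e. shifting by the identity) is a standard trick
    that avoids the oscillation plain power iteration shows on bipartite graphs
    such as stars, without changing the leading eigenvector.
    """
    x = np.ones(W.shape[0]) / np.sqrt(W.shape[0])
    for _ in range(max_iter):
        x_new = x + W.T @ x  # add up the scores of the nodes linking in
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("power iteration did not converge")


# Toy example: sector 0 trades with three sectors that don't trade with each other.
W = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=float)
scores = eigenvector_centrality(W)
print(scores)  # the hub (index 0) scores highest
```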

It goes to show how interconnected the global economy really is. This small change in China has knock-on effects for the US, Japan, Korea, Germany, Italy, the Netherlands… the list goes on and on.
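The knock-on chain (suppliers lose orders, then their suppliers, and so on) can be written as a series of rounds. A minimal sketch, again assuming a Leontief-style model with an invented coefficient matrix: adding up the rounds converges to the same total as the matrix inverse, which is why the headline number can be much bigger than the original dollar.

```python
import numpy as np

# Invented coefficients: A[i, j] = input from sector i per $1 of sector j's output.
A = np.array([
    [0.10, 0.30, 0.05],
    [0.20, 0.10, 0.40],
    [0.05, 0.25, 0.10],
])
demand_drop = np.array([0.0, 0.0, 1.0])  # $1 less final demand for the third sector

total_loss = np.zeros(3)
this_round = demand_drop.copy()
for _ in range(200):
    total_loss += this_round
    # Sectors hit in this round cut back, so their own suppliers lose orders next round.
    this_round = A @ this_round

# Closed form: the infinite sum I + A + A^2 + ... equals (I - A)^(-1).
closed_form = np.linalg.solve(np.eye(3) - A, demand_drop)
print(total_loss.sum(), closed_form.sum())
```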

In the early 1960s two American geography professors, John Nystuen and Michael Dacy, were working on a way to make sense of a huge database of telephone records in Washington state. Clearly the majority of calls were being made either to or from Seattle, the state’s largest city, but they suspected there was more underlying structure to the pattern of calls. They pioneered a simple, but powerful, way of treating the calls going in and out of each urban area in the state as a network, then using this network representation to extract patterns from the data.

An extract from the original 1961 paper shows how simple the idea is, and also how old the typewritten paper looks now!

The data on how many calls were made from one city to another is arranged in a grid. This is exactly the same idea as those tables of distances you get in road atlases. (Look! These things still exist!) You look up your “from city” in the column, and go across to the “to city”. The number you arrive at is the number of calls made from the “from city” to the “to city”. Nystuen and Dacy simply looked at the largest call flows out of each city. Where a city’s largest flow was to a city smaller than itself, the originating city was deemed to be a node: a kind of ultimate ‘destination’ of calls.
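The largest-flow rule is short enough to write down directly. A minimal sketch, assuming a city's "size" is its total incoming calls (the column total); the town names and call counts are invented, borrowing the made-up Bray example used below.

```python
def nodal_structure(flows, names):
    """Apply the largest-flow rule to a square matrix of call counts.

    flows[i][j] is the number of calls from city i to city j.  Size is taken to be
    total incoming calls (an assumption).  A city whose largest outgoing flow goes
    to a bigger city is subordinate to it; otherwise the city is a node.
    """
    n = len(names)
    size = [sum(flows[i][j] for i in range(n)) for j in range(n)]
    parent = {}
    for i in range(n):
        biggest = max((j for j in range(n) if j != i), key=lambda j: flows[i][j])
        if size[biggest] > size[i]:
            parent[names[i]] = names[biggest]
    nodes = [name for name in names if name not in parent]
    return parent, nodes


# Invented call counts (rows = caller, columns = receiver).
names = ["London", "Maidenhead", "Bray", "Slough"]
flows = [
    [0, 50, 10, 60],
    [300, 0, 40, 30],
    [20, 80, 0, 5],
    [400, 20, 5, 0],
]
parent, nodes = nodal_structure(flows, names)
print(parent)  # {'Maidenhead': 'London', 'Bray': 'Maidenhead', 'Slough': 'London'}
print(nodes)   # ['London']
```

Drawing an arrow from each key of `parent` to its value gives exactly the kind of tree diagram described next.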

They used this data to produce a simple network diagram of all the calls in the area, boiling a whole table of numbers down to a few simple relationships drawn with arrows between cities:

They went on to extend this idea to include indirect as well as direct flows. For example, if people in Bray tend to call people in Maidenhead, who tend to call people in London (see this map for an illustration of this made-up example), then London should get some of the ‘credit’ for the Bray to Maidenhead calls too.

I’ve taken this simple idea and applied it to the enormous network of goods and services trade that we’re building here at CASA in London. The results are pretty interesting and make some kind of sense of an otherwise tangled mess of flows within and between countries.

Here’s the UK economy, where the size of the circle is given by the number of ‘in’ connections:

It’s fun to see how metals flow to machinery, which flows to vehicles, vehicle trade and finally to the hospitality industry. It’s exactly this kind of chain of relationships that Nystuen and Dacy were hoping to reveal in their original study. Also the wood/minerals to construction to financial services relationship is interesting. Overall, the UK economy can be seen to be hugely focused around hospitality (bars, hotels, tourism, cafes etc.) and financial services, with all other sectors being subservient to one or other of these two.

Here’s the same picture for the US. You can see that there are many more separate networks, suggesting that the US is less reliant on a small number of sectors. (Don’t be fooled by the small circles for fuel and chemicals: it doesn’t mean these sectors are small, just that few other sectors have these as their nodes.)


Interesting to note here that leather is used differently in each of the three examples we’ve seen so far. It’s used in the hospitality industry (chairs?) in the UK, in vehicles in the US and for textiles in China.
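Both the circle sizes (number of ‘in’ connections) and the number of separate networks can be read straight off the largest-flow links. A minimal sketch, assuming the links are stored as a dict mapping each sector to the sector its largest flow goes to; the UK fragment is hand-built from the chains described above, not the full result.

```python
from collections import Counter


def summarise(parent):
    """Return in-connections per node (circle size) and the number of separate networks."""
    in_degree = Counter(parent.values())

    def root(sector):
        # Follow largest-flow links to the top of the chain.  Assumes no cycles,
        # which holds when every link points to a strictly larger sector.
        while sector in parent:
            sector = parent[sector]
        return sector

    sectors = set(parent) | set(parent.values())
    separate_networks = {root(s) for s in sectors}
    return in_degree, len(separate_networks)


# Hypothetical fragment reconstructed from the UK chains in the text.
uk = {
    "metals": "machinery",
    "machinery": "vehicles",
    "vehicles": "vehicle trade",
    "vehicle trade": "hospitality",
    "wood": "construction",
    "minerals": "construction",
    "construction": "financial services",
}
in_degree, n_networks = summarise(uk)
print(in_degree["construction"], n_networks)  # 2 2
```

A country like the US, with many more separate trees, would simply return a larger second number.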

Finally, here’s Japan:

What’s also interesting is to look at the connections between sectors worldwide. Here, each country is in a different colour, and it’s clear that most countries exist in their own clusters.

In the whole galaxy of trade that flows between sectors in a country and countries in the world, there are only two clusters of inter-related countries. They are Korea and Indonesia, and Canada, Mexico and the US. By this measure at least, these latter three countries seem to act as a single country in a way that none of the countries in the EU do, for example:

There’s clearly tons to explore here, and this uses nothing but a simple network analysis from 1961! There will be far more interesting and modern analyses on this blog in the near future, which will be subtler at picking out clusters between and within countries. There may even be something to say on the clusters of trade routes which are most important to global production. All this and more to follow…

Macroeconomics is one of those disciplines where the ideas are simple, but the lingo is complicated.

Paul Krugman, in his New York Times blog, is usually great at communicating the ideas of Macroeconomics in a human-friendly way.

But sometimes the language gets ahead of Krugman—sadly, most obviously when the ideas he’s expressing are important and deep. This post is a perfect example. The ideas are incredibly important for understanding what governments are doing or not doing to manage the financial situation, but without some specialist knowledge, it’s pretty hard to understand.

Here, I’ve written a summary of the stuff you’ll need to know to understand Krugman’s excellent writing, and get a sense of the deep and subtle ideas he’s discussing.

It’s not particularly short: there’ll be plenty of “too long didn’t read”s I suspect, but for those who are keen to understand what on earth is going on with macroeconomics—inflation, interest rates and all that—I think it’ll be worth the while.

Here it is:
disclaimer: this is just my own understanding of the situation. Don’t use this summary to actually run an economy, because there’s a chance that parts of it are wrong or oversimplified. If you’re an actual central banker, best get a proper qualification on the subject before fiddling with any knobs or levers.

Rob Levy’s User-Friendly Guide to Macroeconomics

or, how to understand Paul Krugman and his economist pals


A bank traditionally earns a profit by taking the money which savers have deposited with it, called simply deposits, and lending it back out to people who are looking for a loan. These borrowers might be after a mortgage, or after funds to invest in a business venture.

The banks encourage people to deposit their money by offering them interest. In order for the whole thing to make a profit, they charge more interest to the borrowers than they pay to the savers.
A bank has a legal obligation to keep a certain amount of its deposits in ‘cash’, and is free to lend out as much of the rest as it fancies. We’ll call the amount of cash it has to keep a “cash cushion”, because it’s designed to stop the bank running out of money if all the depositors suddenly want their savings back at the same time.
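A worked example with invented round numbers makes the arithmetic concrete. This sketch ignores any interest earned on the cushion and any borrowing from the central bank: with £1,000 of deposits and a 10% cushion, £100 is kept as cash and £900 is lent out; at 5% on loans and 2% on savings the bank earns 45 and pays 20, keeping 25.

```python
def bank_year(deposits, cushion_ratio, deposit_rate, loan_rate):
    """One year of a very simplified bank (all inputs hypothetical).

    The bank keeps cushion_ratio of its deposits as cash, lends out the rest,
    charges loan_rate to borrowers and pays deposit_rate to savers.
    """
    cash_cushion = deposits * cushion_ratio
    loans = deposits - cash_cushion
    profit = loans * loan_rate - deposits * deposit_rate
    return cash_cushion, loans, profit


cushion, loans, profit = bank_year(deposits=1000, cushion_ratio=0.10,
                                   deposit_rate=0.02, loan_rate=0.05)
print(cushion, loans, profit)
```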

The relationship between a central bank and normal banks is exactly the same as between banks and customers: banks stash excess deposits with the central bank and earn an interest rate. Banks even borrow money from the central bank, for which they’re charged an interest rate. It’s this last interest rate which determines how much of its deposits a bank wants to lend out and how much it wants to cling onto.

The setting of this interest rate is called ‘monetary policy’ and it’s ‘loose’ monetary policy when the interest rate is low.

When monetary policy is loose, banks are keen to dish loads of their deposits out as mortgages and business loans, because they can borrow for cheap from the central bank to keep their cash cushion at the right level. But if the interest rate is high, the bank will be more tempted to keep hold of its deposits because it doesn’t want to have to borrow to maintain its cash cushion.
When a bank is doing lots of lending, lots of business investment takes place and lots of houses are bought. These things are (usually) good for the economy so the central bank wants to encourage lending.

But there are limits: if the banks are so keen to lend that they’ll offer mortgages and business loans at ridiculously low prices, then people will start buying homes faster than they can be built, or expanding their businesses faster than they can expand their customer base. In this case, inflation sets in: prices rise because businesses have to pass on the cost of all this wasted investment to consumers. It’s this kind of mania for house-buying and business investment that economists refer to as the economy “overheating”.

So, the central bank, via the interest rate it charges to banks, can control how much lending the banks do. It therefore has to set a balance between too much lending (overheating) and not enough (recession); both are bad for the economy. What it really wants to do is set the interest rate to just the right level, such that there is just enough investment in businesses to continue to match a possibly growing demand. This is the so-called ‘natural’ interest rate.

But there are limits to what the central bank can do. If banks have some reason to feel negative about the future, they won’t want to lend money however cheaply they can borrow it from the central bank. In this case, the central bank’s interest rate could get down to zero (at which point the banks can borrow money ‘for free’) and the banks still won’t take the bait. Once interest rates are at zero, that’s it. The central bank is out of options! This is what Krugman refers to as the ‘zero lower bound’.

With this knowledge in your armoury, go and read “Secular Stagnation, Coalmines, Bubbles and Larry Summers”. It’s both well-written and important to understand. Good luck!

Blood, sweat and tears (or rather, their all-too-modern equivalents: coffee, RSI and eye-strain) have been spilled over the last six months. I have an absolutely non-existent ability to concentrate on more than one thing at a time, and the one thing I’ve been thinking, dreaming and going on about non-stop for pretty much all that time reached fruition late last night.

This is the first ever output from what we’re calling a ‘Global Demonstration Model’:

Modelled trade flows and economy shapes for the three countries, based on 2008 input-output data and 2010 trade data


The picture above doesn’t really make much sense yet, in that it’s pretty hard to tell what’s going on and some of the colours are repeated, but it gives you a sense of the sheer magnitude of the model I’ve been building. It shows the economies of India, the UK and the US and the trade flows between them. The economy shapes are from 2008, and the trade flows are from 2010.

There’s a proper paper coming out soon (as if I were an actual academic) but before then, and for the reader who has no interest in reading a literature review, here’s a sketch of the Global Demonstration Model in human-readable terms:

  • Countries are represented by data on the shape of their economies, as split into economic sectors by the World Input-Output Database.
  • Trade between countries is modelled from actual trade data from the UN.
  • Countries are assumed to trade with one another in fixed proportions; for example, the UK gets around 12% of its agricultural imports from the Netherlands. This percentage then remains fixed when exploring the model. (Since you ask, the 12% is mostly tomatoes, flowers, onions, peppers and cucumbers.)
  • Each sector is imported in a fixed amount in each country. For example, the US only imports 11% of its total fuel requirements, against the UK’s 45%. These ratios also don’t change.
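The two fixed-ratio assumptions combine into a simple multiplication. A minimal sketch: only the UK's 45% fuel import ratio comes from the text; the demand figure and the partner names and shares are hypothetical placeholders.

```python
def imports_by_partner(domestic_use, import_ratio, partner_shares):
    """Split a country's requirement for one sector's products across trading partners.

    import_ratio: fixed fraction of the requirement that is imported.
    partner_shares: fixed fraction of those imports from each partner (should sum to 1).
    """
    total_imports = domestic_use * import_ratio
    return {partner: total_imports * share for partner, share in partner_shares.items()}


# Hypothetical partner shares for UK fuel imports; 0.45 is the import ratio from the text.
shares = {"Partner A": 0.5, "Partner B": 0.3, "Partner C": 0.2}
base = imports_by_partner(100.0, 0.45, shares)
doubled = imports_by_partner(200.0, 0.45, shares)
print(base, doubled)
```

Because both ratios stay fixed, every bilateral flow scales in proportion to demand, which is what makes it cheap to ‘mess about’ with the model.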

These assumptions allow us to ‘mess about’ with the world as we see it today, and test how sensitive the global economy is to particular changes, or how the world might have looked if trade patterns had been different.

And this is just the beginning: we’re going to be putting several more social science models around this trade-based core to model the effects of, for example, migration on the global economy.

I’ll be working on the diagram to make it (a) more readable, and (b) more interactive, and we’ll be producing some interesting analysis based on the model pretty soon.

Watch this space…

The great thing about having access to the entirety of the UN’s commodity trade database is that you can ask any kind of question you wish of the data.

For example, here’s a network representation of trade flows throughout the entire world in agricultural products in 2010.

COMTRADE 2010 agriculture sector

A network representation of global trade in agricultural products in 2010. Country size and colour show the exporter-ness of the country (purple = very exporter-y)

Bigger, more purple nodes are the ones with the biggest exports. (Network scientists insist on calling a country’s total exports, summed over all its trading partners, its ‘Weighted Out-Degree’. Don’t blame me.) When viewed from inside a country’s circle, facing towards a particular trading partner, exports curve out to the left, meaning, conversely, that imports curve in from the right. The colour of the lines is an average of the colours of the two countries.
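In network tools such as Gephi, a node’s weighted out-degree is the sum of the weights on its outgoing links; dividing by the number of trading partners gives an average export per partner instead. A minimal sketch computing both from (exporter, importer, value) records; the countries and values are invented.

```python
from collections import defaultdict


def export_stats(flows):
    """flows: iterable of (exporter, importer, value) trade records."""
    weighted_out_degree = defaultdict(float)  # total exports: sum of outgoing link weights
    partners = defaultdict(set)
    for exporter, importer, value in flows:
        weighted_out_degree[exporter] += value
        partners[exporter].add(importer)
    average_per_partner = {c: weighted_out_degree[c] / len(partners[c])
                           for c in weighted_out_degree}
    return dict(weighted_out_degree), average_per_partner


# Invented records (values in billions of dollars).
flows = [
    ("Country A", "Country C", 10.0),
    ("Country A", "Country D", 5.0),
    ("Country B", "Country C", 8.0),
]
totals, averages = export_stats(flows)
print(totals["Country A"], averages["Country A"])  # 15.0 7.5
```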

Most interesting to note, are the following features:

  • The USA is by a huge margin the world’s biggest exporter of agricultural products ($67 billion compared to second-place Brazil’s $29 billion. The biggest products are soya, maize, cotton and wheat.)
  • China is by a huge margin the world’s biggest importer ($56 billion compared to second-place USA’s $35 billion. Since you ask, it imports soya beans from the US, Brazil and Argentina, cotton from the US and India, and wood from Russia. Now you know.)
  • Only Canada, Mexico, Ireland, the Netherlands (NLD) and Morocco (MAR) have visibly balanced trade. All other countries are either clear net importers or clear net exporters.

Also very interesting (though this one requires a bit more concentrated looking if you’re not willing to take my word for it): you can definitely see geographic clusters. Look at the star around Japan (JPN); it includes Thailand, Vietnam, Australia, Indonesia and the Philippines. Europe is very clearly at the bottom (note that it’s purely fortuitous that Britain and Ireland have ended up on the Western fringe of Europe, but it’s not chance that they’re together on a fringe.) Israel (ISR) sits at the border of Europe and the Middle East/North Africa (Egypt, Syria, Jordan, and Tunisia are all nearby.)

The products which are categorised as ‘agricultural’ are according to the categorisation used by the World Input-Output Database (WIOD), the subject of at least one blog post here. The trade flow data is from a massive (200+ million rows) database of trade flows which I’ve spent the last hundred and fifty years assembling from the UN’s commodity trade database, COMTRADE. The visualisation is from a piece of open-source software called Gephi with a heavily-tweaked Force Atlas 2 layout.

This is just one of an unimaginable number of interesting analyses and visualisations I’ll be able to do, now that I’ve got my very own copy of the UN’s COMTRADE database to play with and ask questions of as I see fit. If you’d like to see any interesting analyses in future blog posts, let me know on Twitter @aid_complexity.

The fun is only just beginning…

Big data, yeah? It’s great isn’t it? Doesn’t everyone just love to have loads of big data all over the place?

Got 30 million customers in the UK, have you? Each of those customers purchasing thousands of products a year, yeah? Screw it, let’s just store ALL that information in a massive database. It’s big data innit? It’s what people do now.

Well I’m sick of it. Regular readers will know that I’m currently in the process of trying to gather trade data from the UN. It’s of the format “we sold this much soap to this country in this year”. Sounds simple, right?

Well it is. But it’s also big. There are around 200 reporting countries, reporting trade with one another, in over 3,000 product categories across fifteen years. This makes the final database somewhere in the region of 150 million rows long. It’s big, and it’s slow, and it’s incredibly painful to deal with.

By way of an example, let me introduce you to a painful problem which has bedevilled me these past few days: due to some kind of weirdness in the import process, some countries’ data ended up with an equals sign at the start of their product codes. So instead of product code “101305” they had “=101305”. I can’t even remember now how this happened, but it’s to do with the fact that the data sets are so large that they could only be opened in certain pieces of software, one of which has obviously had this weird side-effect. The affected countries are Japan, Brazil, China, India, Russia and Mexico. So, some nice small countries then. This means that 20 million pieces of data need an equals sign removing. Sounds easy, right?

The process to get rid of those equals signs started yesterday evening, and was still running this lunchtime, a full eighteen hours after it started.

This is not tenable. This is not big data. This is just a big ball-ache.
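For anyone facing the same equals-sign problem on an exported file, the usual remedy is a single streaming pass rather than row-by-row updates. A minimal sketch, assuming a CSV export with a product-code column; the column name `commodity_code` and the sample rows are hypothetical, and no timing claims are implied.

```python
import csv
import io


def strip_leading_equals(source, destination, column="commodity_code"):
    """Copy CSV rows from source to destination, removing stray '=' from one column.

    Reads one row at a time, so memory use stays flat however long the file is.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(destination, fieldnames=reader.fieldnames)
    writer.writeheader()
    fixed = 0
    for row in reader:
        code = row[column]
        if code.startswith("="):
            row[column] = code.lstrip("=")
            fixed += 1
        writer.writerow(row)
    return fixed


# Made-up sample rows standing in for the real export.
sample = io.StringIO("reporter,commodity_code,value\nJPN,=101305,12\nGBR,101305,7\n")
cleaned = io.StringIO()
n_fixed = strip_leading_equals(sample, cleaned)
print(n_fixed)  # 1
```

With real files, open both with `newline=""` as the `csv` module documentation recommends.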

Like many branches of statistics-gathering, the world’s trade statistics bureaux lack, in their communication style, a certain panache. The writings of such agencies are characterised by a complete absence of zing, lightness-of-touch and joie de vivre. I’ve blogged before about horrific diagrams like the one shown here, and how the whole enterprise of gathering information about global trade is inaccessible and unpleasant.

So it gave me an extra tickle to find a rare example of humour in a working paper from the National Bureau of Economic Research called “World Trade Flows: 1962-2000”.

In the paper, they present a number of databases of world trade flows from a series of years between 1962 and 2000. Blind or indifferent to the fears of the “Millennium Bug”, they use a two-digit code to represent the particular year. Let me recap: it’s a database of World Trade Flows at a given two-digit year we’ll generically call “??”.

The result can be seen here on p48 of the report. Fantastic stuff…
Feenstra et al. 2005

Following on from my last post all about how to visualise networks when the connections go both ways (and when there might be more than one connection between the same two nodes), this is just a quick update to say that I’ve solved the problem in what I think is quite an exciting way.

I’ll do a blog post explaining the ins and outs of the technique very soon, but first let me tell you that ‘the big reveal’ will occur during a talk at the CASA Conference (#casacities) by me and my colleague Thomas, which is going to be live-streamed here from 3.30pm London time tomorrow (Friday 27th September). I’m pretty excited/nervous about it because it’s happening at the Barbican Centre in the large cinema number 1. There’ll also be a second room watching the talk live, and obviously all the people watching online.

We’ll be talking about a model we’ve built which utilises terrifying amounts of trade data from the UN, and descriptions of countries’ economies from the EU, to track goods and services as they move around the world, being traded, consumed and converted and following the gain in value as the whole process happens. We’ll be debuting the new visualisation tool at the talk too, so there should be something for everyone!

See you there…

Networks: aren’t they great? The sexiest modelling paradigm around at the moment, and there is no shortage of social science researchers itching to jump on the bandwagon.

Never one to drag my heels, I blogged last week about the attempts by me and my colleagues to bring network science into Economics, and included a fancy graphic to demonstrate how visualising networks can look pretty, and potentially be informative about systems with complex interconnections.

An imagined network of three countries, Red, Blue and Green, using three products, A, B and C internally as intermediate inputs to the production process, and also trading these products with one another.


But the image I included was static, prepared in a piece of open source network analysis software called Gephi (it’s one of those pieces of software that everybody hates, everybody uses, but no one understands). The natural extension to this is an interactive network diagram. Imagine if we could play with the network shown in that picture. How cool would it be to be able to drag the nodes around to see how the network responds?

Well, there is a way; and, in fact, it’s been done many times before. This cool-looking interactive visualisation is by web-visualisation guru Mike Bostock. The guy brings together insane technical skillz (he seems single-handedly to have written the popular JavaScript visualisation library d3) with an eye for beautiful design that leads to some of the most breath-taking infographics on the web.

His network visualisation uses something called a force-directed graph in which physics equations are used to determine the behaviour of a network. The nodes (drawn as circles) repel each other like charged particles, and the links between the circles act like springs, pulling the nodes back together. This leads to a balanced state where the nodes are as far apart from each other as they can be, under the constraint that they’re attached together with springs of varying strength.
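The physics described above fits in a few dozen lines. This is a minimal sketch of a generic force-directed layout (Coulomb-style repulsion plus springs, stepped forward with a capped move per iteration), not d3’s actual algorithm; all parameter names and defaults are illustrative choices.

```python
import math
import random


def force_layout(nodes, links, iterations=1000, charge=1.0, stiffness=0.1,
                 rest_length=1.0, step=0.05, max_move=0.5, seed=0):
    """Tiny force-directed layout.  links: list of (a, b, strength) tuples.

    Every pair of nodes repels (force = charge / d**2); every link is a spring
    pulling its ends towards rest_length apart, scaled by its strength.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, like charged particles.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-6)
                f = charge / d ** 2
                force[a][0] += f * dx / d
                force[a][1] += f * dy / d
                force[b][0] -= f * dx / d
                force[b][1] -= f * dy / d
        # Springs: stretched links pull their ends together, compressed ones push apart.
        for a, b, strength in links:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = max(math.hypot(dx, dy), 1e-6)
            f = stiffness * strength * (d - rest_length)
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d
        # Move each node a little along its net force, capping very large jumps.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos


pos = force_layout(["A", "B", "C"], [("A", "B", 1.0)])


def distance(p, q):
    return math.hypot(pos[p][0] - pos[q][0], pos[p][1] - pos[q][1])


print(distance("A", "B"), distance("A", "C"))  # the linked pair ends up closer
```

The all-pairs repulsion loop grows with the square of the number of nodes, which is one reason real libraries use approximations for big networks.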

The network shown in Mike Bostock’s example is pretty simple, but it struck me as a great way to visualise my network of networks. Here’s an example. This is Great Britain’s economy in 2009. Each circle is a sector of the economy, and a link between two sectors shows the extent to which one sector sold goods to the other in that year. For simplicity, most of the smaller links have been filtered out (otherwise, the whole thing is a tangled mess!)

This is great: the sectors are circles, with the bigger circles being the bigger sectors overall, and the connections between the circles being the value of the goods sold from one sector to another. The thicker the line, the more goods were sold.

But there’s a key piece of information missing from this way of viewing the network: the flows between sectors have direction, that is to say, it matters that sector A sold £100 worth of stuff to sector B, rather than the other way around. So how to visualise the network in a way that emphasises the directionality of the links as well as the size?

We could try putting arrows on the ends of the links, right? Mike Bostock has thought of this already of course, and has a simple example here. But the problem is that the circles in his example are a fixed size. If the circles were bigger, the arrows would get hidden underneath them. How to place the arrows when the circles are all different sizes and the line connecting them is ‘bendy’ is an ‘unpleasant’ maths problem.

How to place an arrow when circle sizes differ
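For straight links at least, the geometry is manageable: pull the arrow tip back from the target’s centre by the target’s radius along the line of the link. A minimal sketch under that straight-line assumption; curved (‘bendy’) links would instead need the intersection of the arc with the circle, which this doesn’t attempt.

```python
import math


def arrow_tip(source, target, target_radius, gap=0.0):
    """Point where a straight link should end so its arrowhead just touches the target circle.

    source, target: (x, y) centres.  The tip is pulled back from the target's
    centre by its radius (plus an optional gap) along the line of the link.
    """
    dx, dy = target[0] - source[0], target[1] - source[1]
    length = math.hypot(dx, dy)
    if length <= target_radius + gap:
        raise ValueError("source centre lies inside the target circle")
    pull_back = (target_radius + gap) / length
    return target[0] - dx * pull_back, target[1] - dy * pull_back


print(arrow_tip((0, 0), (3, 4), target_radius=1))  # about (2.4, 3.2)
```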

I wrestled with putting arrows on the lines for a while before abandoning the project altogether. Then, after some skilful Googling (as vital to the 21st-century citizen as reading and writing were to citizens of previous centuries) I came across this from Mike Bostock’s website:

Making a gradient follow a path

With this idea, I could make each end of the links a different colour with, say, red being the seller’s end, and green being the buyer’s end. On a very small subset of my UK 2009 economy network, things seem to work pretty well:

But the computational overhead is massive. Each line in this network is really a group of around 30 little pieces of line, each with its own colour, creating the effect of a smooth transition from green to red. That means that the browser has to work much harder than it otherwise would have to. This approach scales very poorly. Here’s a slightly more filled out network (these videos are real-time captures of my browser’s output):

Although the network is still a tiny fraction of the complete picture, things are already starting to slow down. Finally, just to really push things to the limit, here’s the network as shown in the very first video in this post. As you can see, although the resulting network looks “pretty cool” (for which read, mind-bogglingly complex) my browser has basically ceased to function. It takes around ten seconds to process each frame of the animation.
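The cost is easy to see once the trick is written out. A minimal sketch of the piecewise gradient for a straight link (the real links were curved), assuming 30 pieces, red at the seller’s end and green at the buyer’s; `n_links` is a hypothetical network size.

```python
def mix(start_hex, end_hex, t):
    """Linear blend of two '#rrggbb' colours: t=0 gives start, t=1 gives end."""
    start = [int(start_hex[i:i + 2], 16) for i in (1, 3, 5)]
    end = [int(end_hex[i:i + 2], 16) for i in (1, 3, 5)]
    return "#" + "".join(f"{round(s + (e - s) * t):02x}" for s, e in zip(start, end))


def gradient_segments(p0, p1, seller="#ff0000", buyer="#00ff00", pieces=30):
    """Chop a straight link into short segments, each coloured for its position."""
    segments = []
    for k in range(pieces):
        t0, t1 = k / pieces, (k + 1) / pieces
        a = (p0[0] + (p1[0] - p0[0]) * t0, p0[1] + (p1[1] - p0[1]) * t0)
        b = (p0[0] + (p1[0] - p0[0]) * t1, p0[1] + (p1[1] - p0[1]) * t1)
        segments.append((a, b, mix(seller, buyer, (t0 + t1) / 2)))
    return segments


link = gradient_segments((0, 0), (30, 0))
n_links = 500  # hypothetical network size
elements = n_links * len(link)  # drawing elements the browser must redraw each frame
print(len(link), elements)
```

Every link now costs 30 drawing elements instead of one, and all of them have to be repositioned on every frame of the animation.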

So it looks like the colouring of the links is not workable. Watch this space for more updates as I try different methods for showing a big network with directed links.