Using Data, Networks and Complexity to Study Trade, Aid, Economics

Anyone who has ever applied for a job at the UN will feel a sweat break out on the back of their neck at the mention of the 90s-style faux-Latin-named job application site “Inspira”. (Remember when things were all called “Exceptimus” and “Ignitor” and stuff? Ugh.)

Anyway, the site’s been down for a few days for modernisation, and the new improved site came back up on Friday. (They had a FOUR DAY outage! This alone is enough to make me suspect that the people who designed the website didn’t know what they were doing. Outages should be measured in minutes or hours, not days.)

Given my experience of large organisations and their ability to procure IT which isn’t grossly overpriced and absolutely terrible (for which see Britain’s NHS, Barclays’ several iterations of terrible online banking websites, any government department in the world ever) it’s no surprise to see that the fundamentally broken Inspira has been given little more than a lick of paint.

Still present: the lack of a working “Back” function, because the site doesn’t use URLs to control location; meaningless Javascript visible all over the place (hovering over a link to my job opening reveals “javascript:submitAction_win4(document.win4, ‘HRS_CE_WRK_HRS_JOB_LINK$21$$0′”, which gives just a hint of the horrors people meeting the website’s code for the first time must experience); Windows 98-style icons; and an overall user experience and look and feel of being dragged by a chain up a muddy slope by a slow-moving tractor.

But my personal favourite is the built-in spell-check function. (Note to developers: people don’t develop their own spell-check functions any more. Browsers do this now.) My entire cover letter passed the spell-check with flying colours bar one exception: it suggested I replace the word “in” in the opening sentence with the admittedly more emphatic, but perhaps overzealous “IN”.

I’m not even joking.

The UN Inspira spellcheck suggesting "IN" as a replacement for the word "in"

People talk a lot about how development aid might be used to improve a country’s attractiveness as a trade partner. (Mostly the World Trade Organisation, but not exclusively!)

“Aid for Trade” is a controversial project because it has a distinctly globalisation-friendly vibe about it, and a fundamental belief in the kind of trickle-down economics so beloved of market-oriented people and organisations.

But one thing that is never discussed, when the possibility of improving a country’s export competitiveness is raised, is that in the absence of additional global demand, any increase in exports due to an Aid for Trade programme must be accompanied by a reduction in exports for somebody else.
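To see why, here’s a toy sketch in Python. It is not the model used below, just an illustration under stated assumptions. Each importer has a fixed import budget, which it splits among exporters in proportion to a gravity-style score: exporter size divided by trade cost. The countries, sizes, budgets and costs are all made up. The hypothetical “project” halves Ethiopia’s export costs. Its exports rise, but because the budgets are fixed, everyone else’s exports fall by exactly the same total.

```python
# Toy illustration of the zero-sum point: fixed import budgets, split across
# exporters by a gravity-style score (size / trade cost). All numbers made up.

def allocate(budgets, sizes, costs):
    """Split each importer's fixed budget across the other countries.

    budgets: {importer: total imports}
    sizes:   {exporter: economic size}
    costs:   {(exporter, importer): trade cost}
    Returns {(exporter, importer): flow}.
    """
    flows = {}
    for imp, budget in budgets.items():
        scores = {exp: sizes[exp] / costs[(exp, imp)]
                  for exp in sizes if exp != imp}
        total = sum(scores.values())
        for exp, score in scores.items():
            flows[(exp, imp)] = budget * score / total
    return flows


def exports_by_country(flows):
    """Sum each exporter's flows over all importers."""
    totals = {}
    for (exp, _imp), value in flows.items():
        totals[exp] = totals.get(exp, 0.0) + value
    return totals


countries = ["ETH", "KEN", "TZA", "UGA"]
sizes = {"ETH": 60.0, "KEN": 70.0, "TZA": 50.0, "UGA": 30.0}
budgets = {"ETH": 10.0, "KEN": 15.0, "TZA": 8.0, "UGA": 5.0}
costs = {(a, b): 2.0 for a in countries for b in countries if a != b}

before = exports_by_country(allocate(budgets, sizes, costs))

# Hypothetical Aid for Trade project: halve Ethiopia's export costs
improved = dict(costs)
for imp in countries:
    if imp != "ETH":
        improved[("ETH", imp)] = 1.0

after = exports_by_country(allocate(budgets, sizes, improved))

for c in countries:
    print(f"{c}: {before[c]:6.2f} -> {after[c]:6.2f}")
print(f"total exports: {sum(before.values()):.2f} -> {sum(after.values()):.2f}")
```

Total exports stay fixed at the sum of the import budgets, so Ethiopia’s gain is exactly Kenya’s, Tanzania’s and Uganda’s loss.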

With the global economic model I’ve built as part of my PhD (and some fabulously bold assumptions about how trade works), I can have a stab at modelling which countries stand to gain and which to lose from a particular Aid for Trade project.

This picture, drawn using ESRI’s flashy new ArcGIS Pro, shows the modelled results of improving Ethiopia’s export infrastructure. (It’s “inspired” by, for which read pinched from, this nice flight paths visualisation.)

Flows which increase are in blue, and those which decrease are in red.


Here are the boring but very necessary caveats:
– Only flows between African countries are shown.
– Because the increased flows are much bigger in magnitude than the decreased ones (at least within Africa) I’ve had to compromise on the line thicknesses, leading to an overstatement of the decreases!! Caveat emptor!!
– This is based on a gravity-type trade model, and the use of such models in predicting trade is controversial.
– The economic descriptions used in this model are based on estimates, since most African countries don’t publish the kind of economic data you’d need to build a proper description.

Everybody loves a good map projection, none more so than the nerds here at UCL’s CASA.

I made a little toy visualisation of survey responses per global region for a pal of mine at King’s College, but knew he’d be unhappy with my choice of projection. So I decided to take the decision out of my hands and give control to the user.

The result is this fun little way of playing with map projections, relishing the smooth animations from one projection to the next. Go on, pick your favourite!


mercator transverse

The uncomprehending, blinking gaze: this is the default response when I tell people I’m “into” data.

It’s like being into electrical wiring, or urban sewer systems – yes, we’re glad they’re there, and yes, we’re certainly glad they work as expected, but yes, aren’t we also rather glad it’s someone else who has to worry about them and not us?

Well, I can see where these people are coming from. Data, as I’ve said on this blog before, looks like this:
Screenshot from 2015-10-21 15:06:59
That’s about as boring as it’s possible for something to look, I’d say.

So it’s little wonder perhaps that no one knows, cares or thinks about data in this world of exciting stuff to see and do.


But data is the one abstract concept that could feasibly be said to run the world. It’s being gathered in every imaginable context, from your pocket, to the furthest reaches of the solar system, and it’s being used to make decisions on how we deal with subjects ranging from refugees to pirates, to city planning to arms trading.

On Thursday of next week, I’ll be giving a lunchtime lecture here at UCL on the subject of data, why it’s informative, when it’s misleading and why on earth I love it so much. It’ll have examples of data visualisation so beautiful they’ll make you want to quit your job, and examples of the misuse of data so scurrilous they’ll make you wish other people would quit theirs. It doesn’t get more exciting than that…

A model of the global economy is, by its very nature, an unwieldy object to work with. There are 40 countries (we want more; that’s coming next) and the economy of each country is described by the economic activity of 35 sectors.

Each sector in each country interacts with every sector in every country (its own included), creating close to two million interactions.
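A quick check of the “close to two million” figure. This assumes, as the sentence above does, that every sector-country pair can supply every other one, itself included:

```python
countries = 40
sectors = 35

nodes = countries * sectors   # one node per sector in each country
interactions = nodes ** 2     # every node can supply every node, itself included

print(nodes, interactions)    # 1400 1960000
```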

This is great for wowing potential users of the model with the sheer scale and size of the thing, but it makes life pretty hard if you want to ask a question like “what effect has a certain change had on… well, everything?”

This is hard because “everything” here encompasses two million numbers, some of which will have gone up and others of which will have gone down.

If you don’t put any effort into visualisation, the output of the model looks absolutely horrible:

Output of a World Input-Output Table

Needless to say, picking interesting information out of such a mass of numbers involves some careful thought. (For the interested, what you’re seeing here is dollar-valued commodity flows between sectors within the Australian economy, the sectors being numbered 1 to 35.)

The paper I’m writing at the moment asks an even trickier question than “what’s going on?”. I’m trying to work out how our model compares with other, more standard, ways of doing this kind of thing. This means making the same change in two models and comparing the results.

One way to boil down lots of information into a far smaller number of ‘things’ is to rank the numbers you’re analysing. This just means putting the numbers into order then saying which number is biggest, which is second-biggest etc.

So in our case, if we make a change to the global economy, instead of looking at a horrifying table of numbers we can just say “Australia was the country most affected by the change. Netherlands was second, Spain tenth, Bulgaria 39th…” and so on.

The advantage to this approach is that, when comparing the results of two models, you can just compare the ranks of the countries and see if they’re similar. If they are, you might be justified in concluding that the models are doing more-or-less the same thing.
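The post doesn’t say exactly how “similar” is judged. One standard option, which is my assumption rather than necessarily what the paper uses, is Spearman’s rank correlation. A value of 1 means identical rankings and -1 means exactly reversed ones. The sketch below is pure Python, assumes no tied scores, and uses invented numbers:

```python
def ranks(scores):
    """Rank countries by score: 1 = most affected. Assumes no ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: position + 1 for position, country in enumerate(ordered)}


def spearman(scores_a, scores_b):
    """Spearman's rank correlation between two models' results (no ties)."""
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    n = len(rank_a)
    d_squared = sum((rank_a[c] - rank_b[c]) ** 2 for c in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up 'how affected' scores for five countries under two models
model_a = {"AUS": 9.1, "NLD": 7.4, "ESP": 3.2, "SVK": 2.5, "BGR": 0.4}
model_b = {"AUS": 8.7, "NLD": 6.9, "ESP": 3.5, "SVK": 0.3, "BGR": 0.5}

rho = spearman(model_a, model_b)
print(f"Spearman rho = {rho:.2f}")  # SVK and BGR swap places -> 0.90
```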

It also allows for some nice visualisation. If we write down all the countries in one column in the order of their rank (most-affected by some change we’ve made, to least-affected) using one model, and make a second column where the countries are ordered according to their rank using the other model, we can quickly see where the differences are, particularly if we draw nice lines between the countries to show how their position has changed.
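The lines in a chart like this just join each country’s position in the first column to its position in the second. Here’s a small sketch that computes those position changes and picks out the biggest mover, using made-up orderings:

```python
def rank_shifts(order_a, order_b):
    """Position change of each country between two orderings.

    Both lists run from most- to least-affected. A positive shift means the
    country sits lower (less affected) in the second model.
    """
    pos_b = {country: i for i, country in enumerate(order_b)}
    return {country: pos_b[country] - i for i, country in enumerate(order_a)}


mrio = ["AUS", "NLD", "SVK", "ESP", "BGR"]   # made-up ordering, model 1
ours = ["AUS", "NLD", "ESP", "BGR", "SVK"]   # made-up ordering, model 2

shifts = rank_shifts(mrio, ours)
biggest = max(shifts, key=lambda c: abs(shifts[c]))
print(shifts)
print("largest mover:", biggest)
```

Each (left position, right position) pair is one line in the chart. Long lines are the countries, like Slovakia below, that deserve a closer look.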

Here’s the outcome of such an experiment:

The design for this visualisation was inspired by a similar thing in the work of Hidalgo and Hausmann, see here on p4!

It shows the effect on the global economy of reducing demand for Chinese vehicles by $1M in 2010. The left-hand column shows the results using a traditional model (for the interested: it’s called a Multi-Region Input-Output model, or MRIO). The most-affected countries are at the top and the least-affected at the bottom. The right-hand column is the same but for our model.

With the exception of Slovakia, the results look pretty good. The ranks are generally pretty similar which is encouraging. We’re currently trying to find out what’s going on with Slovakia, and I’ll post here if we ever find out!

(Note that Taiwan is not in our model, because the UN doesn’t report trade data for it, as it deems it to be a part of China. I won’t be delving into this international controversy here!)

Anyone who’s done any work in the vicinity of network science or, more specifically, seen social scientists attempting network science, will have seen plenty of images such as this:


Taken from Fagiolo and Mastrorillo (2013)

or this:


Taken from Adamic and Glance (2005)

which add little scientific input, but merely dazzle the reader with the complexity and sheer magnitude of the networks being analysed. At a recent talk in Cambridge I heard the legendary Mark Newman refer to these network spaghetti-servings as “ridiculograms”. I know researchers who have been asked specifically to produce such diagrams to impress the difficulty of their project onto an adoring lay-crowd.

Well, I’m hoping to do a little more with a network visualisation. That’s not because I’m better than the people who made those diagrams (apologies to the researchers in question; I don’t mean to be rude), but just because I’ve got a lot of time to dedicate to painfully learning the tools we have available for turning spaghetti into scientific knowledge. (Caution: this pasta metaphor may have overstepped its usefulness.)

The tool of choice for the modern networks visualiser is called Gephi and it really is one of the n wonders of the open-source software world. Which makes me feel guilty about saying that I hate it, but I do. It’s amazing and brilliant but, at least on Windows 64-bit, it’s buggy as hell, and does all sorts of crazy stuff when you’re least expecting it*. Christ knows that if it were up to me to develop all of open-source, the world would be a much, much worse place.

But here’s my friendly guide to untangling that ridiculogram step-by-step, without losing too much sleep in the process:

Here’s how every network visualisation you’ve ever seen has started life. This is what you get when you import your graph into Gephi:
chinese vehicles step 1
This particular graph happens to be a version of a trade network. Each node is an economic sector in a particular country, and the edges (lines) represent the size of the flow between sectors. The truth is a bit more detailed than that: strictly, this graph shows the response of the trade network to a $1M reduction in demand for Chinese vehicles.
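The post doesn’t show how the graph gets into Gephi in the first place. One common route, and this is an assumption on my part, is Gephi’s spreadsheet (CSV) import. As far as I know it picks up Id/Label columns for nodes and Source/Target/Weight/Type columns for edges, and brings any extra columns in as attributes. A “country” node column is what the partitioning and filtering steps below would use. The node IDs and values here are invented:

```python
import csv
import io

# Hypothetical flow changes in $ (source sector -> target sector).
# Node IDs are "COUNTRY_Sector"; every value here is made up.
flow_changes = {
    ("CHN_Metals", "CHN_Vehicles"): -0.310,
    ("CHN_Mining", "CHN_Metals"): -0.120,
    ("JPN_Metals", "CHN_Vehicles"): -0.021,
    ("KOR_Electronics", "CHN_Vehicles"): -0.045,
}


def write_gephi_csvs(flows, nodes_fh, edges_fh):
    """Write a node table and an edge table for a spreadsheet-style import."""
    node_ids = sorted({node for pair in flows for node in pair})
    nodes = csv.writer(nodes_fh)
    nodes.writerow(["Id", "Label", "country"])
    for node in node_ids:
        country, sector = node.split("_", 1)
        # 'country' becomes a node attribute for partitioning and filtering
        nodes.writerow([node, sector, country])

    edges = csv.writer(edges_fh)
    edges.writerow(["Source", "Target", "Weight", "Type", "change"])
    for (src, dst), change in flows.items():
        # Weight holds the size of the change; the signed value is kept separately
        edges.writerow([src, dst, abs(change), "Directed", change])


# For real files, use open("nodes.csv", "w", newline="") and similar
nodes_buf, edges_buf = io.StringIO(), io.StringIO()
write_gephi_csvs(flow_changes, nodes_buf, edges_buf)
print(edges_buf.getvalue())
```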

The first thing to do is lay those bad-boys out. I’ve gone for the Yifan Hu layout, because it visually separates clusters.
chinese vehicles step 2

This then lays the ground-work for the more visually pleasing “Force Atlas 2” layout. Here I’ve gone for “Dissuade Hubs”, “LinLog mode” and “Prevent Overlap” with Scaling=5.0 and Gravity=4.0:
chinese vehicles step 3

Now let’s colour the nodes by country, to see if the clusters match to countries. It looks like they broadly do (which makes sense because sectors interact most with domestic sectors):
chinese vehicles step 4
To do this in Gephi, go to the Partition tab at the top right and select Nodes. Click on the refresh button and pick the variable you want to partition by. This will set random colours to each group.

There are rather too many countries showing here, making the colours all a bit similar, and the clusters not all that clear. Let’s wrestle with Gephi filters. (This is hard, boring and severely counterintuitive.) To filter by country, go to Filters > Attributes > Equal and select ‘country’. To filter for just China, you’d simply enter CHN into the pattern box, but our life is a bit more difficult, because we want to filter in a number of countries. To do this we use the regular expression ‘or’ concept, which is a vertical bar: ‘|’. So my pattern looks like ‘CHN|JPN|DEU|USA|KOR|FRA|AUS|ITA|GBR|BRA’. Tick the Use regex box and click OK. Now click Filter and the filter will be applied:
chinese vehicles step 5
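A regex pattern can be sanity-checked outside Gephi before you type it in. Here’s a Python sketch with made-up node data. It uses fullmatch, so the whole attribute has to equal one of the codes. I’m not certain whether Gephi’s Equal filter matches whole values or substrings, so treat this as the strict reading:

```python
import re

# The country pattern from the filter above: '|' means "or"
pattern = re.compile(r"CHN|JPN|DEU|USA|KOR|FRA|AUS|ITA|GBR|BRA")

# Made-up nodes and their 'country' attribute
nodes = {
    "CHN_Vehicles": "CHN",
    "SVK_Vehicles": "SVK",
    "USA_Finance": "USA",
    "IND_Textiles": "IND",
}

kept = [node for node, country in nodes.items() if pattern.fullmatch(country)]
print(kept)  # ['CHN_Vehicles', 'USA_Finance']
```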

This is starting to look a bit nicer, but we need to resize the nodes to show which are the most important. I usually size by node centrality (basically a measure of how ‘important’ the node is in the network). To do this, go to the Statistics tab, and click Run next to Eigenvector Centrality. This adds the centrality of each node as a property. To set the size of the nodes, go to Ranking at the top-left, and click the red diamond which, for some reason, stands for node size. Select Eigenvector Centrality from the list and click apply:
chinese vehicles step 6
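What does eigenvector centrality actually measure? A node is central if it is connected to other central nodes. The textbook way to compute it is power iteration, sketched below. It treats the graph as undirected with non-negative weights. Gephi’s own implementation may differ in the details (directedness, normalisation, number of iterations), so this is the idea rather than a copy of Gephi:

```python
def eigenvector_centrality(edges, iterations=1000, tol=1e-10):
    """Eigenvector centrality by power iteration (textbook version).

    edges: {(u, v): weight}, treated as undirected, weights non-negative.
    Returns {node: score}, scaled so the most central node scores 1.
    """
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))

    scores = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        # New score = own score + weighted sum of neighbours' scores.
        # Iterating with (A + I) rather than A keeps the same leading
        # eigenvector but avoids oscillating on bipartite graphs.
        new = {n: scores[n] + sum(w * scores[m] for m, w in nbrs)
               for n, nbrs in neighbours.items()}
        top = max(new.values())
        new = {n: s / top for n, s in new.items()}
        if max(abs(new[n] - scores[n]) for n in new) < tol:
            return new
        scores = new
    return scores


# Made-up example: one hub linked to three others
star = {("CHN", "JPN"): 1.0, ("CHN", "KOR"): 1.0, ("CHN", "USA"): 1.0}
centrality = eigenvector_centrality(star)
print({n: round(s, 3) for n, s in centrality.items()})
```

For this star, the hub scores 1 and each spoke scores 1/√3, roughly 0.577.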

So this is ok, but the China nodes (in green) are all so much more significant that everything else is basically invisible. The node sizes can be fine-tuned using the Spline… link. I set my spline like this, which gives some definition to all the big values and allows lots of medium values to come through:
chinese vehicles step 7

Resulting in:
chinese vehicles step 8
which looks much better.
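Gephi’s spline is an interactive curve, so I can’t reproduce it exactly. A power curve with an exponent below 1 is a stand-in I’ve chosen myself, not Gephi’s method, and it has a similar effect: it stretches small and medium centralities upwards so they stay visible next to a few huge ones. The size range, exponent and scores below are made up:

```python
def node_sizes(centrality, min_size=5.0, max_size=60.0, gamma=0.4):
    """Map centrality scores to node sizes through a concave power curve.

    gamma < 1 lifts small and medium scores; gamma = 1 is a straight line.
    """
    lo, hi = min(centrality.values()), max(centrality.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero if all scores are equal
    return {
        node: min_size + (max_size - min_size) * ((score - lo) / span) ** gamma
        for node, score in centrality.items()
    }


scores = {"CHN_Vehicles": 1.0, "CHN_Metals": 0.8, "JPN_Metals": 0.05,
          "DEU_Machinery": 0.02, "BRA_Mining": 0.0}
sizes = node_sizes(scores)
print({n: round(s, 1) for n, s in sizes.items()})
```

With a straight line, JPN_Metals would be barely bigger than the smallest node. With gamma=0.4 it ends up around a third of the way up the size range.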

Now time to label the nodes. An almost invisible button at the bottom-right of the screen is actually an up-arrow behind which hides the labelling dialog. (Note that this is definitely the single-worst piece of UI design I’ve ever seen.) If you’re lucky enough to find this, set node labels on and adjust the size slider until you can see them. If the attribute you want for the label isn’t already selected, click Configure… and change it in there.

We now have:
chinese vehicles step 9

We’re almost finished with the layout, but let’s space the clusters out a bit, so we can see what’s going on within countries. The ‘Noverlap’ layout isn’t installed by default, but you can install it easily from the Plugins menu. This is really useful for spreading out clusters of nodes. I ran it with a ratio of 2.0 and a margin of 20.0. I then also ran the Expansion layout followed by the Label Adjust layout. This combination of layouts seems to get everything looking peachy:
chinese vehicles step 10

Now that the layout is complete, we can filter out some of the smaller edges. Edges are great for laying out accurately, but you don’t want to see every one on the finished diagram. To add another filter to the country filter we’ve already got, we need to add it as a subfilter. More crazy counter-intuitiveness. My completed set of filters looked like this:
chinese vehicles step 11
Note that to set the range, you can double-click on the number and type it in. Saves messing with that stupid slider.
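Nesting the edge-weight filter under the country filter means the weight threshold only applies to what the country filter leaves behind. Here’s the combined logic as a Python sketch. The nodes, weights and threshold are invented, and the assumption is that an edge survives only if both of its ends survive the country filter:

```python
def filter_edges(edges, node_country, countries, min_weight):
    """Country filter followed by an edge-weight filter.

    Keeps edges whose endpoints are both in `countries` and whose weight
    is at least `min_weight`.
    """
    return {
        (u, v): w
        for (u, v), w in edges.items()
        if node_country[u] in countries
        and node_country[v] in countries
        and w >= min_weight
    }


node_country = {"CHN_Vehicles": "CHN", "CHN_Metals": "CHN",
                "JPN_Metals": "JPN", "IND_Textiles": "IND"}
edges = {
    ("CHN_Metals", "CHN_Vehicles"): 0.31,
    ("JPN_Metals", "CHN_Vehicles"): 0.02,    # too small: dropped by weight
    ("IND_Textiles", "CHN_Vehicles"): 0.05,  # IND not selected: dropped
}

kept_edges = filter_edges(edges, node_country, {"CHN", "JPN"}, min_weight=0.03)
print(kept_edges)
```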

Now that the graph is nicely laid out and filtered, it’s time to switch to the Preview pane. This pane doesn’t redraw unless you click Refresh after every change. After selecting Nodes > Show Labels and Edges > Rescale weight, the default preview looks like this, already pretty nice:
chinese vehicles step 13

I’ve changed the font (at least on my system, you have to do this by just typing the name and size into the Font box. I’ve typed “Tahoma 12” here.) and massively upped the edge thickness. This means that the biggest flows are ridiculously thick, but it’s a fair trade-off to get some of the smaller flows to show up too:
chinese vehicles step 14

For a few final flourishes, export your preview to an SVG, and open it in a vector-graphics editor. (I’m using the free and totally brilliant Inkscape, but feel free to pay a million pounds for Illustrator.) I’ve used the editor to add some country labels, and move a few of the sector labels to make them more readable and less cluttered. I’ve also deleted a few nodes for visual clarity’s sake. Here’s the finished product. Pretty good I reckon, and certainly a world away from the hairball-style ridiculogram we started with:
chinese vehicles step 15
*Here’s a list of Gephi bugs that have had me smashing my completely innocent keyboard in frustration: 1) when you save your work, then close the application, it worryingly asks you if you want to save. (Which makes me feel that something is amiss.) An insanely large number of times, the resulting file is then corrupted somehow and won’t open. 2) Hand-wrought queries you’ve spent ages writing are not saved, so you have to make ’em from scratch every time. Ugh. 3) The export to SVG option often results in a terse little “NullPointerException” error message and no SVG is produced.

That’s it. Rant more-or-less over.

When you go into a career in research, you imagine that you’ll spend all your time thinking, reading and coming to know. In fact, whenever anyone talks to you about your job, it’s clear that that’s exactly what they think you do all day: think, learn, gain wisdom.

My experience of my PhD so far has been pretty different to that imagined paradise of dressing-gown-sporting chin scratching. In actual fact, the general pattern has been something like this:

Step 1

Have a meeting with a colleague for an hour about how your super-interesting model should work. Bash through the details, make the regrettable but inevitable simplifications, write down some maths.

Step 2

Spend 22 months configuring UNIX servers; building web interfaces; learning to code in Python, R, Latex and Javascript; setting up a Postgres database and sweating over the various command line tools that come with it; configuring Latex installations; picking integrated development environments, network visualisation software and text editors; learning the intricacies and idiosyncrasies of Web frameworks, Latex packages, Python packages, frameworks for testing, tools for documentation and even blogging sites (you know who you are), all in a desperate bid to get the results of step 1 working, interactive and documented.

Step 3

Write down the maths you discussed 22 months ago, once in Python, and once again in Latex. This is a five-minute job, depending on how thoroughly you completed step 2.

Step 4

Realise that despite the fact that step 2 is where, for all practical purposes, all the chin scratching and coming-to-know has taken place, there’s to be no credit given for it whatsoever. The PhD is earned (or otherwise) by the one hour of step 1 you did, and you did that so long ago that you can’t remember what any of it means anyway.

No, there is nothing gained from those 22 months of toil but the skills you learned in the process.

This is not quite what people have in mind when they talk about learning for its own sake. They’re presumably talking about coming to know a subject for the pure joy, or adding to the sum of human knowledge because that’s a noble aim in itself.

What’s happened to me feels much more like a self-study tutorial in becoming a whizz coder, engineer and software developer. Which is great in itself. But it’s not quite what I signed up for somehow.

I’ve been modelling the interconnected nature of the global economy by simulating a reduction in demand for various sectors in various countries. It’s a very simple little piece of analysis:

What would happen if the demand for a given sector in a given country was reduced by a single US dollar?

In answering this question for every sector in every country in the model, you can get a sense of which sectors have the biggest impact on the global economy. Basically you reduce the demand for each sector by a dollar and watch what happens to the rest of the world.

Unexpectedly, perhaps, the most important sector is the vehicles sector in China. If demand for Chinese vehicles dropped by a single dollar, an unbelievable $98 would be lost in terms of global production. This is a truly astonishing conclusion.

So where does this $98 come from? Well, the interconnectedness of the global economy is behind the magnitude of the number. In short, not only do the sectors which feed the Chinese vehicle sector suffer, but so do all the sectors which feed those sectors, and so on through the network that is the global economy. And a hint of how complex the picture is, is given by this image (click for full size):

Each circle is a sector in a certain country. The lines between the sectors represent changes in trade between them due to the $1 reduction in demand for Chinese vehicles. The sectors are sized according to how affected they are by the change. (Note for technical types only: they are sized proportional to their eigenvector centrality.)

It goes to show how interconnected the global economy really is. This small change in China has knock-on effects for the US, Japan, Korea, Germany, Italy, the Netherlands… the list goes on and on.
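The ripple effect described above is what the standard input-output calculation captures. This is the MRIO-style calculation the post compares against, not necessarily our own model. Take a matrix A of technical coefficients, where A[i][j] is the dollars of input from sector i needed per dollar of output of sector j. The total output change needed to meet a change in final demand d is then x = (I − A)⁻¹d = d + Ad + A²d + … Each term is one more round of suppliers-of-suppliers. The three-sector matrix below is invented. The series converges as long as every column of A sums to less than 1:

```python
def output_change(A, demand_change, rounds=200):
    """Total output change x = (I - A)^-1 d, summed as d + A d + A^2 d + ...

    A[i][j]: input from sector i needed per $1 of output of sector j.
    """
    n = len(A)
    total = list(demand_change)
    term = list(demand_change)
    for _ in range(rounds):
        # Next round of knock-on effects: what the previous round's sectors buy
        term = [sum(A[i][j] * term[j] for j in range(n)) for i in range(n)]
        total = [t + x for t, x in zip(total, term)]
    return total


# Hypothetical 3-sector economy: vehicles, metals, mining (made-up numbers)
A = [
    [0.05, 0.00, 0.00],   # vehicles bought by each sector
    [0.30, 0.10, 0.05],   # metals bought by each sector
    [0.10, 0.40, 0.05],   # mining bought by each sector
]
shock = [-1.0, 0.0, 0.0]  # $1 less final demand for vehicles

x = output_change(A, shock)
print([round(v, 3) for v in x], "total:", round(sum(x), 3))
```

Summing x gives the total output lost per dollar of lost demand. Repeating this with the shock placed on each sector in turn ranks sectors by their impact, which is the exercise described above.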

In the early 1960s two American geography professors, John Nystuen and Michael Dacy, were working on a way to make sense of a huge database of telephone records in Washington state. Clearly the majority of calls were being made either to or from Seattle, the state’s largest city, but they suspected there was more underlying structure to the pattern of calls. They pioneered a simple, but powerful, way of treating the calls going in and out of each urban area in the state as a network, then using this network representation to extract patterns from the data.

An extract from the original 1961 paper shows how simple the idea is, and also how old the typewritten paper looks now!

The data on how many calls were made from one city to another is arranged in a grid. This is exactly the same idea as those tables of distances you get in road atlases. (Look! These things still exist!) You look up your “from city” in the column, and go across to the “to city”. The number you arrive at is the number of calls made from the “from city” to the “to city”. Nystuen and Dacy simply looked at the largest call flows out of each city. Where a city’s largest flow went to a city smaller than itself, the calling city was deemed to be a node: a kind of ultimate ‘destination’ of calls.

They used this data to produce a simple network diagram of all the calls in the area, boiling a whole table of numbers down to a few simple relationships drawn with arrows between cities:

They went on to extend this idea to include indirect as well as direct flows. For example, if people in Bray, tend to call people in Maidenhead who tend to call people in London, (see this map for an illustration of this made-up example) then London should get some of the ‘credit’ for the Bray to Maidenhead calls too.
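Here’s a Python sketch of the two ideas together: largest-flow links, then following chains of those links so that London gets credit for Bray. Two things are my own assumptions. First, a city’s size is measured by its total incoming calls. Second, the indirect step works by following chains of largest flows, which is simpler than the matrix calculation in the original paper. The call counts are made up to fit the made-up example above:

```python
def nodal_structure(calls):
    """Largest-flow analysis in the spirit of Nystuen and Dacy.

    calls: {(from_city, to_city): number of calls}
    Size of a city = total incoming calls (an assumption).
    A city links to the destination of its largest outgoing flow when that
    destination is bigger; otherwise the city is itself a node.
    """
    size = {}
    for (src, dst), n in calls.items():
        size[dst] = size.get(dst, 0) + n
        size.setdefault(src, 0)

    largest = {}
    for (src, dst), n in calls.items():
        if src not in largest or n > calls[(src, largest[src])]:
            largest[src] = dst

    links = {city: dst for city, dst in largest.items() if size[dst] > size[city]}
    nodes = [city for city in size if city not in links]
    return links, nodes


def terminal_node(city, links):
    """Follow largest-flow links upwards, e.g. Bray -> Maidenhead -> London."""
    while city in links:
        city = links[city]
    return city


calls = {
    ("Bray", "Maidenhead"): 120, ("Bray", "London"): 80,
    ("Maidenhead", "London"): 300, ("Maidenhead", "Bray"): 90,
    ("London", "Maidenhead"): 250, ("London", "Bray"): 60,
    ("Slough", "London"): 200, ("London", "Slough"): 100,
}

links, nodes = nodal_structure(calls)
print(links)                          # each city's link up the hierarchy
print(nodes)                          # cities that are nodes in their own right
print(terminal_node("Bray", links))   # ultimate destination of Bray's calls
```

Because links only ever point to a bigger city, the chains can’t loop back on themselves.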

I’ve taken this simple idea and applied it to the enormous network of goods and services trade that we’re building here at CASA in London. The results are pretty interesting and make some kind of sense of an otherwise tangled mess of flows within and between countries.

Here’s the UK economy, where the size of the circle is given by the number of ‘in’ connections:

It’s fun to see how metals flow to machinery, which flows to vehicles, vehicle trade and finally to the hospitality industry. It’s exactly this kind of chain of relationships that Nystuen and Dacy were hoping to reveal in their original study. Also the wood/minerals to construction to financial services relationship is interesting. Overall, the UK economy can be seen to be hugely focused around hospitality (bars, hotels, tourism, cafes etc.) and financial services, with all other sectors being subservient to one or other of these two.

Here’s the same picture for the US. You can see that there are many more separate networks, suggesting that the US is less reliant on a small number of sectors. (Don’t be fooled by the small circles for fuel and chemicals: it doesn’t mean these sectors are small, just that few other sectors have these as their nodes.)


It’s interesting to note here that leather is used differently in each of the three examples we’ve seen so far. It’s used in the hospitality industry (chairs?) in the UK, in vehicles in the US and for textiles in China.

Finally, here’s Japan:

What’s also interesting is to look at the connections between sectors worldwide. Here, each country is in a different colour, and it’s clear that most countries exist in their own clusters.

In the whole galaxy of trade that flows between sectors in a country and countries in the world, there are only two clusters of inter-related countries: Korea and Indonesia in one, and Canada, Mexico and the US in the other. By this measure at least, these latter three countries seem to act as a single country in a way that none of the countries in the EU do, for example:

There’s clearly tons to explore here, and this is just using nothing but a simple network analysis from 1961! There will be far more interesting and modern analyses on this blog in the near future, which will be more subtle in helping us pick out clusters between countries and within countries. There may even be something to say on the clusters of trade routes which are most important to global production. All this and more to follow.

Macroeconomics is one of those disciplines where the ideas are simple, but the lingo is complicated.

Paul Krugman, in his New York Times blog, is usually great at communicating the ideas of Macroeconomics in a human-friendly way.

But sometimes the language gets ahead of Krugman—sadly, most obviously when the ideas he’s expressing are important and deep. This post is a perfect example. The ideas are incredibly important for understanding what governments are doing or not doing to manage the financial situation, but without some specialist knowledge, it’s pretty hard to understand.

Here, I’ve written a summary of the stuff you’ll need to know to understand Krugman’s excellent writing, and get a sense of the deep and subtle ideas he’s discussing.

It’s not particularly short: there’ll be plenty of “too long didn’t read”s I suspect, but for those who are keen to understand what on earth is going on with macroeconomics—inflation, interest rates and all that—I think it’ll be worth the while.

Here it is:
disclaimer: this is just my own understanding of the situation. Don’t use this summary to actually run an economy, because there’s a chance that parts of it are wrong or oversimplified. If you’re an actual central banker, best get a proper qualification on the subject before fiddling with any knobs or levers.

Rob Levy’s User-Friendly Guide to Macroeconomics

or, how to understand Paul Krugman and his economist pals


A bank traditionally earns a profit by taking the money which savers have deposited with it, called deposits, and lending it back out to people who are looking for a loan. These borrowers might be after a mortgage, or after funds to invest in a business venture.

The banks encourage people to deposit their money by offering them interest. In order for the whole thing to make a profit, they charge more interest to the borrowers than they pay to the savers.
A bank has a legal obligation to keep a certain amount of its deposits in ‘cash’, and is free to lend out as much of the rest as it fancies. We’ll call the amount of cash it has to keep a “cash cushion”, because it’s designed to stop the bank running out of money if lots of the depositors suddenly want their savings back at the same time.
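Here’s a worked example of the two mechanisms above. Every number in it is invented: the 10% cushion and both interest rates are illustrative, not real regulatory or market figures. Note that the bank pays interest on all its deposits but only earns it on the part it lends out:

```python
deposits = 1000.0
cushion_ratio = 0.10   # hypothetical cash cushion: 10% of deposits kept as cash
deposit_rate = 0.01    # made-up interest paid to savers
loan_rate = 0.04       # made-up interest charged to borrowers

cash_cushion = deposits * cushion_ratio
lendable = deposits - cash_cushion
profit = lendable * loan_rate - deposits * deposit_rate

print(f"cushion {cash_cushion:.0f}, lent out {lendable:.0f}, profit {profit:.0f}")
```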

The relationship between a central bank and normal banks is exactly the same as between banks and customers: banks stash excess deposits with the central bank and earn an interest rate. Banks even borrow money from the central bank, for which they’re charged an interest rate. It’s this last interest rate which determines how much of its deposits a bank wants to lend out and how much it wants to cling onto. We’ll see why in a second.

The setting of this interest rate is called ‘monetary policy’ and it’s ‘loose’ monetary policy when the interest rate is low.

When monetary policy is loose, banks are keen to dish loads of their deposits out as mortgages and business loans, because they can borrow for cheap from the central bank to keep their cash cushion at the right level. But if the interest rate is high, the bank will be more tempted to keep hold of its deposits because it doesn’t want to have to borrow to maintain its cash cushion.
When a bank is doing lots of lending, lots of business investment takes place and lots of houses are bought. These things are (usually) good for the economy so the central bank wants to encourage lending.

But there are limits: if the banks are so keen to lend that they’ll offer mortgages and business loans at ridiculously low prices, then people will start buying homes faster than they can be built, or expanding their businesses faster than they can expand their customer base. In this case, inflation sets in: prices rise because businesses have to pass on the cost of all this wasted investment to consumers. It’s this kind of mania for house-buying and business investment that economists refer to as the economy “overheating”.

So, the central bank, via the interest rate it charges to banks, can control how much lending the banks do. It therefore has to set a balance between too much lending (overheating) and not enough (recession); both are bad for the economy. What it really wants to do is set the interest rate to just the right level, such that there is just enough investment in businesses to continue to match demand as it ‘naturally’ grows or shrinks. This is the so-called ‘natural’ interest rate.

Natural growth in demand comes from innovation (a new smartphone model), population growth, the exploitation of natural resources, and improved manufacturing processes which mean companies can make the same products more cheaply. It’s this natural rate of growth that the central banks are trying to second-guess.

But there are limits to what the central bank can do. If banks have some reason to feel negative about the future, they won’t want to lend money however cheaply they can borrow it from the central bank. In this case, the central bank’s interest rate could get down to zero (at which point the banks can borrow money ‘for free’) and the banks still won’t take the bait. Once interest rates are at zero, that’s it. The central bank is out of options! This is what Krugman refers to as the ‘zero lower bound’. This whole set of circumstances is called a ‘liquidity trap’ because banks get addicted to hoarding their money (and money, when it’s cash as opposed to loans you’re owed, is called ‘liquid’: an extra layer of jargon to worry about.)

Once the central bank reaches this zero point, where they’re lending money to banks for free and still the banks won’t lend, its role as a controller of the economy is stuffed. All it can do is wait for things to improve on their own. This is clearly not what the central banks ideally want.

With this knowledge in your armoury, go and read “Secular Stagnation, Coalmines, Bubbles and Larry Summers”. It’s both well-written and important to understand. If you can get to the bit right at the end about offering a positive interest rate on all savings even when the market doesn’t really want to, you’ll be rewarded with a real “ah-ha!” moment.

Good luck!