Global Data

Global Economy Data
D. Quah
Economics and International Development, LSE
November 2014 (Revised January 2016)

Unless you constantly work with just a single dataset, one of the more significant bottlenecks in empirical projects is getting the data to a form where you can interestingly query those data. This writeup describes the data I regularly use, manipulate, and need to collect (although not regularly enough that I remember every detail about them – hence the need for this document).

Here I provide R code snippets to get from a number of originating databases – Maddison Project; World Bank; Penn World Tables; IMF generally, but focusing on IMF World Economic Outlook; Polity IV; inequality; author-provided data from key published papers; and so on – to where I can then ask the questions I’m interested in. My projects then pretty much always start by my returning first to this writeup and just copying R code from out of it.

By its nature this writeup is never finished. When I encounter interesting data that are not one-offs, but that I will be using consistently, their management then appears here.

Obviously, not everyone will work the way I do and not everyone will want to use these same data. But I hope the combination of R, knitr, and ideas from literate programming might help others similarly concerned about presenting their empirical work in a way easier for others to replicate and reproduce.

Setting up

I will consider these datasets in turn in the following sections. First, however, I load for subsequent use a collection of handy R libraries. If you just want to know about the data, you can skip the remainder of this section and just head on over to the section describing the dataset you want to know more about.

library(gdata)
library(ggplot2)

In the sequel, some sections might needlessly re-load these libraries. Doing so is without harm but keeping the code there might help those users who are going to just cut and paste from here into their own projects.

For the aesthetics I find useful in charts, I set up graphics themes, one for each different kind of plot:

myTStheme <- theme_classic() +
  theme(
    plot.title=element_text(size=rel(1.5)),
    legend.title=element_text(size=rel(1.5)),
    legend.text=element_text(size=rel(1.5)),
    legend.position=c(1,0), legend.justification=c(1,0),
    axis.text=element_text(size=rel(1.5)),
    axis.title=element_text(size=rel(1.5)),
    axis.title.x=element_blank(),
    axis.title.y=element_blank()
 )

Then I have collected data and R code routines into their own directories so I can re-use them conveniently across different projects:

myDataDir     <- file.path ("~", "Dropbox", "1", "j", "Data")
myRoutinesDir <- file.path ("~", "Dropbox", "1", "j", "Code", "Routines")

Where you put your own data and routines will differ from these, so just set the file.path values to what you want. Alternatively—which is how I do it—you can put these in your .Rprofile so they are unneeded in this file but will be executed whenever you invoke R.

The Maddison Project

The Maddison Project data comprise the now-standard empirical estimates to study economic growth over the very long run. These data are provided as an Excel spreadsheet on the Project’s website; since December 2015 they have also been made available as an R library. Further below, in presenting manipulations on Penn World Tables data, I show how to download and use data that have been packaged up in R libraries more generally.

Those who use R can proceed directly to data analysis with the R library. The Excel spreadsheet, however, might well remain a standard source for many others, including those who want to see the data directly. Unfortunately, the information in the spreadsheet is laid out in a way that is more useful visually than for data manipulation and analysis. To use this spreadsheet in data analysis one will need to go through something like the following to put the data into usable form.

library(gdata)     # read.xls() (needs a working Perl)
library(reshape2)  # melt()
library(stringr)   # str_trim()
theMaddisonXLS <- file.path(myDataDir, "Maddison-Project", "mpd_2013-01.xlsx")
hold.DF        <- read.xls(theMaddisonXLS, skip=1, stringsAsFactors=FALSE)
# The first row read holds the economy names; use it for the column names
colNames       <- as.character(hold.DF[1, ])
colNames[1]    <- "Year"
new.DF         <- hold.DF[-1, ]
names(new.DF)  <- colNames
# Wide (one column per economy) to long (one row per Year-economy pair)
MaddP.DF       <- melt(new.DF, id.vars="Year")
rm(new.DF, hold.DF, colNames)
# Strip thousands separators, convert to numbers, drop missing entries
MaddP.DF$value <- as.numeric(gsub(",", "", MaddP.DF$value))
MaddP.DF       <- MaddP.DF[!is.na(MaddP.DF$value), ]

names(MaddP.DF)[2] <- "Economy"
names(MaddP.DF)[3] <- "perCapitaGDP"

MaddP.DF$logPerCapGDP <- log(MaddP.DF$perCapitaGDP)
# Remove leading and trailing whitespace from the economy names
MaddP.DF$Economy      <- str_trim(MaddP.DF$Economy, side="both")
detach("package:stringr")
detach("package:reshape2")
detach("package:gdata")

(Why so elaborate? Check that if we didn’t do trimming of whitespace with str_trim(), we wouldn’t get a match for “Sweden” in the codechunk to follow. Astounding but true. Instead, we would have had to match “Sweden\32”, i.e., with the invisible blank at the end of the name. Similarly, “Denmark\32”, “Finland\32”, “Germany\32”, and so on, but, no, not “France”. That last one’s been put there without a trailing blank. Also, if we hadn’t executed the gsub() to remove all “,”’s, then entries such as “1,218” would be unrecognizable as numbers – and would instead appear as just NA. Taking out the “,”’s and then converting to numbers by as.numeric are operations needed to get these data to be manipulable for statistical analysis. Yes, spreadsheets and casual hand-editing are excellent to be able to see data directly but they’re dangerous things in computer software.)
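To see both effects in isolation, here is a minimal sketch on made-up strings (the values are illustrative, not taken from the spreadsheet). It uses base R’s trimws() as a stand-in for str_trim(), purely so the snippet runs without extra libraries.

```r
# Toy strings mimicking what comes out of the spreadsheet (illustrative values)
rawNames  <- c("Sweden ", "France", "Denmark ")
rawValues <- c("1,218", "987", "12,345")

# A trailing blank defeats an exact match; trimming restores it
rawNames == "Sweden"
cleanNames <- trimws(rawNames)
cleanNames == "Sweden"

# Entries containing "," cannot be parsed as numbers and turn into NA
naive   <- suppressWarnings(as.numeric(rawValues))
# Removing the commas first lets every entry parse
cleaned <- as.numeric(gsub(",", "", rawValues))
naive
cleaned
```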

Since I knew I would want to use these data repeatedly and I didn’t want to keep running the codechunk above, I saved my own copy of the Maddison Project GDP data in R’s native format:

myMaddP.file <- file.path(myDataDir, "Maddison-Project", "maddp-201301-DQ.rds")
saveRDS(MaddP.DF, file=myMaddP.file)

(This is only for my personal use so I’m not packaging it up as a library. But of course if you do want the R library version, again you can get that for yourself.)

When I now need to use these data I no longer need to do all the stripping and cleaning after (slowly) reading a spreadsheet as above. Instead I just go:

MaddP.DF <- readRDS(myMaddP.file)
rm(myMaddP.file)

so that, for instance, to get growth rates:

MaddP.DF$annGrowth <- NA
for (anEconomy in unique(MaddP.DF$Economy)) {
  theYears <- MaddP.DF[MaddP.DF$Economy==anEconomy, ]$Year
  logPCGDP <- MaddP.DF[MaddP.DF$Economy==anEconomy, ]$logPerCapGDP
  theAnnGr <- rep(NA, length(logPCGDP))
  for (jLoop in seq_along(theAnnGr)[-1]) {  # empty when only one observation
    if (theYears[jLoop-1] == theYears[jLoop]-1) {
      theAnnGr[jLoop] <- logPCGDP[jLoop] - logPCGDP[jLoop-1]
    }
  }
# Change to percent and then move into dataframe
  MaddP.DF[MaddP.DF$Economy==anEconomy, ]$annGrowth <- 100.0 * theAnnGr
  rm(theAnnGr)
}

(For those who know R, notice I can’t vectorise the inner loop with a bare diff(), as I need to check if the data are available sequentially in time.)
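That said, if the years are numeric and sorted within each economy, the check for sequential availability can itself be written with diff() on the years. The following is only a sketch under those assumptions; annGrowthPct is a hypothetical helper name and the toy numbers are made up.

```r
# Hypothetical helper: log-difference growth in percent, with NA wherever
# the previous observation is not exactly one year earlier.
# Assumes `years` is numeric and sorted in increasing order.
annGrowthPct <- function(years, logSeries) {
  if (length(years) < 2) return(rep(NA_real_, length(years)))
  consecutive <- diff(years) == 1
  c(NA_real_, ifelse(consecutive, 100 * diff(logSeries), NA_real_))
}

# Toy series with gaps: only 1871 and 1872 directly follow an observed year
toyYears  <- c(1820, 1870, 1871, 1872, 1900)
toyLogs   <- log(c(1000, 2000, 2020, 2040.2, 4000))
toyGrowth <- annGrowthPct(toyYears, toyLogs)
toyGrowth
```

Inside the loop over economies, a call such as annGrowthPct(theYears, logPCGDP) could then stand in for the inner loop.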

What economies are we working with here?

unique(MaddP.DF$Economy)
##   [1] "Austria"                          
##   [2] "Belgium"                          
##   [3] "Denmark"                          
##   [4] "Finland"                          
##   [5] "France"                           
##   [6] "Germany"                          
##   [7] "(Centre-   North)           Italy"
##   [8] "Holland/     Netherlands"         
##   [9] "Norway"                           
##  [10] "Sweden"                           
##  [11] "Switzerland"                      
##  [12] "England/GB/UK"                    
##  [13] "12 W. Europe"                     
##  [14] "Ireland"                          
##  [15] "Greece"                           
##  [16] "Portugal"                         
##  [17] "Spain"                            
##  [18] "14 small WEC"                     
##  [19] "30 W. Europe"                     
##  [20] "Australia"                        
##  [21] "N. Zealand"                       
##  [22] "Canada"                           
##  [23] "USA"                              
##  [24] "W. Offshoots"                     
##  [25] "Albania"                          
##  [26] "Bulgaria"                         
##  [27] "Czecho-slovakia"                  
##  [28] "Hungary"                          
##  [29] "Poland"                           
##  [30] "Romania"                          
##  [31] "Yugoslavia"                       
##  [32] "7 E. Europe"                      
##  [33] "Bosnia"                           
##  [34] "Croatia"                          
##  [35] "Macedonia"                        
##  [36] "Slovenia"                         
##  [37] "Montenegro"                       
##  [38] "Serbia"                           
##  [39] "Kosovo"                           
##  [40] "F. Yugoslavia"                    
##  [41] "Czech Rep."                       
##  [42] "Slovakia"                         
##  [43] "F. Czecho-slovakia"               
##  [44] "Armenia"                          
##  [45] "Azerbaijan"                       
##  [46] "Belarus"                          
##  [47] "Estonia"                          
##  [48] "Georgia"                          
##  [49] "Kazakhstan"                       
##  [50] "Kyrgyzstan"                       
##  [51] "Latvia"                           
##  [52] "Lithuania"                        
##  [53] "Moldova"                          
##  [54] "Russia"                           
##  [55] "Tajikistan"                       
##  [56] "Turk-menistan"                    
##  [57] "Ukraine"                          
##  [58] "Uzbekistan"                       
##  [59] "F. USSR"                          
##  [60] "Argentina"                        
##  [61] "Brazil"                           
##  [62] "Chile"                            
##  [63] "Colombia"                         
##  [64] "Mexico"                           
##  [65] "Peru"                             
##  [66] "Uruguay"                          
##  [67] "Venezuela"                        
##  [68] "8 L. America"                     
##  [69] "Bolivia"                          
##  [70] "Costa Rica"                       
##  [71] "Cuba"                             
##  [72] "Dominican Rep."                   
##  [73] "Ecuador"                          
##  [74] "El Salvador"                      
##  [75] "Guatemala"                        
##  [76] "Haïti"                            
##  [77] "Honduras"                         
##  [78] "Jamaica"                          
##  [79] "Nicaragua"                        
##  [80] "Panama"                           
##  [81] "Paraguay"                         
##  [82] "Puerto Rico"                      
##  [83] "T. & Tobago"                      
##  [84] "15 L. America"                    
##  [85] "21 Caribbean"                     
##  [86] "L. America"                       
##  [87] "China"                            
##  [88] "India"                            
##  [89] "Indonesia (Java before 1880)"     
##  [90] "Japan"                            
##  [91] "Philippines"                      
##  [92] "S. Korea"                         
##  [93] "Thailand"                         
##  [94] "Taiwan"                           
##  [95] "Bangladesh"                       
##  [96] "Burma"                            
##  [97] "Hong Kong"                        
##  [98] "Malaysia"                         
##  [99] "Nepal"                            
## [100] "Pakistan"                         
## [101] "Singapore"                        
## [102] "Sri Lanka"                        
## [103] "16 E. Asia"                       
## [104] "Afghanistan"                      
## [105] "Cambodia"                         
## [106] "Laos"                             
## [107] "Mongolia"                         
## [108] "North Korea"                      
## [109] "Vietnam"                          
## [110] "24 Sm. E. Asia"                   
## [111] "30 E. Asia"                       
## [112] "Bahrain"                          
## [113] "Iran"                             
## [114] "Iraq"                             
## [115] "Israel"                           
## [116] "Jordan"                           
## [117] "Kuwait"                           
## [118] "Lebanon"                          
## [119] "Oman"                             
## [120] "Qatar"                            
## [121] "Saudi Arabia"                     
## [122] "Syria"                            
## [123] ""                                 
## [124] "UAE"                              
## [125] "Yemen"                            
## [126] "W. Bank & Gaza"                   
## [127] "15 W. Asia"                       
## [128] "Asia"                             
## [129] "Algeria"                          
## [130] "Angola"                           
## [131] "Benin"                            
## [132] "Botswana"                         
## [133] "Burkina Faso"                     
## [134] "Burundi"                          
## [135] "Cameroon"                         
## [136] "Cape Verde"                       
## [137] "Centr. Afr. Rep."                 
## [138] "Chad"                             
## [139] "Comoro Islands"                   
## [140] "Congo 'Brazzaville'"              
## [141] "Côte d'Ivoire"                    
## [142] "Djibouti"                         
## [143] "Egypt"                            
## [144] "Equatorial Guinea"                
## [145] "Eritrea & Ethiopia"               
## [146] "Gabon"                            
## [147] "Gambia"                           
## [148] "Ghana"                            
## [149] "Guinea"                           
## [150] "Guinea Bissau"                    
## [151] "Kenya"                            
## [152] "Lesotho"                          
## [153] "Liberia"                          
## [154] "Libya"                            
## [155] "Madagascar"                       
## [156] "Malawi"                           
## [157] "Mali"                             
## [158] "Mauritania"                       
## [159] "Mauritius"                        
## [160] "Morocco"                          
## [161] "Mozambique"                       
## [162] "Namibia"                          
## [163] "Niger"                            
## [164] "Nigeria"                          
## [165] "Rwanda"                           
## [166] "Sao Tomé & Principe"              
## [167] "Senegal"                          
## [168] "Seychelles"                       
## [169] "Sierra Leone"                     
## [170] "Somalia"                          
## [171] "Cape Colony/ South Africa"        
## [172] "Sudan"                            
## [173] "Swaziland"                        
## [174] "Tanzania"                         
## [175] "Togo"                             
## [176] "Tunisia"                          
## [177] "Uganda"                           
## [178] "Congo-Kinshasa"                   
## [179] "Zambia"                           
## [180] "Zimbabwe"                         
## [181] "3 Small Afr."                     
## [182] "Total Africa"                     
## [183] "Total World"

A bit of a mess, isn’t it? And that’s even after we’ve already done a str_trim().

I am in awe of the amount of work that goes into constructing these Maddison Project data. These researchers have my greatest respect. But the mess above is what happens when authors use names they make up, like “England/GB/UK”, “Holland/ Netherlands”, “(Centre- North) Italy”, “14 small WEC”, or “3 Small Afr.”; or when they insert peculiar characters like “&” or random invisible whitespace.

(Without ISO standardisation, of course, it’s inevitable we have to make things up. Still.)

It’s bad enough when we have to guess what these names mean in a spreadsheet; trying to write computer code to select things systematically is almost impossible.
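One partial workaround is to tidy the names further and keep a small hand-made lookup table. The sketch below assumes the stray characters are ordinary whitespace; the lookup and its ISO 3166-1 alpha-3 assignments are hypothetical, supplied here for illustration rather than by the Maddison Project.

```r
# Illustrative subset of names as they appear after str_trim()
rawNames <- c("USA", "England/GB/UK", "Holland/     Netherlands",
              "(Centre-   North)           Italy", "Sweden", "14 small WEC")

# Collapse runs of internal whitespace into a single blank
tidyNames <- gsub("\\s+", " ", rawNames)

# Hypothetical hand-made lookup from tidied names to ISO alpha-3 codes
isoLookup <- c("USA"                   = "USA",
               "England/GB/UK"         = "GBR",
               "Holland/ Netherlands"  = "NLD",
               "(Centre- North) Italy" = "ITA",
               "Sweden"                = "SWE")

# Names absent from the lookup (regional aggregates, say) come back as NA
isoCodes <- unname(isoLookup[tidyNames])
data.frame(Economy = tidyNames, isocode = isoCodes, stringsAsFactors = FALSE)
```

Entries left as NA are then easy to list and deal with by hand.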

But having done our best to clean these data, take a look at some selected growth experiences.

In the Appendix I set up R code to do this conveniently; that code will be re-used subsequently as well.

Using that code now, check out these four economies from 1870 to 2010:

theBegSmpl   <- 1870
theEndSmpl   <- 2010
theEconomies <- c("USA", "England/GB/UK", "France", "Sweden")
theSeries    <- "logPerCapGDP"
thisTitle    <- "log Per Capita GDP in constant 1990 Int. GK$"

source(file=file.path(myRoutinesDir, "multplot-maddp.R"), local=TRUE, echo=TRUE)
## 
## > getSeries <- c("Year", "Economy", theSeries)
## 
## > theAES <- aes_string(x = "Year", y = theSeries, group = "Economy", 
## +     colour = "Economy")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy %in% theEconomies) & 
## +     (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl), 
## +     getSeries]
## 
## > ggplot(data = this.DF, theAES) + geom_line(size = 2) + 
## +     myTStheme + ggtitle(thisTitle)

## 
## > rm(this.DF, theAES, getSeries)
rm(thisTitle, theSeries, theEconomies, theEndSmpl, theBegSmpl)

To structure this information more clearly, I seek to eyeball an extrapolated trend in these per capita income data. As previously, I provide in the Appendix the R code to do this. Here, I just call that code after setting up the things I want to see.

Begin with US data:

theBegFit  <- 1870
theEndFit  <- 1980
theEndSmp  <- 2010
theEconomy <- "USA"
source(file=file.path(myRoutinesDir, "eyetrend-maddp.R"), local=TRUE, echo=TRUE)
## 
## > olsFIT <- lm(logPerCapGDP ~ Year, data = MaddP.DF[(MaddP.DF$Economy == 
## +     theEconomy) & (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= 
## +      .... [TRUNCATED] 
## 
## > thisTitle <- paste0(theEconomy, ": log Per Capita GDP in constant 1990 Int. GK$")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy == theEconomy) & 
## +     (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp), 
## +     c("Year", "perCapi ..." ... [TRUNCATED] 
## 
## > ggplot(data = this.DF, aes(x = Year, y = logPerCapGDP)) + 
## +     geom_line(size = 2) + geom_segment(data = this.DF, aes(x = theBegFit, 
## +     xend = .... [TRUNCATED]

## 
## > thisTitle <- paste0(theEconomy, ": Per Capita GDP in constant 1990 Int. GK$")
## 
## > expTrendFitted <- function(x) {
## +     ifelse(x >= theBegFit & x <= theEndFit, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > expTrendExtrap <- function(x) {
## +     ifelse(x >= theEndFit + 1 & x <= theEndSmp, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > ggplot(data = this.DF, aes(x = Year, y = perCapitaGDP)) + 
## +     geom_line(size = 2) + stat_function(fun = expTrendFitted, 
## +     linetype = 1, colo .... [TRUNCATED]

## 
## > rm(olsFIT, expTrendFitted, expTrendExtrap)
rm(theBegFit, theEndFit, theEndSmp, theEconomy)

Presented here are both the fitted linear trend for the log of US per capita GDP, and the resulting exponential trend for the original series. The solid line is the fitted trend; the dashed line is the extrapolation.

Remarkably, a smooth exponential trend, fitted from 1870 through as early as 1980, gives a reasonable description of the out-of-sample post-1980 behaviour of US per capita GDP.
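The link between the two trends is that a linear trend in logs, log y = a + b × Year, is the same thing as an exponential trend in levels, y = exp(a + b × Year). Here is a minimal sketch on a synthetic series (made-up numbers growing at exactly 2% a year, not Maddison Project data) that fits on an early window and extrapolates beyond it; the toy names are hypothetical.

```r
# Synthetic series (made-up numbers): exact exponential growth at 2% a year
toy.DF <- data.frame(Year = 1870:2010)
toy.DF$perCapitaGDP <- 2500 * exp(0.02 * (toy.DF$Year - 1870))
toy.DF$logPerCapGDP <- log(toy.DF$perCapitaGDP)

# Fit a linear trend to the log series on the estimation window only
toyFit <- lm(logPerCapGDP ~ Year, data = subset(toy.DF, Year <= 1980))

# Trend in logs over the whole sample, then back to levels through exp()
toy.DF$logTrend <- predict(toyFit, newdata = toy.DF)
toy.DF$expTrend <- exp(toy.DF$logTrend)

# The slope on Year is the trend growth rate per year (in log points)
coef(toyFit)["Year"]
tail(toy.DF[, c("Year", "perCapitaGDP", "expTrend")], 3)
```

With real data the post-1980 rows of expTrend would be the extrapolation to compare against the observed series.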

Do the same for China but now beginning in 1950 as it’s from then that the Maddison Project data provide a usefully uninterrupted sequence:

theBegFit  <- 1950
theEndFit  <- 1980 
theEndSmp  <- 2010
theEconomy <- "China"

source(file=file.path(myRoutinesDir, "eyetrend-maddp.R"), local=TRUE, echo=TRUE)
## 
## > olsFIT <- lm(logPerCapGDP ~ Year, data = MaddP.DF[(MaddP.DF$Economy == 
## +     theEconomy) & (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= 
## +      .... [TRUNCATED] 
## 
## > thisTitle <- paste0(theEconomy, ": log Per Capita GDP in constant 1990 Int. GK$")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy == theEconomy) & 
## +     (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp), 
## +     c("Year", "perCapi ..." ... [TRUNCATED] 
## 
## > ggplot(data = this.DF, aes(x = Year, y = logPerCapGDP)) + 
## +     geom_line(size = 2) + geom_segment(data = this.DF, aes(x = theBegFit, 
## +     xend = .... [TRUNCATED]

## 
## > thisTitle <- paste0(theEconomy, ": Per Capita GDP in constant 1990 Int. GK$")
## 
## > expTrendFitted <- function(x) {
## +     ifelse(x >= theBegFit & x <= theEndFit, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > expTrendExtrap <- function(x) {
## +     ifelse(x >= theEndFit + 1 & x <= theEndSmp, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > ggplot(data = this.DF, aes(x = Year, y = perCapitaGDP)) + 
## +     geom_line(size = 2) + stat_function(fun = expTrendFitted, 
## +     linetype = 1, colo .... [TRUNCATED]

## 
## > rm(olsFIT, expTrendFitted, expTrendExtrap)
rm(theEconomy, theEndSmp, theEndFit, theBegFit)

In stark contrast to the US, China’s per capita GDP follows post-1980 a completely different trajectory from its pre-1980 history. This, of course, is no surprise to anyone even vaguely aware of global economic developments. The value of the calculation is to quantify how large the change is that has occurred: if anyone thought growth trends were slow and difficult to change, China provides a striking and positive counter-example.

Finally, for comparison, let’s do this for the UK:

theBegFit  <- 1950
theEndFit  <- 1980
theEndSmp  <- 2010
theEconomy <- "England/GB/UK"

source(file=file.path(myRoutinesDir, "eyetrend-maddp.R"), local=TRUE, echo=TRUE)
## 
## > olsFIT <- lm(logPerCapGDP ~ Year, data = MaddP.DF[(MaddP.DF$Economy == 
## +     theEconomy) & (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= 
## +      .... [TRUNCATED] 
## 
## > thisTitle <- paste0(theEconomy, ": log Per Capita GDP in constant 1990 Int. GK$")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy == theEconomy) & 
## +     (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp), 
## +     c("Year", "perCapi ..." ... [TRUNCATED] 
## 
## > ggplot(data = this.DF, aes(x = Year, y = logPerCapGDP)) + 
## +     geom_line(size = 2) + geom_segment(data = this.DF, aes(x = theBegFit, 
## +     xend = .... [TRUNCATED]

## 
## > thisTitle <- paste0(theEconomy, ": Per Capita GDP in constant 1990 Int. GK$")
## 
## > expTrendFitted <- function(x) {
## +     ifelse(x >= theBegFit & x <= theEndFit, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > expTrendExtrap <- function(x) {
## +     ifelse(x >= theEndFit + 1 & x <= theEndSmp, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > ggplot(data = this.DF, aes(x = Year, y = perCapitaGDP)) + 
## +     geom_line(size = 2) + stat_function(fun = expTrendFitted, 
## +     linetype = 1, colo .... [TRUNCATED]

## 
## > rm(olsFIT, expTrendFitted, expTrendExtrap)
rm(theEconomy, theEndSmp, theEndFit, theBegFit)

Get a final sense of the difference here by putting all these on the same graph.

theBegSmpl   <- 1950
theEndSmpl   <- 2010
theEconomies <- c("USA", "England/GB/UK", "China")
theSeries    <- "logPerCapGDP"
thisTitle    <- "log Per Capita GDP in constant 1990 Int. GK$"

source(file=file.path(myRoutinesDir, "multplot-maddp.R"), local=TRUE, echo=TRUE)
## 
## > getSeries <- c("Year", "Economy", theSeries)
## 
## > theAES <- aes_string(x = "Year", y = theSeries, group = "Economy", 
## +     colour = "Economy")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy %in% theEconomies) & 
## +     (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl), 
## +     getSeries]
## 
## > ggplot(data = this.DF, theAES) + geom_line(size = 2) + 
## +     myTStheme + ggtitle(thisTitle)

## 
## > rm(this.DF, theAES, getSeries)
rm(thisTitle, theSeries, theEconomies, theEndSmpl, theBegSmpl)

A more useful perspective on the size of these cross-country differences comes from the levels of the series themselves, not their logs.

theBegSmpl   <- 1950
theEndSmpl   <- 2010
theEconomies <- c("USA", "England/GB/UK", "China")
theSeries    <- "perCapitaGDP"
thisTitle    <- "Per Capita GDP in constant 1990 Int. GK$"

source(file=file.path(myRoutinesDir, "multplot-maddp.R"), local=TRUE, echo=TRUE)
## 
## > getSeries <- c("Year", "Economy", theSeries)
## 
## > theAES <- aes_string(x = "Year", y = theSeries, group = "Economy", 
## +     colour = "Economy")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy %in% theEconomies) & 
## +     (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl), 
## +     getSeries]
## 
## > ggplot(data = this.DF, theAES) + geom_line(size = 2) + 
## +     myTStheme + ggtitle(thisTitle)

## 
## > rm(this.DF, theAES, getSeries)
rm(thisTitle, theSeries, theEconomies, theEndSmpl, theBegSmpl)

Remember, however, that this is for per capita GDP and obviously therefore does not take into account the sizes of the different populations.

World Development Indicators

The World Bank’s World Development Indicators (WDI) are available in one large Excel ZIP file, one large CSV text ZIP file, and through online query.

[To include here – R Code I had used for my DV409 course, for students in International Development at LSE]

Penn World Tables

The Penn World Tables provide annual economic data on incomes, outputs, inputs, and productivity across more than 150 economies beginning in 1950. The project was begun by Robert Summers, Alan Heston, and Irving Kravis, and has now been taken over by a worldwide team of researchers. Feenstra, Robert C., Robert Inklaar, and Marcel P. Timmer (2013), “The Next Generation of the Penn World Table”, currently provide regular updates. The project name, however, obviously still shows where it originated.

Penn World Tables (PWT) version 8.0 data are available as spreadsheets created from dynamic queries on the site. The results from such queries also contain extensive descriptions on the assumptions and procedures used to construct these data. So we could always craft our requests to that site directly.

As an alternative to that, R users from around the world have assembled an R dataset collecting together all the PWT data, and have placed that on R servers. So we can instead just use that directly. In this approach – as with much of modern computer thinking – data comprised of numbers are no different from executable library code, so we can just install our own private version of the PWT data as an R package. As with any R library, we only need to install the PWT data once on whatever machine we want to use.

[Screenshot: Install-Pkg]

Key in or select pwt8, and then let RStudio install the data.
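The same thing can be done from the R console instead of the RStudio dialog. What follows is a minimal sketch: ensurePackage is a hypothetical helper (not part of any library) that calls install.packages() only when the package is missing, and it assumes the package is available from whichever repository your R session is configured to use.

```r
# Hypothetical helper: install a package only if it is not already available,
# then report whether it can now be loaded.
# `installer` defaults to utils::install.packages and is a parameter only so
# that the behaviour can be checked without touching the network.
ensurePackage <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}
```

Calling ensurePackage("pwt8") once on a given machine should then leave the data ready for library("pwt8").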

The pwt8 manual at the site gives a compact description and listing of what’s in it. By loading this dataset, exactly as we would an R library of code that we might run, we immediately have access to all the PWT8.0 variables:

library("pwt8")
data("pwt8.0")

The R documentation describes pwt8.0 as a dataframe of 10,354 observations on 39 variables. To understand this, remember that in R terminology, a dataframe is a 2-dimensional array. However, like most modern things on computers, a dataframe can hold text, numbers, items of logic, and possibly even more complicated objects as its entries, freely intermingled. Each variable has 10,354 (167 economies × 62 years) observations: many of these observations might, of course, be NA (not available) but in principle we have data on 167 economies for 62 years, 1950 through 2011.
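A toy panel makes the layout concrete. This is only a sketch: the economy codes "AAA", "BBB", "CCC" and all values are made up, and the column names merely echo those used elsewhere in this writeup.

```r
# Toy panel with made-up values: 3 economies observed over 4 years
toyPanel <- expand.grid(year = 1950:1953,
                        isocode = c("AAA", "BBB", "CCC"),
                        stringsAsFactors = FALSE)
toyPanel$pc.GDP  <- c(1.2, 1.3, NA, 1.5,
                      4.0, 4.1, 4.3, 4.2,
                      NA,  NA,  0.8, 0.9)
toyPanel$isLarge <- toyPanel$isocode == "BBB"

dim(toyPanel)                # rows = economies x years, columns = variables
sapply(toyPanel, class)      # integer, character, numeric, logical columns
sum(is.na(toyPanel$pc.GDP))  # unavailable observations still occupy rows
167 * 62                     # the same row arithmetic for pwt8.0
```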

Create our own dataframe and put in it, among other information, per capita GDP (measured in thousands of constant 2005 PPP-adjusted US$):

ourOwn.DF <- data.frame (country=pwt8.0$country,
                         isocode=pwt8.0$isocode,
                         year=pwt8.0$year)
ourOwn.DF$pc.GDP  <- (pwt8.0$rgdpe / pwt8.0$pop)/1000.0

Our dataframe ourOwn.DF now contains in its first four columns the variables country, isocode, year, and per capita GDP pc.GDP. We can see the beginnings of those columns by asking for the dataframe’s structure,

str(ourOwn.DF)
## 'data.frame':    10354 obs. of  4 variables:
##  $ country: Factor w/ 167 levels "Angola","Albania",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ isocode: Factor w/ 167 levels "AGO","ALB","ARG",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year   : int  1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
##  $ pc.GDP : num  NA NA NA NA NA NA NA NA NA NA ...

R doesn’t try to print out everything, just enough of the start of those columns that we know things are as we expect.

What PWT uses to label economies is an ISO code (ISO 3166-1 alpha-3), which is, unfortunately, different from the World Bank’s country codes. We can see what these isocodes are by:

unique(ourOwn.DF$isocode)
##   [1] AGO ALB ARG ARM ATG AUS AUT AZE BDI BEL BEN BFA BGD BGR BHR BHS BIH
##  [18] BLR BLZ BMU BOL BRA BRB BRN BTN BWA CAF CAN CHE CHL CHN CIV CMR COD
##  [35] COG COL COM CPV CRI CYP CZE DEU DJI DMA DNK DOM ECU EGY ESP EST ETH
##  [52] FIN FJI FRA GAB GBR GEO GHA GIN GMB GNB GNQ GRC GRD GTM HKG HND HRV
##  [69] HUN IDN IND IRL IRN IRQ ISL ISR ITA JAM JOR JPN KAZ KEN KGZ KHM KNA
##  [86] KOR KWT LAO LBN LBR LCA LKA LSO LTU LUX LVA MAC MAR MDA MDG MDV MEX
## [103] MKD MLI MLT MNE MNG MOZ MRT MUS MWI MYS NAM NER NGA NLD NOR NPL NZL
## [120] OMN PAK PAN PER PHL POL PRT PRY QAT ROU RUS RWA SAU SDN SEN SGP SLE
## [137] SLV SRB STP SUR SVK SVN SWE SWZ SYR TCD TGO THA TJK TKM TTO TUN TUR
## [154] TWN TZA UGA UKR URY USA UZB VCT VEN VNM YEM ZAF ZMB ZWE
## 167 Levels: AGO ALB ARG ARM ATG AUS AUT AZE BDI BEL BEN BFA BGD BGR ... ZWE

so that, explicitly, countries and isocodes can be seen by (the relatively obscure):

by (ourOwn.DF, ourOwn.DF$isocode, FUN = function(a.DF) {
   a.DF[1, c("country", "isocode")]
   }
)

or, perhaps more transparently, since there is only ever one “2011” observation for each economy:

subset (ourOwn.DF, subset=(ourOwn.DF$year == 2011), c("isocode", "country"))
##       isocode                    country
## 62        AGO                     Angola
## 124       ALB                    Albania
## 186       ARG                  Argentina
## 248       ARM                    Armenia
## 310       ATG        Antigua and Barbuda
## 372       AUS                  Australia
## 434       AUT                    Austria
## 496       AZE                 Azerbaijan
## 558       BDI                    Burundi
## 620       BEL                    Belgium
## 682       BEN                      Benin
## 744       BFA               Burkina Faso
## 806       BGD                 Bangladesh
## 868       BGR                   Bulgaria
## 930       BHR                    Bahrain
## 992       BHS                    Bahamas
## 1054      BIH     Bosnia and Herzegovina
## 1116      BLR                    Belarus
## 1178      BLZ                     Belize
## 1240      BMU                    Bermuda
## 1302      BOL                    Bolivia
## 1364      BRA                     Brazil
## 1426      BRB                   Barbados
## 1488      BRN                     Brunei
## 1550      BTN                     Bhutan
## 1612      BWA                   Botswana
## 1674      CAF   Central African Republic
## 1736      CAN                     Canada
## 1798      CHE                Switzerland
## 1860      CHL                      Chile
## 1922      CHN                      China
## 1984      CIV              Cote d'Ivoire
## 2046      CMR                   Cameroon
## 2108      COD Congo, Democratic Republic
## 2170      COG         Congo, Republic of
## 2232      COL                   Colombia
## 2294      COM                    Comoros
## 2356      CPV                 Cape Verde
## 2418      CRI                 Costa Rica
## 2480      CYP                     Cyprus
## 2542      CZE             Czech Republic
## 2604      DEU                    Germany
## 2666      DJI                   Djibouti
## 2728      DMA                   Dominica
## 2790      DNK                    Denmark
## 2852      DOM         Dominican Republic
## 2914      ECU                    Ecuador
## 2976      EGY                      Egypt
## 3038      ESP                      Spain
## 3100      EST                    Estonia
## 3162      ETH                   Ethiopia
## 3224      FIN                    Finland
## 3286      FJI                       Fiji
## 3348      FRA                     France
## 3410      GAB                      Gabon
## 3472      GBR             United Kingdom
## 3534      GEO                    Georgia
## 3596      GHA                      Ghana
## 3658      GIN                     Guinea
## 3720      GMB                Gambia, The
## 3782      GNB              Guinea-Bissau
## 3844      GNQ          Equatorial Guinea
## 3906      GRC                     Greece
## 3968      GRD                    Grenada
## 4030      GTM                  Guatemala
## 4092      HKG                  Hong Kong
## 4154      HND                   Honduras
## 4216      HRV                    Croatia
## 4278      HUN                    Hungary
## 4340      IDN                  Indonesia
## 4402      IND                      India
## 4464      IRL                    Ireland
## 4526      IRN                       Iran
## 4588      IRQ                       Iraq
## 4650      ISL                    Iceland
## 4712      ISR                     Israel
## 4774      ITA                      Italy
## 4836      JAM                    Jamaica
## 4898      JOR                     Jordan
## 4960      JPN                      Japan
## 5022      KAZ                 Kazakhstan
## 5084      KEN                      Kenya
## 5146      KGZ                 Kyrgyzstan
## 5208      KHM                   Cambodia
## 5270      KNA          St. Kitts & Nevis
## 5332      KOR         Korea, Republic of
## 5394      KWT                     Kuwait
## 5456      LAO                       Laos
## 5518      LBN                    Lebanon
## 5580      LBR                    Liberia
## 5642      LCA                  St. Lucia
## 5704      LKA                  Sri Lanka
## 5766      LSO                    Lesotho
## 5828      LTU                  Lithuania
## 5890      LUX                 Luxembourg
## 5952      LVA                     Latvia
## 6014      MAC                      Macao
## 6076      MAR                    Morocco
## 6138      MDA                    Moldova
## 6200      MDG                 Madagascar
## 6262      MDV                   Maldives
## 6324      MEX                     Mexico
## 6386      MKD                  Macedonia
## 6448      MLI                       Mali
## 6510      MLT                      Malta
## 6572      MNE                 Montenegro
## 6634      MNG                   Mongolia
## 6696      MOZ                 Mozambique
## 6758      MRT                 Mauritania
## 6820      MUS                  Mauritius
## 6882      MWI                     Malawi
## 6944      MYS                   Malaysia
## 7006      NAM                    Namibia
## 7068      NER                      Niger
## 7130      NGA                    Nigeria
## 7192      NLD                Netherlands
## 7254      NOR                     Norway
## 7316      NPL                      Nepal
## 7378      NZL                New Zealand
## 7440      OMN                       Oman
## 7502      PAK                   Pakistan
## 7564      PAN                     Panama
## 7626      PER                       Peru
## 7688      PHL                Philippines
## 7750      POL                     Poland
## 7812      PRT                   Portugal
## 7874      PRY                   Paraguay
## 7936      QAT                      Qatar
## 7998      ROU                    Romania
## 8060      RUS                     Russia
## 8122      RWA                     Rwanda
## 8184      SAU               Saudi Arabia
## 8246      SDN                      Sudan
## 8308      SEN                    Senegal
## 8370      SGP                  Singapore
## 8432      SLE               Sierra Leone
## 8494      SLV                El Salvador
## 8556      SRB                     Serbia
## 8618      STP      Sao Tome and Principe
## 8680      SUR                   Suriname
## 8742      SVK            Slovak Republic
## 8804      SVN                   Slovenia
## 8866      SWE                     Sweden
## 8928      SWZ                  Swaziland
## 8990      SYR                      Syria
## 9052      TCD                       Chad
## 9114      TGO                       Togo
## 9176      THA                   Thailand
## 9238      TJK                 Tajikistan
## 9300      TKM               Turkmenistan
## 9362      TTO          Trinidad & Tobago
## 9424      TUN                    Tunisia
## 9486      TUR                     Turkey
## 9548      TWN                     Taiwan
## 9610      TZA                   Tanzania
## 9672      UGA                     Uganda
## 9734      UKR                    Ukraine
## 9796      URY                    Uruguay
## 9858      USA   United States of America
## 9920      UZB                 Uzbekistan
## 9982      VCT   St. Vincent & Grenadines
## 10044     VEN                  Venezuela
## 10106     VNM                    Vietnam
## 10168     YEM                      Yemen
## 10230     ZAF               South Africa
## 10292     ZMB                     Zambia
## 10354     ZWE                   Zimbabwe

You can also look at the data we have just created for a selection of economies:

ourOwn.DF[ourOwn.DF$isocode %in% c("CHN", "USA"), c("isocode", "year", "pc.GDP")]
##      isocode year pc.GDP
## 1861     CHN 1950     NA
## 1862     CHN 1951     NA
## 1863     CHN 1952  0.614
## 1864     CHN 1953  0.679
## 1865     CHN 1954  0.688
## 1866     CHN 1955  0.705
## 1867     CHN 1956  0.742
## 1868     CHN 1957  0.793
## 1869     CHN 1958  0.912
## 1870     CHN 1959  0.992
## 1871     CHN 1960  0.928
## 1872     CHN 1961  0.588
## 1873     CHN 1962  0.603
## 1874     CHN 1963  0.682
## 1875     CHN 1964  0.772
## 1876     CHN 1965  0.842
## 1877     CHN 1966  0.910
## 1878     CHN 1967  0.799
## 1879     CHN 1968  0.749
## 1880     CHN 1969  0.827
## 1881     CHN 1970  0.967
## 1882     CHN 1971  1.006
## 1883     CHN 1972  0.976
## 1884     CHN 1973  1.043
## 1885     CHN 1974  1.044
## 1886     CHN 1975  1.090
## 1887     CHN 1976  1.065
## 1888     CHN 1977  1.089
## 1889     CHN 1978  1.234
## 1890     CHN 1979  1.296
## 1891     CHN 1980  1.324
## 1892     CHN 1981  1.368
## 1893     CHN 1982  1.475
## 1894     CHN 1983  1.556
## 1895     CHN 1984  1.858
## 1896     CHN 1985  2.005
## 1897     CHN 1986  2.083
## 1898     CHN 1987  2.164
## 1899     CHN 1988  2.111
## 1900     CHN 1989  1.966
## 1901     CHN 1990  2.041
## 1902     CHN 1991  2.138
## 1903     CHN 1992  2.297
## 1904     CHN 1993  2.548
## 1905     CHN 1994  2.742
## 1906     CHN 1995  3.058
## 1907     CHN 1996  3.132
## 1908     CHN 1997  3.296
## 1909     CHN 1998  3.239
## 1910     CHN 1999  3.371
## 1911     CHN 2000  3.533
## 1912     CHN 2001  3.753
## 1913     CHN 2002  4.137
## 1914     CHN 2003  4.451
## 1915     CHN 2004  4.880
## 1916     CHN 2005  5.342
## 1917     CHN 2006  5.973
## 1918     CHN 2007  6.610
## 1919     CHN 2008  6.721
## 1920     CHN 2009  7.189
## 1921     CHN 2010  7.679
## 1922     CHN 2011  8.069
## 9797     USA 1950 12.802
## 9798     USA 1951 13.387
## 9799     USA 1952 13.621
## 9800     USA 1953 14.032
## 9801     USA 1954 13.740
## 9802     USA 1955 14.552
## 9803     USA 1956 14.599
## 9804     USA 1957 14.641
## 9805     USA 1958 14.284
## 9806     USA 1959 15.072
## 9807     USA 1960 15.220
## 9808     USA 1961 15.323
## 9809     USA 1962 16.028
## 9810     USA 1963 16.495
## 9811     USA 1964 17.236
## 9812     USA 1965 18.176
## 9813     USA 1966 19.142
## 9814     USA 1967 19.412
## 9815     USA 1968 20.188
## 9816     USA 1969 20.667
## 9817     USA 1970 20.495
## 9818     USA 1971 21.046
## 9819     USA 1972 22.063
## 9820     USA 1973 23.183
## 9821     USA 1974 22.541
## 9822     USA 1975 22.239
## 9823     USA 1976 23.324
## 9824     USA 1977 24.150
## 9825     USA 1978 25.303
## 9826     USA 1979 25.740
## 9827     USA 1980 25.021
## 9828     USA 1981 25.481
## 9829     USA 1982 24.721
## 9830     USA 1983 25.725
## 9831     USA 1984 27.528
## 9832     USA 1985 28.377
## 9833     USA 1986 28.981
## 9834     USA 1987 29.499
## 9835     USA 1988 30.399
## 9836     USA 1989 31.189
## 9837     USA 1990 31.344
## 9838     USA 1991 30.984
## 9839     USA 1992 31.798
## 9840     USA 1993 32.537
## 9841     USA 1994 33.683
## 9842     USA 1995 34.211
## 9843     USA 1996 35.225
## 9844     USA 1997 36.567
## 9845     USA 1998 37.978
## 9846     USA 1999 39.382
## 9847     USA 2000 40.489
## 9848     USA 2001 40.522
## 9849     USA 2002 40.823
## 9850     USA 2003 41.404
## 9851     USA 2004 42.449
## 9852     USA 2005 43.212
## 9853     USA 2006 43.954
## 9854     USA 2007 44.372
## 9855     USA 2008 43.237
## 9856     USA 2009 41.728
## 9857     USA 2010 42.287
## 9858     USA 2011 42.646

so that we see in these data China’s per capita GDP just breached P$8000 in 2011. By contrast, the US had by 1950, the beginning of the sample, achieved better than 150% of China’s 2011 per capita GDP. In 2011 per capita GDP in the US exceeded 5 times China’s.
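
The ratios quoted here can be checked directly from the printed values. This is only a minimal sketch: the three pc.GDP figures are copied by hand from the output above, and the variable names are mine rather than anything defined elsewhere in this writeup.

```r
# pc.GDP values copied by hand from the printed output above
chn2011 <- 8.069
usa1950 <- 12.802
usa2011 <- 42.646

ratio1950 <- usa1950 / chn2011   # US in 1950 relative to China in 2011
ratio2011 <- usa2011 / chn2011   # US in 2011 relative to China in 2011

print(round(c(ratio1950, ratio2011), 2))
```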

Also, we can check out the per capita GDP of a selection of economies of interest, looking at the numbers directly and then producing a graph of them.

This next instruction would serve up the numbers so we can look at them directly but to save space I don’t show its output:

ourOwn.DF[ourOwn.DF$isocode %in% c("GBR", "USA", "SGP"),
 c("year", "isocode", "pc.GDP")]

Produce next the desired graph, using the myTStheme aesthetic I defined at the beginning of this document:

thisTitle <- "GBR, USA, SGP per capita GDP at PPP"
ggplot(ourOwn.DF[ourOwn.DF$isocode %in% c("GBR", "USA", "SGP"), c("year", "isocode", "pc.GDP")],
  aes(x=year, y=pc.GDP, group=isocode, colour=isocode)) +
  myTStheme + geom_line(size=2) + ggtitle(thisTitle)

The first instruction says concentrate on that part of our dataframe whose isocodes are “GBR” (the United Kingdom), “USA” (the US), or “SGP” (Singapore), and pick out the columns “year”, “isocode”, and “pc.GDP” that go with those isocodes. This does no more than give us a peek within our dataframe ourOwn.DF, but it helps reassure us that everything is OK.

The ggplot() instruction re-creates, on the fly, a dataframe (that will go away once the instruction finishes) that is exactly the same as the one we were just looking at from the previous instruction; and then it draws a line graph (using geom_line()) where the X-axis is the year variable and the Y-axis is pc.GDP, grouping the observations by isocode (so the “USA” observations all go together, and the “SGP” ones similarly), and using colours specific to each isocode.

(And so, yes, according to these data Singapore’s citizens, in purchasing power parity and on average, have grown richer than the US’s.)

I had used a related chart, showing the performance of a couple of other East Asian economies, in Chinese Lessons: Singapore’s Epic Regression to the Mean (Nov 2014).

ggplot(ourOwn.DF[ourOwn.DF$isocode %in% c("USA", "SGP", "TWN", "KOR"), c("year", "isocode", "pc.GDP")],
  aes(x=year, y=pc.GDP, group=isocode, colour=isocode)) +
  myTStheme + geom_line(size=2)

Get better resolution on this information by looking at ratios relative to US:

ourOwn.DF$relUSA <- rep(NA, nrow(ourOwn.DF))
tmpUSA <- ourOwn.DF[ourOwn.DF$isocode=="USA",]$pc.GDP

for (anISOcode in unique(ourOwn.DF$isocode)) {
 if (anISOcode!="USA") {
  propUSA <- ourOwn.DF[ourOwn.DF$isocode==anISOcode,]$pc.GDP / tmpUSA
  ourOwn.DF[ourOwn.DF$isocode==anISOcode,]$relUSA <-propUSA
 }
}
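
The loop above relies on every economy having the same years, in the same order, as the USA rows. A vectorised alternative that looks up the USA value year by year could look like the following. This is a sketch on a toy dataframe (toy.DF and its numbers are made up for illustration), not the code used to produce the results in this writeup.

```r
# Toy stand-in for ourOwn.DF; the numbers are made up for illustration.
toy.DF <- data.frame(
  isocode = rep(c("USA", "SGP"), each = 3),
  year    = rep(2009:2011, times = 2),
  pc.GDP  = c(40, 42, 44, 50, 52.5, 55),
  stringsAsFactors = FALSE
)

# Pull out the USA series, then look up the USA value matching each row's year.
usa.DF <- toy.DF[toy.DF$isocode == "USA", c("year", "pc.GDP")]
toy.DF$relUSA <- toy.DF$pc.GDP / usa.DF$pc.GDP[match(toy.DF$year, usa.DF$year)]

# As in the loop above, leave the USA's own ratio as NA.
toy.DF$relUSA[toy.DF$isocode == "USA"] <- NA

print(toy.DF)
```

Because the lookup goes through year rather than row position, an economy with missing or reordered years would still be divided by the right USA value.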

Averaged over 1960 to 1964, early in the sample, and over 2007 to 2010, near its end, relative income levels (in percent of the US level) were:

for (anISOcode in c("SGP", "TWN", "KOR")) {
  cat(sprintf("%s %5.2f %5.2f\n", anISOcode,
   100*mean(ourOwn.DF$relUSA[ourOwn.DF$isocode==anISOcode & 
                          ourOwn.DF$year>1959 & ourOwn.DF$year<1965]),
   100*mean(ourOwn.DF$relUSA[ourOwn.DF$isocode==anISOcode & 
                          ourOwn.DF$year>2006 & ourOwn.DF$year<2011])))
}
## SGP 16.48 114.55
## TWN 13.22 63.96
## KOR  7.04 60.87
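
Strict inequalities such as year>2006 & year<2011 make it easy to miscount how many years a window covers. One way to make the window explicit is a small helper with inclusive bounds. The function windowMean below is hypothetical (it is not part of this writeup's code), the data are made up, and unlike the loop above it drops missing values rather than returning NA.

```r
# Hypothetical helper: mean of a column over an inclusive range of years.
windowMean <- function(df, code, from, to, column = "relUSA") {
  keep <- df$isocode == code & df$year >= from & df$year <= to
  mean(df[[column]][keep], na.rm = TRUE)
}

# Toy data, made up for illustration only.
toy.DF <- data.frame(
  isocode = rep("SGP", 6),
  year    = 2006:2011,
  relUSA  = c(1.00, 1.10, 1.12, 1.14, 1.16, 1.20),
  stringsAsFactors = FALSE
)

fiveYears <- 100 * windowMean(toy.DF, "SGP", 2007, 2011)   # 2007 to 2011
fourYears <- 100 * windowMean(toy.DF, "SGP", 2007, 2010)   # 2007 to 2010
print(c(fiveYears, fourYears))
```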

If you wish, before proceeding, you can now experiment with looking at different economies’ per capita GDP by varying the ggplot() call above. It’s impractical, however, to graph the per capita GDP of all 167 economies: Well, you can do so, of course, but it’s unclear what to make of the resulting wash of colored ink. That, however, has not prevented a number of well-known researchers from presenting exactly that.

IMF

A perennial question arising in timeseries is what to make of the difference between deeper, underlying long-run secular movements and short-run (quarter by quarter, or even year by year) directly observable fluctuations.

I use this background question to motivate the extraction and manipulation of IMF World Economic Outlook data. In particular, I retrace the steps I used to generate the long-run, short-run comparisons in “Convergence Determines Governance” (Nov 2014).

This trend/cycle distinction arises in many interesting situations when working with dynamic data. Our motivation here comes specifically from continuing the previous discussion on economic growth. We examine dynamic income patterns across advanced and emerging economies, taking the opportunity to unpack an additional useful dataset, namely that presented in the IMF’s World Economic Outlook (October 2015).

Download the “By Countries” and “By Country Groups” files from the IMF provider page, and use Excel to convert them to .xlsx format. (Incidentally, one of the most common questions of IMF data is what IMF means by “Country Groups”. This listing for the October 2015 WEO report gives the answer.) I’m putting these files in my directory

file.path(myDataDir, "IMF-WEO", "2015.10")

and that’s what my R code will point to below. Again, you’ll want to modify those names accordingly for your own machine.

If you peek inside the spreadsheets, you’ll see that IMF decided to present their data vertically by variable (and country, country groups, and so on), and horizontally by year. If each variable has its values running down a column—which is the R convention—the IMF data are organised in the following “panel data” way. The spreadsheet contains, among many others, a variable named “1980”: that variable takes a certain value for the observation “USA GDP”, another for the observation labelled “Singapore Investment”, and so on. This isn’t particularly convenient for timeseries work. So we reshape these data using R.
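
To make the layout concrete, here is a sketch of turning a small WEO-style wide table (one column per year) into a long one (one row per economy and year) using only base R. The dataframe wide.DF, its values, and the assumption that year columns arrive named X1980, X1981, and so on are all mine, for illustration; the real spreadsheets have many more columns.

```r
# Toy wide table in the IMF style; values are made up.
wide.DF <- data.frame(
  ISO              = c("USA", "SGP"),
  WEO.Subject.Code = c("NGDPD", "NGDPD"),
  X1980            = c("1,000.5", "10.2"),
  X1981            = c("1,100.25", "11.4"),
  stringsAsFactors = FALSE
)

# Identify the year columns by their names.
yearCols <- grep("^X[0-9]{4}$", names(wide.DF), value = TRUE)

# Stack the year columns: unlist() goes column by column, so ISO is
# repeated once per year column and each year once per economy.
long.DF <- data.frame(
  ISO   = rep(wide.DF$ISO, times = length(yearCols)),
  year  = rep(as.integer(sub("^X", "", yearCols)), each = nrow(wide.DF)),
  value = as.numeric(gsub(",", "",
            unlist(wide.DF[, yearCols], use.names = FALSE))),
  stringsAsFactors = FALSE
)

print(long.DF)
```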

Rather unfortunately further, the format the IMF decided to use differs across the two critical spreadsheet files. For countries, the file WEOOct2015all.xlsx includes an extra column for ISO code (which is of course useful) but requires the coder to be wary when peeling off data. Below I will use this ISO information to refer to individual countries, so the code will be written to preserve it.

First read in everything and then keep the series we want. Here it’ll be GDP, or in IMF language, the “WEO Subject Code” that is “NGDPD”, measured in billions of US dollars. Most computer manipulations eschew whitespace like blanks or spaces in names—it’s difficult to tell when one name ends and something else begins. To help us, R quietly goes ahead and replaces the IMF label with a name that substitutes periods “.” for spaces.
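
The renaming can be previewed with base R's make.names(), which I believe is what the data-import functions apply to column headers by default. The header strings below are examples typed in by hand, not read from the IMF file.

```r
# Example header strings, typed by hand for illustration
rawHeaders   <- c("WEO Subject Code", "Country Group Name", "1980")
cleanHeaders <- make.names(rawHeaders)

# Spaces become periods; a name starting with a digit gets an "X" prefix
print(cleanHeaders)
```

This is also why the year columns turn up with names like X1980 rather than 1980.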

I’d earlier, for illustration, detached some libraries, so I just need to put them back:

library(gdata)
library(ggplot2)

If you’re going to be working with cross-country data later, having some systematic way to refer to countries will be convenient. By systematic I mean not an English (or Chinese or German or Russian) name but something that will appear consistently in international databases. ISO codes are a good option and you might want to memorise them—or at least have some ready reference chart for them. ISO codes will be what I use to pull out countries next. Finally, reading from spreadsheets is a slow process generally. So, just as with the Maddison Project data, I’m going to save the data read in into R’s native format that I subsequently use instead.

theDataXLS  <- file.path(myDataDir, "IMF-WEO", "2015.10", "WEOOct2015alla.xlsx")
holdAggr.DF <- read.xls(theDataXLS, sheet="WEOOct2015alla", stringsAsFactors=FALSE)
rm(theDataXLS)

theDataXLS  <- file.path(myDataDir, "IMF-WEO", "2015.10", "WEOOct2015all.xlsx")
holdIndv.DF <- read.xls(theDataXLS, sheet="WEOOct2015all", stringsAsFactors=FALSE)
rm(theDataXLS)
myIMFweoAggr.file <- file.path(myDataDir, "IMF-WEO", "2015.10", "WEOOct2015Aggr.rds")
myIMFweoIndv.file <- file.path(myDataDir, "IMF-WEO", "2015.10", "WEOOct2015Indv.rds")
saveRDS(holdAggr.DF, file=myIMFweoAggr.file)
saveRDS(holdIndv.DF, file=myIMFweoIndv.file)
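
The save-once, read-many pattern can be wrapped in a small helper. The function readCached below is hypothetical (not something used elsewhere in this writeup), and slowRead simply stands in for a slow spreadsheet read; the sketch uses a temporary file so that it runs anywhere.

```r
# Hypothetical helper: return the cached object if the .rds file exists,
# otherwise build it with the supplied function and cache the result.
readCached <- function(cacheFile, buildFun) {
  if (file.exists(cacheFile)) {
    return(readRDS(cacheFile))
  }
  obj <- buildFun()
  saveRDS(obj, file = cacheFile)
  obj
}

# Stand-in for a slow spreadsheet read; the data are made up.
slowRead <- function() {
  data.frame(ISO = c("CHN", "USA"), X1980 = c(1, 2),
             stringsAsFactors = FALSE)
}

cacheFile <- tempfile(fileext = ".rds")
first  <- readCached(cacheFile, slowRead)   # builds the object and saves it
second <- readCached(cacheFile, slowRead)   # reads it back from the cache
```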

When I want to use these IMF WEO data subsequently, instead of reading spreadsheets slowly, I just do

holdAggr.DF <- readRDS(myIMFweoAggr.file)
holdIndv.DF <- readRDS(myIMFweoIndv.file)
rm(myIMFweoAggr.file, myIMFweoIndv.file)
gdpIMFaggr.DF <- holdAggr.DF[holdAggr.DF$WEO.Subject.Code == "NGDPD",]
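# NB: "SNG" is not an ISO 3166 code (Singapore is "SGP"), so it matches no rows
# and Singapore is absent from the output further below.
gdpIMFindv.DF <- holdIndv.DF[holdIndv.DF$WEO.Subject.Code == "NGDPD" &
 holdIndv.DF$ISO %in% c("CHN", "KOR", "SNG", "TWN", "USA"), ]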
rm(holdAggr.DF, holdIndv.DF)

Thus far we have been able to do what we want keeping to just R’s basic data.frame class. Indeed, R comes with basic method functions that understand data.frames, and thus can analyse and manipulate data contained there. However, more extensive timeseries work is better done using classes specialised to manipulate timeseries data, and for which more finely-tuned method functions are available. R contains several possible special data classes (including ts, mts, timeSeries, xts, and so on) for timeseries. Eric Zivot describes these in useful detail, and suggests why an applied researcher might choose one or another.

For our purposes hereafter, when we need timeseries specifics, we will use the zoo (Z’s Ordered Observations) class (Zeileis and Grothendieck, 2005). In my view it is this class that displays the right tradeoff between ease of use and flexibility, not least for those researchers working with financial timeseries data: zoo critically adds, over the standard ts and mts classes, the ability to manage data whose indexes are irregularly spaced, i.e., don’t come in just annual, quarterly, or monthly frequencies. Such data might be spatial, or drawn from a continuous underlying time record but recorded only at specific points. A researcher might, for example, want to work with exchange rates that are traded continuously throughout the day but recorded only at particular instants. For working with annual data this specific advantage (handling irregular but ordered indexes) does not yet make a difference, but we might as well get used to zoo now: we’re going to need to put IMF’s reorganised data somewhere useful in any case.

A zoo object, like that of a data.frame, is just a two-dimensional array, and can be added to, extracted from, have parts removed, and generally manipulated using much the same conventions as for a data.frame. Its contents too run down columns, each of the latter making up a single variable.
Two differences from data.frames are key. First, a zoo object knows about timeseries structure and methods; this is good. Second, its body comprises just numbers; this needn’t be bad until you want to analyze something other than numbers.
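
A small sketch of what an irregularly indexed zoo object looks like may help. The dates and exchange-rate quotes below are made up, and the object name fx.oo is mine; the sketch assumes the zoo package is installed.

```r
library(zoo)

# Made-up exchange-rate quotes recorded on irregularly spaced dates.
obsDates <- as.Date(c("2015-01-02", "2015-01-05", "2015-01-09"))
fx.oo    <- zoo(c(1.52, 1.53, 1.51), order.by = obsDates)

index(fx.oo)      # the ordered index: three dates, unevenly spaced
coredata(fx.oo)   # the body: just the numbers

# Subsetting by time uses the index rather than row positions.
recent.oo <- window(fx.oo, start = as.Date("2015-01-04"))
print(recent.oo)
```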

library(zoo)
library(dynlm)
begYear  <- 1980
endYear  <- 2015
nmbYears <- endYear - begYear + 1

Because a data.frame is typically more general than just numbers, we have to take added care when moving contents between data.frame and zoo objects. I use the R function sapply, together with as.numeric and gsub below to achieve the proper translation but there are other ways to do this. Again, if we didn’t apply gsub, human eye-recognisable big numbers with “,”’s in them will not be interpreted as numbers but simply coerced to NA.
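
The effect of the thousands separator is easy to see on a few hand-typed strings (these values are made up, not taken from the IMF file):

```r
# Hand-typed example strings; "n/a" stands in for a genuinely missing entry.
rawValues <- c("1,234.5", "987.6", "n/a")

# Without gsub, the comma makes the first entry unparseable.
naive   <- suppressWarnings(as.numeric(rawValues))

# Stripping commas first recovers it; "n/a" is still (rightly) NA.
cleaned <- suppressWarnings(as.numeric(gsub(",", "", rawValues)))

print(naive)
print(cleaned)
```

I wrap the conversions in suppressWarnings() only to keep the sketch quiet; in real work the coercion warning is a useful signal that some entries did not parse.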

In this application I have cheated by looking into the IMF spreadsheets for where the numerical data sit, rather than automating the procedure more elegantly. But the resulting code below is at least short and transparent. Simply adjust the next few lines appropriately when you use this on IMF data that might have had their formats altered. In software code whenever you see a number other than “0” or “1” and it’s not part of a name, you need to think about why something so special appears in what should otherwise be general. (So watch out on the “9” and “10” below.)

begColumn   <- 9
tmpDatMtr   <- sapply(gdpIMFaggr.DF[, begColumn:(begColumn+nmbYears-1)], 
 function (x) {as.numeric(gsub(",", "", as.character(x)))})
begColumn   <- 10
tmpHldMtr   <- sapply(gdpIMFindv.DF[, begColumn:(begColumn+nmbYears-1)],
 function (x) {as.numeric(gsub(",", "", as.character(x)))})

tmpDatMtr   <- rbind(tmpDatMtr, tmpHldMtr)
imfGDP.oo   <- zoo (t(tmpDatMtr), c(begYear:endYear))
names(imfGDP.oo) <- c(gdpIMFaggr.DF$Country.Group.Name, gdpIMFindv.DF$ISO)

rm(gdpIMFaggr.DF, gdpIMFindv.DF, tmpDatMtr, tmpHldMtr, begColumn)
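
One way to avoid hard-coding the starting column is to look the year columns up by name. The sketch below assumes the year columns are named X1980, X1981, and so on after import; toy.DF, its layout, and its values are made up for illustration.

```r
# Toy stand-in for a WEO extract; layout and values are made up.
toy.DF <- data.frame(
  Country.Group.Name    = c("World", "G7"),
  Units                 = c("bn", "bn"),
  X1980                 = c("11,000.5", "7,000.1"),
  X1981                 = c("11,500.0", "7,400.9"),
  Estimates.Start.After = c(NA, NA),
  stringsAsFactors = FALSE
)
toyBegYear <- 1980
toyEndYear <- 1981

# Find the positions of the year columns by name; fail loudly if any is missing.
yearCols <- match(paste0("X", toyBegYear:toyEndYear), names(toy.DF))
stopifnot(!anyNA(yearCols))

toyMtr <- sapply(toy.DF[, yearCols],
                 function(x) as.numeric(gsub(",", "", as.character(x))))
print(toyMtr)
```

If the IMF adds or removes a descriptive column in a later release, a lookup like this keeps working where a fixed column number would silently pick up the wrong data.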

For aggregates the IMF gives descriptions (such as “Middle East and North Africa”) and corresponding cryptic WEO Country Group Codes (603) but no really convenient label to use in coding. So we give up and just remember what columns in imfGDP.oo refer to what country groupings from the output of

print(names(imfGDP.oo))
##  [1] "World"                                                                   
##  [2] "Advanced economies"                                                      
##  [3] "Euro area "                                                              
##  [4] "Major advanced economies (G7)"                                           
##  [5] "Other advanced economies (Advanced economies excluding G7 and euro area)"
##  [6] "European Union"                                                          
##  [7] "Emerging market and developing economies"                                
##  [8] "Commonwealth of Independent States"                                      
##  [9] "Emerging and developing Asia"                                            
## [10] "Emerging and developing Europe"                                          
## [11] "ASEAN-5"                                                                 
## [12] "Latin America and the Caribbean"                                         
## [13] "Middle East, North Africa, Afghanistan, and Pakistan"                    
## [14] "Middle East and North Africa"                                            
## [15] "Sub-Saharan Africa"                                                      
## [16] "CHN"                                                                     
## [17] "KOR"                                                                     
## [18] "TWN"                                                                     
## [19] "USA"

Having seen that output, drop some series (columns) by position, and clean up some names that we are more likely to want to use. At least the ISO codes, relatively memorable and convenient to use, can remain unchanged.

names(imfGDP.oo)[2]  <- "Advanced.Economies"
names(imfGDP.oo)[3]  <- "Euro.Area"
names(imfGDP.oo)[4]  <- "G7"
names(imfGDP.oo)[6]  <- "EU"
names(imfGDP.oo)[7]  <- "EMDE"
names(imfGDP.oo)[14] <- "MENA"

imfGDP.oo <- imfGDP.oo[, c(-5, -8:-10, -12, -13, -15)]

Now construct vectors of the economies and groupings to which we will want to pay closer attention:

indWrldG7EMDE <- match(c("World", "G7", "EMDE"), names(imfGDP.oo))
indG7EMDE     <- match(c("G7", "EMDE"), names(imfGDP.oo))
indUSACHN     <- match(c("USA", "CHN"), names(imfGDP.oo))

Using match here to create index vectors is much more convenient and less error-prone than trying to remember which column in the zoo object imfGDP.oo corresponds to what aggregate grouping or economy.
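
One caveat is that match() returns NA, without any warning, for a name it cannot find. A small guard catches misspelled codes early. The function safeMatch is a hypothetical helper of mine, and seriesNames is a hand-typed stand-in for names(imfGDP.oo).

```r
# Hand-typed stand-in for names(imfGDP.oo)
seriesNames <- c("World", "G7", "EMDE", "CHN", "USA")

match(c("USA", "CHN"), seriesNames)   # 5 4
match(c("USA", "SNG"), seriesNames)   # 5 NA: no error, no warning

# Hypothetical helper: like match(), but stops if any name is not found.
safeMatch <- function(wanted, available) {
  ind <- match(wanted, available)
  if (anyNA(ind)) {
    stop("not found: ", paste(wanted[is.na(ind)], collapse = ", "))
  }
  ind
}

indToyUSACHN <- safeMatch(c("USA", "CHN"), seriesNames)
```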

Take a look at some of what we’ve put together:

thisTitle <- "World, G7, and Emerging Economies"
autoplot(imfGDP.oo[, indWrldG7EMDE], facets=NULL) + 
 geom_line(size=2) + myTStheme + ggtitle(thisTitle)

thisTitle <- "G7 and Emerging Economies"
autoplot(imfGDP.oo[, indG7EMDE], facets=NULL) + 
 geom_line(size=2) + myTStheme + ggtitle(thisTitle)

thisTitle <- "US and China"
autoplot(imfGDP.oo[, indUSACHN], facets=NULL) + 
 geom_line(size=2) + myTStheme + ggtitle(thisTitle)

As an exercise, I re-create here that extrapolated-trend graph with China data:

imfGDP.oo$logCHN <- log(imfGDP.oo$CHN)
thisTitle        <- "China"
theBegFit <- 1980
theEndFit <- 2000
theEndSmp <- 2015
olsFIT    <- dynlm(logCHN ~ trend(imfGDP.oo), data = imfGDP.oo, 
          start=theBegFit, end=theEndFit)
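# trend() counts 1, 2, ... from the first observation (1980), so the
# intercept refers to 1979; hence the "+1" in the trend term below.
expTrendFitted <- function(x) {ifelse (x>=theBegFit & x<=theEndFit,
 exp(coef(olsFIT)[1] + ((x-theBegFit+1) * coef(olsFIT)[2])), NA)
}
expTrendExtrap <- function(x) {ifelse (x>=theEndFit+1 & x<=theEndSmp,
 exp(coef(olsFIT)[1] + ((x-theBegFit+1) * coef(olsFIT)[2])), NA)
}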
thisTitle <- paste0(thisTitle, " GDP (US$bn, Market Exchange Rates)")
autoplot(imfGDP.oo$CHN, facets=NULL) +
 stat_function(fun=expTrendFitted, linetype=1, colour="blue", size=1.1) +
 stat_function(fun=expTrendExtrap, linetype=2, colour="blue", size=1.05) +
 geom_line(size=2) + myTStheme + ggtitle(thisTitle)
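
The same fit-then-extrapolate idea can be sketched with base R alone, which sidesteps any question of how the trend variable is counted: regress the log series on the calendar year with lm(), then use predict() for the out-of-sample years. The series below is synthetic (exact 8 percent log growth), and the names toyYears, toyGDP, and olsToy are mine.

```r
# Synthetic series: exact exponential growth, for illustration only.
toyYears <- 1980:2015
toyGDP   <- 100 * exp(0.08 * (toyYears - 1980))
fit.DF   <- data.frame(year = toyYears, logGDP = log(toyGDP))

# Fit a log-linear trend on 1980 to 2000 only.
olsToy <- lm(logGDP ~ year, data = fit.DF, subset = year <= 2000)

# Extrapolate the fitted trend to 2001 to 2015 and undo the log.
extrapYears <- 2001:2015
extrapGDP   <- exp(predict(olsToy, newdata = data.frame(year = extrapYears)))

print(round(unname(coef(olsToy)["year"]), 4))
```

With real data the extrapolated path and the observed series will of course differ; that gap is what the dashed line in the chart is meant to show.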

One of the most striking features in these data is how “Emerging Markets and Developing Economies” have assumed a dramatically larger footprint in the global economy. To be clear, the G7 economies, at least from the visual perspective in these graphs, have not slowed dramatically in their collective growth trajectory. Instead, it is that the emerging markets have just grown so much faster since the early 2000s.

thisTitle <- "Underlying Trends (5-year moving average). Trillions current US$"
tmpGDP.oo <- rollmean(imfGDP.oo[, indG7EMDE], 5, align="center")
autoplot(tmpGDP.oo/1000, facets=NULL) +
 geom_line(size=2) + myTStheme + ggtitle(thisTitle)

rm(tmpGDP.oo)
thisTitle  <- "G7-Emerging Economies gap as fraction of G7 GDP"
autoplot((imfGDP.oo$G7 - imfGDP.oo$EMDE) / imfGDP.oo$G7, facets=NULL) +
 geom_line(size=2, colour="dark blue") + myTStheme + ggtitle(thisTitle)
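
For intuition on the five-year centred moving average used above, here is the same calculation on a short made-up vector using base R's stats::filter(). As far as I know, rollmean() with its defaults drops the two observations at each end, whereas filter() keeps them as NA.

```r
# Made-up series for illustration
toySeries <- c(2, 4, 6, 8, 10, 12, 14)

# Centred five-term moving average: equal weights of 1/5, two-sided.
centredMA <- as.numeric(stats::filter(toySeries, rep(1/5, 5), sides = 2))

# The first two and last two entries have no full window and are NA.
print(centredMA)
```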

Some of that catch-up is of course due simply (i.e., arithmetically) to China. But the graph comparing aggregate GDP for China and the US shows that that can’t be the entire explanation.

For readers accustomed to thinking of cross-country comparison in per capita GDP, it is worth remembering why these aggregate GDP statistics are useful. Obviously, they wouldn’t be what someone would want to look at for, say, convergence in a neoclassical growth model. However, they are exactly what someone would need to assess shifts in the global balance of power: economic initially of course, but then perhaps more generally. It is these statistics someone would want to use to gauge the capacity of different economies or groupings to drive or drag down global economic performance, or to measure needs and functions for appropriate global governance.

For the same reason, the analysis here looks not at GDP corrected for purchasing power parity but instead at market exchange rates. Again, it is these latter that matter for evaluating contribution to the global economy and for assessing global power shifts, while purchasing power parity adjustment is useful instead for estimating residents’ well-being.

A recurrent question given this perspective is, can the emerging economies continue to grow if the advanced ones stagnate? The preceding might suggest yes. Nonetheless, that the emerging economies might slow if the G7 undergoes a more prolonged secular stagnation is often suggested by the following calculations on growth rates:

imfGDP.oo$G7g   <- diff(log(imfGDP.oo$G7))
imfGDP.oo$EMDEg <- diff(log(imfGDP.oo$EMDE))
indG7EMDEg <- match(c("G7g", "EMDEg"), names(imfGDP.oo))

thisTitle <- "G7 and Emerging Economy annual growth rates"
autoplot(imfGDP.oo[, indG7EMDEg], facets=NULL) +
 geom_line(size=2) + myTStheme + ggtitle(thisTitle)
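
The growth rates above are log differences, which approximate percentage growth when changes are small. A quick check on a made-up series growing at exactly 5 percent a year:

```r
# Made-up level series growing 5 percent a year
toyLevels <- c(100, 105, 110.25)

logGrowth   <- diff(log(toyLevels))                           # log differences
exactGrowth <- toyLevels[-1] / toyLevels[-length(toyLevels)] - 1   # simple growth

print(round(rbind(logGrowth, exactGrowth), 4))
```

The log difference comes out slightly below the simple growth rate, and the gap widens as growth rates get larger, which matters for fast-growing emerging economies measured in current US dollars.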

Notice how growth rates across the two groups seem, over time, to have moved closer and closer in sync with each other. We can confirm this visual impression by calculating cross-correlations:

earlySample <- as.character(seq(1981, 2000))
laterSample <- as.character(seq(2001, 2015))

cor (imfGDP.oo[earlySample, indG7EMDEg])
##         G7g EMDEg
## G7g   1.000 0.153
## EMDEg 0.153 1.000
cor (imfGDP.oo[laterSample, indG7EMDEg])
##         G7g EMDEg
## G7g   1.000 0.787
## EMDEg 0.787 1.000

In the early part of the sample, up through 2000, the cross-correlation between growth rates in the G7 and in the Emerging Market and Developing Economies was about 0.15. Towards the end of the sample, after 2000, that statistic had risen roughly five-fold to about 0.79. However, despite this higher cyclical correlation (ever “tighter coupling”), the long-term trend behaviour, as shown dramatically in the graphs, shows growth occurring in the emerging markets even without a corresponding speed-up in the G7. Some might even say this last feature shows “decoupling”.

Polity IV

No one – least of all its impeccably conscientious creators and maintainers – pretends that ideas as complicated as democracy or autocracy can be summarized in a single numerical index. Nonetheless, without some basis to start the discussion, we are just making up ideals as we go along, and make little progress. The Polity IV data give us that substantial first step.

The splashpage contains this disclaimer:

[Image: Polity IV disclaimer]

I heartily recommend this form of words for all projects financed by governments of nation states.

(TBC – What I did with this for my Middle-Income Trap project.)

CONCLUSION

This writeup has described, for R manipulation, the access and use of a number of datasets central to studying the global economy. Because of its nature, this document is never finished. Every time the author finds and uses a new dataset valuable to add to our understanding of the world economy, that dataset’s description and manipulation appear here.

APPENDIX

This is the code chunk for plotting multiple series neatly:

read_chunk(file.path(myRoutinesDir, "multplot-maddp.R"))
# This works only for my Maddison Project DF;
# I haven't found it worth coding more generally
# Thu Jan 07 16:33:21 2016 - Danny Quah
getSeries <- c("Year", "Economy", theSeries)
theAES    <- aes_string(x="Year", y=theSeries, group="Economy",
                         colour="Economy")
this.DF   <-
  MaddP.DF[(MaddP.DF$Economy %in% theEconomies) &
           (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl),
           getSeries]

ggplot(data=this.DF, theAES) + geom_line(size=2) + 
       myTStheme + ggtitle(thisTitle)

rm(this.DF, theAES, getSeries)

This is the code chunk that implements the eyeballing-trend operation.

read_chunk(file.path(myRoutinesDir, "eyetrend-maddp.R"))
# This works only for my Maddison Project DF;
# I haven't found it worth coding more generally
# Thu Jan 07 16:33:21 2016 - Danny Quah
olsFIT <- lm(logPerCapGDP ~ Year,
              data=MaddP.DF[(MaddP.DF$Economy == theEconomy) &
              (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndFit), ])

thisTitle <- paste0(theEconomy,
 ": log Per Capita GDP in constant 1990 Int. GK$")

this.DF <-
  MaddP.DF[(MaddP.DF$Economy == theEconomy) &
           (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp),
           c("Year", "perCapitaGDP", "logPerCapGDP")]

 ggplot(data=this.DF, aes(x=Year, y=logPerCapGDP)) + geom_line(size=2) +
 geom_segment(data=this.DF, aes(x=theBegFit, xend=theEndFit,
              y=coef(olsFIT)[1]+coef(olsFIT)[2]*theBegFit,
               yend=coef(olsFIT)[1]+coef(olsFIT)[2]*theEndFit),
              linetype=1, colour="blue", size=1.1) +
 geom_segment(data=this.DF, aes(x=theEndFit+1, xend=theEndSmp,
              y=coef(olsFIT)[1]+coef(olsFIT)[2]*(theEndFit+1),
               yend=coef(olsFIT)[1]+coef(olsFIT)[2]*theEndSmp),
              linetype=2, colour="blue", size=1.05) +
 myTStheme + ggtitle(thisTitle)

#
thisTitle      <- paste0(theEconomy,
": Per Capita GDP in constant 1990 Int. GK$")

expTrendFitted <- function(x) {ifelse (x>=theBegFit & x<=theEndFit,
  exp(coef(olsFIT)[1] + (x * coef(olsFIT)[2])), NA)
}
expTrendExtrap <- function(x) {ifelse (x>=theEndFit+1 & x<=theEndSmp,
  exp(coef(olsFIT)[1] + (x * coef(olsFIT)[2])), NA)
}
ggplot(data=this.DF, aes(x=Year, y=perCapitaGDP)) + geom_line(size=2) +
stat_function(fun=expTrendFitted, linetype=1,
               colour="blue", size=1.1) +
stat_function(fun=expTrendExtrap, linetype=2,
               colour="blue", size=1.05) +
myTStheme + ggtitle(thisTitle)

rm(olsFIT, expTrendFitted, expTrendExtrap)
