Long Run World Economic Growth

Economics and International Development, LSE
Background Common to Projects
October 2014
D. Quah

This writeup summarises some useful information on long-run world economic growth using data from the Maddison Project.

library(knitr)
opts_chunk$set(echo=TRUE, tidy=FALSE, warning=FALSE)
setwd("~/Dropbox/1/j/Code/2014.03-World-Growth")

The Maddison Project provides the now-standard data for studying comparative economic growth over the very long run. These data come as an Excel spreadsheet. Unfortunately, the spreadsheet is laid out for reading by eye rather than for data manipulation and analysis, so one typically needs to go through the following steps to put the data into a more usable form.

library(ggplot2)
library(gdata)
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
## 
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
## 
## Attaching package: 'gdata'
## 
## The following object is masked from 'package:stats':
## 
##     nobs
## 
## The following object is masked from 'package:utils':
## 
##     object.size
library(reshape2)
library(stringr)
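theMaddisonXLS <- "~/Dropbox/1/j/data/Maddison-Project/mpd_2013-01.xlsx"
hold.DF        <- read.xls(theMaddisonXLS, skip=1, stringsAsFactors=FALSE)
# Economy names sit in the first row read in; use them as column names
colNames       <- as.character(hold.DF[1, ])
colNames[1]    <- "Year"
new.DF         <- hold.DF[-1, ]
names(new.DF)  <- colNames
# Wide (one column per economy) to long (one row per Year-economy pair)
MaddP.DF       <- melt(new.DF, id.vars="Year")
rm(new.DF, hold.DF, colNames)
# Empty or non-numeric cells become NA on conversion; drop them
MaddP.DF$value <- as.numeric(MaddP.DF$value)
MaddP.DF       <- MaddP.DF[!is.na(MaddP.DF$value), ]

names(MaddP.DF)[2] <- "Economy"
names(MaddP.DF)[3] <- "perCapitaGDP"

# Defensive: make sure Year is numeric (no change if it already is)
MaddP.DF$Year         <- as.numeric(MaddP.DF$Year)
MaddP.DF$logPerCapGDP <- log(MaddP.DF$perCapitaGDP)
MaddP.DF$Economy      <- str_trim(MaddP.DF$Economy, side="both")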
detach("package:stringr")
detach("package:reshape2")
detach("package:gdata")

(You can check that, had we not trimmed whitespace with str_trim(), we wouldn’t get a match for “Sweden” in the code chunk to follow. Astounding but true: spreadsheets and casual hand-editing are dangerous things to mix and then leave lying around on a computer.)
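The reshaping steps above can be tried out on a toy table before running them on the real spreadsheet. This sketch uses only base R rather than gdata and reshape2, and the table, its made-up values, and the names raw, wide, and long are all hypothetical; it mimics a layout with years in the first column, economy names in the first row, and stray whitespace around one name.

```r
# Toy stand-in for the spreadsheet (made-up values, NOT Maddison data):
# years in the first column, economy names in the first row, one name padded.
raw <- data.frame(
  V1 = c(NA, "1870", "1871"),
  V2 = c(" Sweden ", "100", "101"),
  V3 = c("USA", "200", "202"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, as in the chunk above
colNames    <- as.character(raw[1, ])
colNames[1] <- "Year"
wide        <- raw[-1, ]
names(wide) <- colNames

# Wide to long: one row per (Year, Economy) pair; unlist() stacks column by column
long <- data.frame(
  Year         = rep(as.numeric(wide$Year), times = ncol(wide) - 1),
  Economy      = rep(names(wide)[-1], each = nrow(wide)),
  perCapitaGDP = as.numeric(unlist(wide[-1], use.names = FALSE)),
  stringsAsFactors = FALSE
)

# trimws() is the base-R counterpart of str_trim(side = "both")
long$Economy <- trimws(long$Economy)
long <- long[!is.na(long$perCapitaGDP), ]
print(long)
```

Without the trimws() line, the economy would remain " Sweden " and a later selection on "Sweden" would find nothing, which is exactly the trap described above.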

Since I knew I would want to use these data repeatedly and I didn’t want to keep running the code chunk above, I saved my own copy of the Maddison Project GDP data in R’s native format:

myMaddP.file <- "~/Dropbox/1/j/data/Maddison-Project/maddp-201301-DQ.rds"
saveRDS(MaddP.DF, file=myMaddP.file)

(This is just for my personal use so I’m not packaging it up as a library.)
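The save-once, read-many pattern can be wrapped in a small function. This is a minimal sketch: the name loadOrBuild and its arguments are hypothetical, and it assumes the slow cleaning steps have been packaged as a function that takes no arguments.

```r
# Hypothetical helper: return the cached data if the .rds file exists,
# otherwise run the (slow) build step once and cache its result.
loadOrBuild <- function(rdsFile, buildFun) {
  if (file.exists(rdsFile)) {
    return(readRDS(rdsFile))
  }
  result <- buildFun()
  saveRDS(result, file = rdsFile)
  result
}

# Demonstration with a throwaway file and a counter for how often we build
cacheFile <- tempfile(fileext = ".rds")
nBuilds   <- 0
slowBuild <- function() {
  nBuilds <<- nBuilds + 1
  data.frame(Year = 1870:1872, Economy = "Sweden", stringsAsFactors = FALSE)
}
first  <- loadOrBuild(cacheFile, slowBuild)   # builds and saves
second <- loadOrBuild(cacheFile, slowBuild)   # just reads the .rds
```

One caveat of this design: if the spreadsheet is updated, the stale .rds file has to be deleted by hand before the data are rebuilt.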

Whenever I now need these data, I no longer have to repeat all the stripping and cleaning that followed (slowly) reading the spreadsheet above. Instead I just go:

MaddP.DF <- readRDS(myMaddP.file)

and to get growth rates to study subsequently:

MaddP.DF$annGrowth <- NA
for (anEconomy in unique(MaddP.DF$Economy)) {
  # Years and log per capita GDP for this economy (assumed sorted by Year)
  theYears <- MaddP.DF[MaddP.DF$Economy==anEconomy, ]$Year
  logPCGDP <- MaddP.DF[MaddP.DF$Economy==anEconomy, ]$logPerCapGDP
  theAnnGr <- rep(NA, length(logPCGDP))
  # seq_along(...)[-1] is empty for a single observation, unlike 2:length(...)
  for (jLoop in seq_along(theAnnGr)[-1]) {
    # Only compute growth when the previous observation is the previous year
    if (theYears[jLoop-1] == theYears[jLoop]-1) {
      theAnnGr[jLoop] <- logPCGDP[jLoop] - logPCGDP[jLoop-1]
    }
  }
  # Change to percent and then move into dataframe
  MaddP.DF[MaddP.DF$Economy==anEconomy, ]$annGrowth <- 100.0 * theAnnGr
  rm(theAnnGr)
}

(For those who know R: notice I can’t simply replace the inner loop with, say, diff(), as I also need to check that the data are available in consecutive years.)
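For comparison, here is a sketch of how diff() can still do the arithmetic once the consecutive-year check is applied afterwards as a mask. The function name annualGrowth is hypothetical, the toy series are made up, and, like the loop above, it assumes each economy’s observations are sorted by Year.

```r
# Hypothetical helper: annual growth in percent (100 x log difference),
# set to NA wherever the previous observation is not from the previous year.
annualGrowth <- function(years, logY) {
  if (length(logY) == 0) return(numeric(0))
  gr     <- c(NA, 100 * diff(logY))
  notSeq <- c(TRUE, diff(years) != 1)   # first obs has no predecessor
  gr[notSeq] <- NA
  gr
}

# Toy series with a gap between 1872 and 1875 (made-up values)
toyYears <- c(1870, 1871, 1872, 1875, 1876)
toyLogY  <- log(c(100, 110, 121, 150, 165))
print(annualGrowth(toyYears, toyLogY))

# Applying it economy by economy to a data frame laid out like MaddP.DF
toyDF <- data.frame(Economy      = rep(c("A", "B"), each = 3),
                    Year         = c(1870, 1871, 1872, 1870, 1872, 1873),
                    logPerCapGDP = log(c(100, 110, 121, 100, 120, 132)),
                    stringsAsFactors = FALSE)
toyDF <- toyDF[order(toyDF$Economy, toyDF$Year), ]
toyDF$annGrowth <- unsplit(
  lapply(split(toyDF, toyDF$Economy),
         function(d) annualGrowth(d$Year, d$logPerCapGDP)),
  toyDF$Economy)
```

Sorting by Economy and Year first matters: split() and unsplit() keep rows in their original order within each group, so an unsorted frame would give differences between the wrong pairs of years.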

What economies are we working with here?

unique(MaddP.DF$Economy)
##   [1] "Austria"                          
##   [2] "Belgium"                          
##   [3] "Denmark"                          
##   [4] "Finland"                          
##   [5] "France"                           
##   [6] "Germany"                          
##   [7] "(Centre-   North)           Italy"
##   [8] "Holland/     Netherlands"         
##   [9] "Norway"                           
##  [10] "Sweden"                           
##  [11] "Switzerland"                      
##  [12] "England/GB/UK"                    
##  [13] "12 W. Europe"                     
##  [14] "Ireland"                          
##  [15] "Greece"                           
##  [16] "Portugal"                         
##  [17] "Spain"                            
##  [18] "14 small WEC"                     
##  [19] "30 W. Europe"                     
##  [20] "Australia"                        
##  [21] "N. Zealand"                       
##  [22] "Canada"                           
##  [23] "USA"                              
##  [24] "W. Offshoots"                     
##  [25] "Albania"                          
##  [26] "Bulgaria"                         
##  [27] "Czecho-slovakia"                  
##  [28] "Hungary"                          
##  [29] "Poland"                           
##  [30] "Romania"                          
##  [31] "Yugoslavia"                       
##  [32] "7 E. Europe"                      
##  [33] "Bosnia"                           
##  [34] "Croatia"                          
##  [35] "Macedonia"                        
##  [36] "Slovenia"                         
##  [37] "Montenegro"                       
##  [38] "Serbia"                           
##  [39] "Kosovo"                           
##  [40] "F. Yugoslavia"                    
##  [41] "Czech Rep."                       
##  [42] "Slovakia"                         
##  [43] "F. Czecho-slovakia"               
##  [44] "Armenia"                          
##  [45] "Azerbaijan"                       
##  [46] "Belarus"                          
##  [47] "Estonia"                          
##  [48] "Georgia"                          
##  [49] "Kazakhstan"                       
##  [50] "Kyrgyzstan"                       
##  [51] "Latvia"                           
##  [52] "Lithuania"                        
##  [53] "Moldova"                          
##  [54] "Russia"                           
##  [55] "Tajikistan"                       
##  [56] "Turk-menistan"                    
##  [57] "Ukraine"                          
##  [58] "Uzbekistan"                       
##  [59] "F. USSR"                          
##  [60] "Argentina"                        
##  [61] "Brazil"                           
##  [62] "Chile"                            
##  [63] "Colombia"                         
##  [64] "Mexico"                           
##  [65] "Peru"                             
##  [66] "Uruguay"                          
##  [67] "Venezuela"                        
##  [68] "8 L. America"                     
##  [69] "Bolivia"                          
##  [70] "Costa Rica"                       
##  [71] "Cuba"                             
##  [72] "Dominican Rep."                   
##  [73] "Ecuador"                          
##  [74] "El Salvador"                      
##  [75] "Guatemala"                        
##  [76] "Haïti"                           
##  [77] "Honduras"                         
##  [78] "Jamaica"                          
##  [79] "Nicaragua"                        
##  [80] "Panama"                           
##  [81] "Paraguay"                         
##  [82] "Puerto Rico"                      
##  [83] "T. &amp; Tobago"                  
##  [84] "15 L. America"                    
##  [85] "21 Caribbean"                     
##  [86] "L. America"                       
##  [87] "China"                            
##  [88] "India"                            
##  [89] "Indonesia (Java before 1880)"     
##  [90] "Japan"                            
##  [91] "Philippines"                      
##  [92] "S. Korea"                         
##  [93] "Thailand"                         
##  [94] "Taiwan"                           
##  [95] "Bangladesh"                       
##  [96] "Burma"                            
##  [97] "Hong Kong"                        
##  [98] "Malaysia"                         
##  [99] "Nepal"                            
## [100] "Pakistan"                         
## [101] "Singapore"                        
## [102] "Sri Lanka"                        
## [103] "16 E. Asia"                       
## [104] "Afghanistan"                      
## [105] "Cambodia"                         
## [106] "Laos"                             
## [107] "Mongolia"                         
## [108] "North Korea"                      
## [109] "Vietnam"                          
## [110] "24 Sm. E. Asia"                   
## [111] "30 E. Asia"                       
## [112] "Bahrain"                          
## [113] "Iran"                             
## [114] "Iraq"                             
## [115] "Israel"                           
## [116] "Jordan"                           
## [117] "Kuwait"                           
## [118] "Lebanon"                          
## [119] "Oman"                             
## [120] "Qatar"                            
## [121] "Saudi Arabia"                     
## [122] "Syria"                            
## [123] "NA"                               
## [124] "UAE"                              
## [125] "Yemen"                            
## [126] "W. Bank &amp; Gaza"               
## [127] "15 W. Asia"                       
## [128] "Asia"                             
## [129] "Algeria"                          
## [130] "Angola"                           
## [131] "Benin"                            
## [132] "Botswana"                         
## [133] "Burkina Faso"                     
## [134] "Burundi"                          
## [135] "Cameroon"                         
## [136] "Cape Verde"                       
## [137] "Centr. Afr. Rep."                 
## [138] "Chad"                             
## [139] "Comoro Islands"                   
## [140] "Congo 'Brazzaville'"              
## [141] "Côte d'Ivoire"                   
## [142] "Djibouti"                         
## [143] "Egypt"                            
## [144] "Equatorial Guinea"                
## [145] "Eritrea &amp; Ethiopia"           
## [146] "Gabon"                            
## [147] "Gambia"                           
## [148] "Ghana"                            
## [149] "Guinea"                           
## [150] "Guinea Bissau"                    
## [151] "Kenya"                            
## [152] "Lesotho"                          
## [153] "Liberia"                          
## [154] "Libya"                            
## [155] "Madagascar"                       
## [156] "Malawi"                           
## [157] "Mali"                             
## [158] "Mauritania"                       
## [159] "Mauritius"                        
## [160] "Morocco"                          
## [161] "Mozambique"                       
## [162] "Namibia"                          
## [163] "Niger"                            
## [164] "Nigeria"                          
## [165] "Rwanda"                           
## [166] "Sao Tomé &amp; Principe"         
## [167] "Senegal"                          
## [168] "Seychelles"                       
## [169] "Sierra Leone"                     
## [170] "Somalia"                          
## [171] "Cape Colony/ South Africa"        
## [172] "Sudan"                            
## [173] "Swaziland"                        
## [174] "Tanzania"                         
## [175] "Togo"                             
## [176] "Tunisia"                          
## [177] "Uganda"                           
## [178] "Congo-Kinshasa"                   
## [179] "Zambia"                           
## [180] "Zimbabwe"                         
## [181] "3 Small Afr."                     
## [182] "Total Africa"                     
## [183] "Total World"

A bit of a mess, isn’t it? And this is after we’ve already done a str_trim().
Look, I am in awe of the amount of work that has gone into constructing these Maddison Project data. These researchers have my greatest respect. But the mess above is what happens when authors make up names as they go along, like “England/GB/UK”, “Holland/ Netherlands”, “(Centre- North) Italy”, “14 small WEC”, or “3 Small Afr.”; or when they insert peculiar characters like “&”, stray invisible whitespace, or non-ASCII characters.

(Without ISO standardisation, of course, it’s inevitable we have to make things up. Still.)

It’s bad enough when we have to guess what these names mean in a spreadsheet; trying to write computer code to select things systematically from this is almost impossible.
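Two small base-R helpers can make selecting from such names a little less painful: one collapses the runs of interior whitespace that str_trim() leaves alone, and one searches for an economy by a fragment of its name. Both function names are hypothetical, and neither is part of the Maddison Project or of stringr.

```r
# Hypothetical helper: collapse every run of whitespace (including tabs and
# newlines) to a single space, then trim both ends.
squishNames <- function(x) trimws(gsub("[[:space:]]+", " ", x))

# Hypothetical helper: case-insensitive search of economy names.
# Note that fragment is treated as a regular expression.
findEconomy <- function(fragment, economies) {
  grep(fragment, economies, value = TRUE, ignore.case = TRUE)
}

someNames <- c("(Centre-   North)           Italy",
               "Holland/     Netherlands", "England/GB/UK", "Sweden")
cleaned <- squishNames(someNames)
print(cleaned)
print(findEconomy("netherlands", cleaned))
print(findEconomy("UK", cleaned))
```

Applying squishNames() to MaddP.DF$Economy would change names such as “Holland/     Netherlands”, so any later code that selects economies by their exact original names would need updating to match.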

But having done our best to clean these data, take a look at some selected growth experiences. For convenience and aesthetics, set up a theme for the charts to come.

myTStheme <- theme_classic() +
  theme(
    plot.title=element_text(size=rel(1.5)),
    legend.title=element_text(size=rel(1.5)),
    legend.text=element_text(size=rel(1.5)),
    legend.position=c(1,0), legend.justification=c(1,0),
    axis.text=element_text(size=rel(1.5)),
    axis.title=element_text(size=rel(1.5)),
    axis.title.y=element_blank()
 )

In the Appendix I set out R code to plot several economies’ series conveniently on one chart; that code will be re-used subsequently as well.

Using that code now, check out these four economies from 1870 to 2010:

theBegSmpl   <- 1870
theEndSmpl   <- 2010
theEconomies <- c("USA", "England/GB/UK", "France", "Sweden")
theSeries    <- "logPerCapGDP"
thisTitle    <- "log Per Capita GDP in constant 1990 Int. GK$"

source(file="./multplot.R", local=TRUE, echo=TRUE)
## 
## > getSeries <- c("Year", "Economy", theSeries)
## 
## > theAES <- aes_string(x = "Year", y = theSeries, group = "Economy", 
## +     colour = "Economy")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy %in% theEconomies) & 
## +     (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl), 
## +     getSeries]
## 
## > ggplot(data = this.DF, theAES) + geom_line(size = 2) + 
## +     myTStheme + ggtitle(thisTitle)

## 
## > rm(this.DF, theAES, getSeries)
rm(thisTitle, theSeries, theEconomies, theEndSmpl, theBegSmpl)

To bring more structure to this information, I eyeball an extrapolated trend in these per capita income data. As before, the R code to do this is in the Appendix; here I just call that code after setting up the things I want to see.

Begin with US data:

theBegFit  <- 1870
theEndFit  <- 1980
theEndSmp  <- 2010
theEconomy <- "USA"
source(file="./eyetrend.R", local=TRUE, echo=TRUE)
## 
## > olsFIT <- lm(logPerCapGDP ~ Year, data = MaddP.DF[(MaddP.DF$Economy == 
## +     theEconomy) & (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= 
## +      .... [TRUNCATED] 
## 
## > thisTitle <- paste0(theEconomy, ": log Per Capita GDP in constant 1990 Int. GK$")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy == theEconomy) & 
## +     (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp), 
## +     c("Year", "perCapi ..." ... [TRUNCATED] 
## 
## > ggplot(data = this.DF, aes(x = Year, y = logPerCapGDP)) + 
## +     geom_line(size = 2) + geom_segment(data = this.DF, aes(x = theBegFit, 
## +     xend = .... [TRUNCATED]

## 
## > thisTitle <- paste0(theEconomy, ": Per Capita GDP in constant 1990 Int. GK$")
## 
## > expTrendFitted <- function(x) {
## +     ifelse(x >= theBegFit & x <= theEndFit, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > expTrendExtrap <- function(x) {
## +     ifelse(x >= theEndFit + 1 & x <= theEndSmp, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > ggplot(data = this.DF, aes(x = Year, y = perCapitaGDP)) + 
## +     geom_line(size = 2) + stat_function(data = this.DF, fun = expTrendFitted, 
## +     li .... [TRUNCATED]

## 
## > rm(olsFIT, expTrendFitted, expTrendExtrap)
rm(theBegFit, theEndFit, theEndSmp, theEconomy)

Presented here are both the fitted linear trend for the log of US per capita GDP and the resulting exponential trend for the original series. In both charts the solid blue line is the fitted trend and the dashed blue line the extrapolation.
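The two charts are linked as follows. A linear trend a + b·t in log per capita GDP is an exponential trend exp(a + b·t) in levels, so 100·b is the trend growth rate in percent per year, in continuously compounded (log-point) terms. The discrete annual rate is exp(b) − 1. The sketch below checks this on a synthetic series built to grow at exactly 0.02 log points a year; the series and the names are illustrative assumptions, not the Maddison data.

```r
# Synthetic log series with a trend slope of exactly 0.02 (illustration only)
years <- 1870:1980
logY  <- log(1000) + 0.02 * (years - 1870)

fit <- lm(logY ~ years)
b   <- unname(coef(fit)[2])
trendGrowthPct <- 100 * b          # trend growth, percent per year (log points)

# Exponential trend in levels, usable both inside and beyond the fit window
trendLevel <- function(x) unname(exp(coef(fit)[1] + coef(fit)[2] * x))

print(trendGrowthPct)
print(trendLevel(2010) / trendLevel(2009))   # equals exp(b)
```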

Remarkably, a smooth exponential trend, fitted from 1870 through as early as 1980, gives a reasonable description of the out-of-sample post-1980 behaviour of US per capita GDP.

Do the same for China but now beginning in 1950 as it’s from then that the Maddison Project data provide a usefully uninterrupted sequence:

theBegFit  <- 1950
theEndFit  <- 1980
theEndSmp  <- 2010
theEconomy <- "China"

source(file="./eyetrend.R", local=TRUE, echo=TRUE)
## 
## > olsFIT <- lm(logPerCapGDP ~ Year, data = MaddP.DF[(MaddP.DF$Economy == 
## +     theEconomy) & (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= 
## +      .... [TRUNCATED] 
## 
## > thisTitle <- paste0(theEconomy, ": log Per Capita GDP in constant 1990 Int. GK$")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy == theEconomy) & 
## +     (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp), 
## +     c("Year", "perCapi ..." ... [TRUNCATED] 
## 
## > ggplot(data = this.DF, aes(x = Year, y = logPerCapGDP)) + 
## +     geom_line(size = 2) + geom_segment(data = this.DF, aes(x = theBegFit, 
## +     xend = .... [TRUNCATED]

## 
## > thisTitle <- paste0(theEconomy, ": Per Capita GDP in constant 1990 Int. GK$")
## 
## > expTrendFitted <- function(x) {
## +     ifelse(x >= theBegFit & x <= theEndFit, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > expTrendExtrap <- function(x) {
## +     ifelse(x >= theEndFit + 1 & x <= theEndSmp, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > ggplot(data = this.DF, aes(x = Year, y = perCapitaGDP)) + 
## +     geom_line(size = 2) + stat_function(data = this.DF, fun = expTrendFitted, 
## +     li .... [TRUNCATED]

## 
## > rm(olsFIT, expTrendFitted, expTrendExtrap)
rm(theEconomy, theEndSmp, theEndFit, theBegFit)

In stark contrast to the US, after 1980 China’s per capita GDP follows a trajectory completely different from its pre-1980 history. This, of course, is no surprise to anyone even vaguely aware of global economic developments. The value of the calculation lies in quantifying how large the change has been: if anyone thought growth trends were slow and difficult to change, China provides a striking and positive counter-example.
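One way to put a single number on how large the change has been is the ratio of actual per capita GDP in the last sample year to the level the pre-1980 trend would have predicted. The helper trendGap below is a hypothetical sketch, not part of the Appendix code. It assumes a data frame with the Year and logPerCapGDP columns of MaddP.DF for a single economy, and it is checked here on made-up data rather than on the China series.

```r
# Hypothetical helper: actual / trend-extrapolated level in year endSmp,
# with the log-linear trend fitted over begFit..endFit (one economy only).
trendGap <- function(df, begFit, endFit, endSmp) {
  fitDF     <- df[df$Year >= begFit & df$Year <= endFit, ]
  fit       <- lm(logPerCapGDP ~ Year, data = fitDF)
  actual    <- df$logPerCapGDP[df$Year == endSmp]
  predicted <- predict(fit, newdata = data.frame(Year = endSmp))
  unname(exp(actual - predicted))
}

# Made-up economy: 0.02 log points a year to 1980, then 0.08 a year after
toy <- data.frame(Year = 1950:2010)
toy$logPerCapGDP <- ifelse(toy$Year <= 1980,
                           0.02 * (toy$Year - 1950),
                           0.6 + 0.08 * (toy$Year - 1980))
print(trendGap(toy, begFit = 1950, endFit = 1980, endSmp = 2010))
```

With the Maddison data this would be called as, say, trendGap(MaddP.DF[MaddP.DF$Economy == "China", ], 1950, 1980, 2010). I have not run that call here, so no result is quoted.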

Finally, for comparison, let’s do this for the UK:

theBegFit  <- 1950
theEndFit  <- 1980
theEndSmp  <- 2010
theEconomy <- "England/GB/UK"

source(file="./eyetrend.R", local=TRUE, echo=TRUE)
## 
## > olsFIT <- lm(logPerCapGDP ~ Year, data = MaddP.DF[(MaddP.DF$Economy == 
## +     theEconomy) & (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= 
## +      .... [TRUNCATED] 
## 
## > thisTitle <- paste0(theEconomy, ": log Per Capita GDP in constant 1990 Int. GK$")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy == theEconomy) & 
## +     (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp), 
## +     c("Year", "perCapi ..." ... [TRUNCATED] 
## 
## > ggplot(data = this.DF, aes(x = Year, y = logPerCapGDP)) + 
## +     geom_line(size = 2) + geom_segment(data = this.DF, aes(x = theBegFit, 
## +     xend = .... [TRUNCATED]

## 
## > thisTitle <- paste0(theEconomy, ": Per Capita GDP in constant 1990 Int. GK$")
## 
## > expTrendFitted <- function(x) {
## +     ifelse(x >= theBegFit & x <= theEndFit, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > expTrendExtrap <- function(x) {
## +     ifelse(x >= theEndFit + 1 & x <= theEndSmp, exp(coef(olsFIT)[1] + 
## +         (x * coef(olsFIT)[2])), NA)
## + }
## 
## > ggplot(data = this.DF, aes(x = Year, y = perCapitaGDP)) + 
## +     geom_line(size = 2) + stat_function(data = this.DF, fun = expTrendFitted, 
## +     li .... [TRUNCATED]

## 
## > rm(olsFIT, expTrendFitted, expTrendExtrap)
rm(theEconomy, theEndSmp, theEndFit, theBegFit)

Get a final sense of the difference here by putting all these on the same graph.

theBegSmpl   <- 1950
theEndSmpl   <- 2010
theEconomies <- c("USA", "England/GB/UK", "China")
theSeries    <- "logPerCapGDP"
thisTitle    <- "log Per Capita GDP in constant 1990 Int. GK$"

source(file="./multplot.R", local=TRUE, echo=TRUE)
## 
## > getSeries <- c("Year", "Economy", theSeries)
## 
## > theAES <- aes_string(x = "Year", y = theSeries, group = "Economy", 
## +     colour = "Economy")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy %in% theEconomies) & 
## +     (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl), 
## +     getSeries]
## 
## > ggplot(data = this.DF, theAES) + geom_line(size = 2) + 
## +     myTStheme + ggtitle(thisTitle)

## 
## > rm(this.DF, theAES, getSeries)
rm(thisTitle, theSeries, theEconomies, theEndSmpl, theBegSmpl)

A more useful perspective on the size of these cross-country differences comes from the levels of the series themselves, not their logs.

myTStheme <- theme_classic() +
  theme(
    plot.title=element_text(size=rel(1.5)),
    legend.title=element_text(size=rel(1.5)),
    legend.text=element_text(size=rel(1.5)),
    legend.position=c(0.45,0.6), legend.justification=c(1,0),
    axis.text=element_text(size=rel(1.5)),
    axis.title=element_text(size=rel(1.5)),
    axis.title.y=element_blank()
 )
theBegSmpl   <- 1950
theEndSmpl   <- 2010
theEconomies <- c("USA", "England/GB/UK", "China")
theSeries    <- "perCapitaGDP"
thisTitle    <- "Per Capita GDP in constant 1990 Int. GK$"

source(file="./multplot.R", local=TRUE, echo=TRUE)
## 
## > getSeries <- c("Year", "Economy", theSeries)
## 
## > theAES <- aes_string(x = "Year", y = theSeries, group = "Economy", 
## +     colour = "Economy")
## 
## > this.DF <- MaddP.DF[(MaddP.DF$Economy %in% theEconomies) & 
## +     (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl), 
## +     getSeries]
## 
## > ggplot(data = this.DF, theAES) + geom_line(size = 2) + 
## +     myTStheme + ggtitle(thisTitle)

## 
## > rm(this.DF, theAES, getSeries)
rm(thisTitle, theSeries, theEconomies, theEndSmpl, theBegSmpl)

Remember, however, that this is for per capita GDP and obviously therefore does not take into account the sizes of the different populations.

APPENDIX

This is the code chunk for plotting multiple series neatly:

read_chunk("./multplot.R")
getSeries <- c("Year", "Economy", theSeries)
theAES    <- aes_string(x="Year", y=theSeries, group="Economy",
                         colour="Economy")
this.DF   <-
  MaddP.DF[(MaddP.DF$Economy %in% theEconomies) &
           (MaddP.DF$Year >= theBegSmpl) & (MaddP.DF$Year <= theEndSmpl),
           getSeries]

ggplot(data=this.DF, theAES) + geom_line(size=2) + 
       myTStheme + ggtitle(thisTitle)

rm(this.DF, theAES, getSeries)
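The chunk above works through global variables (theEconomies, theBegSmpl, theEndSmpl, theSeries) that are set before it is sourced. A function version of the subsetting step makes those inputs explicit. It is sketched below with the hypothetical name subsetSeries, shown on a made-up data frame, and it is not the code that produced the charts above.

```r
# Hypothetical function version of the subsetting step in multplot.R
subsetSeries <- function(df, economies, begYear, endYear, series) {
  keep <- df$Economy %in% economies & df$Year >= begYear & df$Year <= endYear
  df[keep, c("Year", "Economy", series)]
}

# Made-up data in the same long layout as MaddP.DF
toyDF <- data.frame(Year         = rep(1950:1952, times = 2),
                    Economy      = rep(c("USA", "China"), each = 3),
                    perCapitaGDP = c(10, 11, 12, 1, 2, 3),
                    stringsAsFactors = FALSE)
toyDF$logPerCapGDP <- log(toyDF$perCapitaGDP)

print(subsetSeries(toyDF, "USA", 1951, 1952, "logPerCapGDP"))
```

The result could then be passed to the same ggplot() call, e.g. ggplot(data=subsetSeries(MaddP.DF, theEconomies, theBegSmpl, theEndSmpl, theSeries), theAES).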

This is the code chunk that implements the eyeballing-trend operation.

read_chunk("./eyetrend.R")
olsFIT <- lm(logPerCapGDP ~ Year,
              data=MaddP.DF[(MaddP.DF$Economy == theEconomy) &
              (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndFit), ])

thisTitle <- paste0(theEconomy,
 ": log Per Capita GDP in constant 1990 Int. GK$")

this.DF <-
  MaddP.DF[(MaddP.DF$Economy == theEconomy) &
           (MaddP.DF$Year >= theBegFit) & (MaddP.DF$Year <= theEndSmp),
           c("Year", "perCapitaGDP", "logPerCapGDP")]

 ggplot(data=this.DF, aes(x=Year, y=logPerCapGDP)) + geom_line(size=2) +
 geom_segment(data=this.DF, aes(x=theBegFit, xend=theEndFit,
              y=coef(olsFIT)[1]+coef(olsFIT)[2]*theBegFit,
               yend=coef(olsFIT)[1]+coef(olsFIT)[2]*theEndFit),
              linetype=1, colour="blue", size=1.1) +
 geom_segment(data=this.DF, aes(x=theEndFit+1, xend=theEndSmp,
              y=coef(olsFIT)[1]+coef(olsFIT)[2]*(theEndFit+1),
               yend=coef(olsFIT)[1]+coef(olsFIT)[2]*theEndSmp),
              linetype=2, colour="blue", size=1.05) +
 myTStheme + ggtitle(thisTitle)

#
thisTitle      <- paste0(theEconomy,
": Per Capita GDP in constant 1990 Int. GK$")

expTrendFitted <- function(x) {ifelse (x>=theBegFit & x<=theEndFit,
  exp(coef(olsFIT)[1] + (x * coef(olsFIT)[2])), NA)
}
expTrendExtrap <- function(x) {ifelse (x>=theEndFit+1 & x<=theEndSmp,
  exp(coef(olsFIT)[1] + (x * coef(olsFIT)[2])), NA)
}
ggplot(data=this.DF, aes(x=Year, y=perCapitaGDP)) + geom_line(size=2) +
stat_function(data=this.DF, fun=expTrendFitted, linetype=1,
               colour="blue", size=1.1) +
stat_function(data=this.DF, fun=expTrendExtrap, linetype=2,
               colour="blue", size=1.05) +
myTStheme + ggtitle(thisTitle)

rm(olsFIT, expTrendFitted, expTrendExtrap)

Quickly Starting on R

R-stat

Short notes upon (re)starting to use R
D. Quah
Economics and International Development, LSE
October 2014

This document is a short guide for someone starting R, or just coming back to it. The writeup is terse and does not seek to explain matters fully. Instead, it is intended as a quick reference for the reader to get going fast and then on to other work.

In this writeup you will:

  1. install R on your own computer (unless you have already done so);
  2. install RStudio (optional but useful);
  3. do one quick basic operation in R to make sure your system is now capable of running R.

MOTIVATION

I like to use R because:

  • R is open source software, and is freely available; you don’t have to be logged into the LSE (or your work) network to use it; you can use it without being connected to the Internet; you can perform your research while on a long flight or on the beach; you can freely install and use R on as many machines as you like;
  • a large community of scientists across many disciplines works with R regularly, and posts online questions, answers, and experiences regarding it (e.g., “R vs Stata: .. Datasets”, “R – a second language”, and many others);
  • R is a language besides being statistical software, so you can extend R to pretty much any application your mind can imagine;
  • R encourages open science—literate programming and reproducible research—and thus makes convenient the replication of empirical findings;
  • worldwide, network servers from Argentina and Colombia through Vietnam and New Zealand carry its latest versions;
  • in poorer societies these features of R promote research and human capital accumulation, so public spending can then usefully go elsewhere rather than on costly licensing arrangements;
  • R is constantly being improved.

Convenient summaries of R commands are available (e.g., the cheatsheet or the Wikibook) but are not necessarily the best way to start learning to use the software. Books on R (e.g., Michael Crawley 2012 or its earlier first edition) are similarly useful as references but, again, might not be where someone should head first to get going quickly.

Instead, what I’ve found useful as a start is simply to cut and paste whatever other people have already written that is most closely related to what I want to do. I intend the exercises that follow to give you that kind of base so you can then get going on your own research. There is nothing holy or admirable or morally uplifting about writing code from scratch when others have already done so. Our primary goal on this journey is to find out things about the world; aesthetics are secondary.

Before plunging in, some points that many first-time users might not routinely think about:

  1. To run lines of code, you have to be totally obsessive about getting things exactly right [sometimes you’re lucky and things work anyway even if you slip up, but it’s best not to rely on that].
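  2. If something appears in quotes, i.e., like "[…]", make sure you type those quotes exactly: double quotes " are different from single quotes ', and both differ from the curly “smart” quotes that word processors and web pages often substitute, which R will not accept as string delimiters. Use the right ones.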
  3. If a name or a command is UPPER CASE or lower case, make sure that’s exactly how you type it. R distinguishes case.
  4. Sometimes R chatters back at you, with no action required back from you. Sometimes it tells you something you need to fix. Either way, pay attention to what it says, even if only to ignore it after you get the meaning.
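Points 2 and 3 can be checked directly in the Console. Here is a minimal sketch; the object names are arbitrary. R treats myVar and MyVar as different objects. Double and single quotes are different characters, so a string opened with one must be closed with the same one.

```r
# R distinguishes case: these are two different objects
myVar <- 1
MyVar <- 2
print(myVar == MyVar)                 # FALSE

# Straight double and single quotes each delimit a string, as long as they match
print(identical("gdata", 'gdata'))    # TRUE

# A mismatched pair, such as "gdata', is a syntax error; so are curly quotes
ok <- tryCatch({ parse(text = "x <- \"gdata'"); TRUE }, error = function(e) FALSE)
print(ok)                             # FALSE
```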

R, RStudio, and PERL

R is the core collection of routines for statistical computing, while RStudio provides a convenient front end to R. The way that I operate, RStudio works best for me. Others might prefer to engage with R directly, or use a different front-end environment to R.

For many things I do, it is convenient to have R draw on the added functionality of the (separate) PERL language. As one example, to read data from Excel spreadsheets, the gdata library used below relies on PERL modules previously written by others and made freely available.

Therefore, for some, R alone suffices; for me, I want all three.

(Others might wish to use R in tandem with yet other additional software. They’re free to do so.)

Install PERL

If you’re on Mac OS X or Linux, you can skip this as you already have Perl on your machines. If, however, you’re on Windows, point your Internet browser here and download and install Strawberry Perl for Windows.

Install R

For R, point your Internet browser at this landing page. This gives you information on R generally and shows you a link to download R. Go there and select from the list a CRAN Mirror nearest you. I chose the one at Imperial College but it doesn’t much matter: they all work the same way. If you’re reading this document from Beijing, say, you might of course want to choose a different CRAN Mirror. Once you’ve selected the mirror, choose “Download R for Windows”, “Download R for (Mac) OS X”, or “Download R for Linux”, depending on your system. Run that file to install R on your machine.

Install RStudio

Again, this is optional but RStudio provides a clean and convenient interface to use R. Point your browser here and choose the version appropriate for your platform. The website actually guesses what you’re going to need and serves that up for you as a lead recommendation. If, however, the website gets it wrong, you go ahead and choose what will work for you. Download and install.

(If you really want to get fancy, you can select the RStudio Server but that’s only if you run your own Linux server, in which case you likely shouldn’t even be reading this document.)

Running R

With a fresh R on your computer, depending on what you want to do, you will need to install some libraries first—but you’ll only need to do this once. Here, we want to read data from an Excel spreadsheet into R, so we need to augment R with some libraries.

install.packages()

Fire up RStudio and go into “Tools/Install Packages” (i.e., mouse over to the “Tools” menu item, click on it, and activate the “Install Packages…” entry). You’ll see that some defaults have already been filled in: if you know you want different values, go ahead and plug them in. Otherwise, just leave them. Type gdata into “Packages (separate multiple…)”. Make sure “Install dependencies” is activated, and then press “Install”. This gdata library is what will let R read data in Excel spreadsheets. To install it, your machine needs to be online, as R will reach out to its servers, wherever they are, to retrieve the code and then install the library on your machine.
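The same installation can also be done from the Console with install.packages(). The sketch below uses a hypothetical helper, installIfMissing, that installs only those packages not already present, so re-running a script does not reinstall them each time.

```r
# Hypothetical helper: install any of the named packages not already present.
# requireNamespace() checks availability without attaching the package.
installIfMissing <- function(pkgs) {
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!present]
  if (length(missing) > 0) {
    install.packages(missing, dependencies = TRUE)
  }
  invisible(missing)
}

# Usage for this writeup (needs an Internet connection if gdata is absent):
# installIfMissing("gdata")
```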

setwd()

In RStudio (if you’re still there; if you’ve just come back to this, start up RStudio first), set your working directory through “Session/Set Working Directory …/Choose Directory”. This is one way to do it; alternatively, key the following line of R code into the panel labelled Console and hit return:

setwd("~/Dropbox/1/t/courses/ec402/2013t14/w")

where you substitute your own working directory for the quoted path. The tilde ~ denotes your home directory, wherever that might be. (Mine is either C:/Users/DQUAH or /home/dquah, depending on whether I happen to be using Windows or Linux right at that moment. The nice thing about using the tilde is that my code then works the same regardless of where I am.) Alternatively, you can copy and paste that preceding line into your own RStudio Console, edit the path with your keyboard, and then hit return. You can do this for any of the chunks of R code that follow.
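To see what the tilde expands to on your own machine, and to avoid a confusing error when a folder does not exist, you could run something like the following sketch in the Console. The path is the example from the text; substitute your own.

```r
# The example working directory from the text; replace with your own
myDir <- "~/Dropbox/1/t/courses/ec402/2013t14/w"

# path.expand() shows what "~" turns into on this machine
print(path.expand(myDir))

# Change directory only if the folder exists; otherwise just say so
if (dir.exists(myDir)) {
  setwd(myDir)
} else {
  message("No such directory: ", path.expand(myDir))
}
print(getwd())
```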

To make sure you’ve got things under control, save this: mouse to “File/New File/R Script”, copy the one line of code we’ve just executed into the newly-appeared top left-hand window in RStudio (that new window will typically be called “Untitled1”), and then go “File/Save As …” in RStudio. I’m saving this as the R script e1.R.

This will be a first R program, containing just the one setwd() line. I know I’m going to want to be adding to this R program to do my analysis. But for now I just want to make sure, if I can help it, that my work doesn’t go away unexpectedly.

If you look at this working directory now on your machine, you’ll see it has at least the file e1.R, or whatever you decided to call your R script. You can take that peek using Windows Explorer or a bash Terminal or the Finder… whatever. You can also get this same information from within RStudio by hitting return after keying into the Console (i.e., by executing the line):

dir()

You should see a listing of the directory that you’ve setwd’d to, including at least the file e1.R (and whatever else might be there). So, perhaps something like this:

[1] "e1.R"                  "WB-GDP-cleaned-DQ.xls"

(where WB-GDP-cleaned-DQ.xls is the Excel spreadsheet with which I happen to be working.)
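dir() (an alias for list.files()) also takes a pattern argument, a regular expression, which helps once a working directory fills up. Here is a sketch, run in a throwaway subdirectory of tempdir() so it creates nothing in your own folders; the one-line script written here is purely illustrative.

```r
# Set up a throwaway directory and move into it
demoDir <- file.path(tempdir(), "dir-demo")
dir.create(demoDir, showWarnings = FALSE)
oldDir  <- setwd(demoDir)

# Stand-in for saving e1.R from RStudio: a one-line script
writeLines('setwd("~")', "e1.R")

dir()                    # everything in the working directory
dir(pattern = "\\.R$")   # only files whose names end in .R
file.exists("e1.R")      # TRUE once the script has been saved

setwd(oldDir)            # return to the original working directory
```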

You can execute R code by keying it directly into the RStudio Console or, more typically, by opening an R script (such as e1.R, which is just plain text that you can edit in any text editor) from RStudio, making sure RStudio’s focus is on that R script panel, and then going “Code/Run Region/Run All”. When you do the latter, you’ll see the RStudio Console automatically step through your code.
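The Console counterpart of running a whole script is base R's source(); with echo = TRUE it prints each line as it runs, much as RStudio does. A sketch, with a small hypothetical script written to tempdir() so the example is self-contained:

```r
# A tiny stand-in script (hypothetical contents) written to a temporary file
scriptFile <- file.path(tempdir(), "e1-demo.R")
writeLines(c("x     <- 1:10",
             "meanX <- mean(x)",
             "meanX"),
           scriptFile)

# Run the whole script, echoing each line to the Console as it goes;
# the objects it creates (x, meanX) land in the global environment
source(scriptFile, echo = TRUE)
```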

Now go get a drink, stretch your legs, do some taiji.

DATA IN DATAFRAMES

The key object that we will use to hold data is what R calls a dataframe.

A dataframe is a 2-dimensional array, but like most modern things on computers, it can hold text, numbers, items of logic, calendar dates (i.e., not just as numbers but recognising the structure of quarters, months, and days), and possibly even more complicated objects as its entries. Each column holds a single kind of entry, but different columns can hold different kinds, all side by side in the same table. (Matrices of just numbers are very last-century.)
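A toy dataframe makes the point concrete: each column holds one kind of thing, but the columns differ, here text, numbers, logicals and dates (the values are made up purely for illustration). R's Date class really does know about the calendar, so quarters() can read off the quarter directly.

```r
# Made-up values, one column per kind of entry
toy.DF <- data.frame(
  label   = c("a", "b", "c"),                                      # text
  value   = c(1.5, 2.0, 3.25),                                     # numbers
  flagged = c(TRUE, FALSE, TRUE),                                  # logicals
  date    = as.Date(c("2014-01-15", "2014-04-15", "2014-07-15")),  # dates
  stringsAsFactors = FALSE
)

sapply(toy.DF, class)   # one class per column
quarters(toy.DF$date)   # calendar structure: "Q1" "Q2" "Q3"

# Forcing the same contents into a matrix turns every entry into text
is.character(as.matrix(toy.DF))
```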

Among other reasons, the dataframe is key for our work because a dataframe is what R builds when it reads in an Excel spreadsheet. So, for instance, if we have a spreadsheet 2014.01-Poverty+Growth-DQ.xlsx in the folder ~/Dropbox/1/j/data/Global-Distribution, we can read the data in its different sheets into different dataframes:

library(gdata)
## Warning: package 'gdata' was built under R version 3.1.1
setwd("~/Dropbox/1/j/data/Global-Distribution/")
theDataXLS      <- "2014.01-Poverty+Growth-DQ.xlsx"
Country.Info.DF <- read.xls(theDataXLS, sheet="Country-Info")
World.Pov.DF    <- read.xls(theDataXLS, sheet="WB-Pov")
World.GNI.DF    <- read.xls(theDataXLS, sheet="WB-GNI-pc")

From the code just run and the earlier chunk, you’ll notice that I can use spreadsheets saved in either “.xls” or “.xlsx” format: the code in gdata takes into account which of the two I happen to be using when I call read.xls().
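Once sheets are read in, a quick check is worthwhile, since spreadsheet layouts often surprise. The helper below, quickLook(), is a hypothetical convenience built only from base R functions: it confirms it has been handed a dataframe, reports the size and column types, and returns the first few rows. It is shown on R’s built-in airquality data so that it runs without the spreadsheet.

```r
# Hypothetical helper: a first look at a freshly read dataframe
quickLook <- function(df, n = 3) {
  stopifnot(is.data.frame(df))                     # read.xls() should give one
  cat("Rows:", nrow(df), "  Columns:", ncol(df), "\n")
  str(df, give.attr = FALSE)                       # column names and types
  head(df, n)                                      # first n rows
}

quickLook(airquality)
```

After the chunk above, quickLook(World.GNI.DF) would do the same for one of the sheets just read.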

Unlike, say, computer systems that need a specific file extension to tell them what kind of file is being used, R doesn’t care what I name the objects I create within it. Nonetheless, although of course you don’t have to do this, I like putting “.DF” at the ends of the names of my dataframes, as doing so helps me remember what they are.

Also, because I often need to read the R code I’ve written and to understand its logic quickly, I’m a little obsessive about how my code is formatted. So, in the chunk that reads the three sheets, I’ve lined up the assignment <- symbols. Again, not everyone needs to do this, and most of the time R simply doesn’t care how its code looks.
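The “.DF” habit also pays off because ls() can then find all your dataframes by name. The suffix is only a reminder, though: R decides what an object is by its class, which is.data.frame() checks directly. A sketch with made-up objects:

```r
# Made-up objects for illustration
first.DF      <- data.frame(x = 1:2)
second.DF     <- data.frame(y = c("a", "b"))
notADataframe <- c(0.02, 0.03)        # a plain numeric vector

# Find objects by naming convention ...
ls(pattern = "\\.DF$")

# ... or by what they actually are, whatever they are called
Filter(function(nm) is.data.frame(get(nm)), ls())
```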

CONCLUSION

This document has provided brief notes as a quick guide for someone starting to use R (or coming back to it).

Modern computing platforms and R allow multiple pathways to achieve any given end goal. The setup in this document prepares a system that ends up looking like mine; others, however, might prefer a different organizational structure for their work.
