This is a simple example of how to download and visualize data from the web with R. Specifically, it uses the World Bank data tables containing absolute GDP per country from 1980 to 2012:
http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?display=default
There are 7 tables, each covering 4 to 5 years of GDP data. The script downloads each of these tables and merges them into a single data frame, which makes the data easily accessible for further analysis.
Summing the absolute GDP values of all countries for each year gives the world GDP for the period 1980 – 2012. The graphic below shows this development, measured in billions of USD.
It is easy to see two periods of stronger growth as well as a period of stagnation around 1996 – 2002, which might be explained through the Asian financial crisis of 1997/98 and the dot-com bubble turmoil. Interestingly, compared to the impact of the global financial crisis of 2007/08, there was no severe setback in world GDP output in this earlier period.
The impact of the subprime-driven, globally spread financial crisis of 2007 – 2008 is easily observable in the graphic above, though it comes with an expected time delay of one year. Global output was still growing in 2007/08 while the financial downturn was already spreading on a global level (this can be seen in the major stock market indices around the world, which I will try to show in another post). The setback in GDP output came one year later, in 2009, where the plot shows a strong drop, followed by a recovery in output in 2010.
Here is another example of a simple output, this time for the seven wealthiest countries in the world, measured by their absolute GDP value, from 1980 to 2012. The similarity of the US output to the world GDP output is obvious at first sight: it shows the same setback in 2009, but no stagnation through the 1998 – 2002 period. The fact that there is no setback in US GDP output through that period might indicate a weak relationship between the financial turmoil of that time and real economic output.
Another interesting fact is the growth of China's output, especially from 2002 onwards. This raises the question of what drove this rapid growth and whether some kind of regime switch occurred during that time, which calls for further research. The rapid growth puts China in second place worldwide, measured by absolute GDP value, from 2009 onwards.
The last graphic shows the countries ranked 8 to 14, measured by their absolute GDP value in USD.
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
    # Programming Part
Ok, so here is the code:
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
    # additional packages

    # install.packages("XML")
    # install.packages("gridExtra")

    library("XML")
    library("gridExtra")
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
Basically I use two packages for these simple analytics. The “XML” package is used to gather the data from the World Bank database, and the “gridExtra” package is used to render a given data set as a graphical table. The time series plots further below additionally rely on the “xts” package; the rest of the functionality comes with the standard installation of R.
The first step in the process is to download all the available data sets from the World Bank database and merge them into one single data frame, which makes the data easy to handle. (To keep it straightforward, I just downloaded all the data sets one by one. Of course it would be possible to compress and automate this process by using loops, …, but I will try to cover this possibility at another point. The same goes for the automated gathering of data from different web sources, such as World Bank data, IMF data, Yahoo data, Google data, Quandl data, …)
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
    # Download World GDP Data
    # (the default page holds the most recent years, page 6 the oldest)

    gdp6 <- readHTMLTable("http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?display=default")
    gdp5 <- readHTMLTable("http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?page=1&display=default")
    gdp4 <- readHTMLTable("http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?page=2&display=default")
    gdp3 <- readHTMLTable("http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?page=3&display=default")
    gdp2 <- readHTMLTable("http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?page=4&display=default")
    gdp1 <- readHTMLTable("http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?page=5&display=default")
    gdp  <- readHTMLTable("http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W?page=6&display=default")

    # oldest table: keep the country column plus its four year columns
    gdp <- gdp[[1]]
    gdp <- as.data.frame(gdp)
    gdp.all <- gdp[,1:5]

    # following tables: append only the year columns
    gdp1 <- gdp1[[1]]
    gdp1 <- as.data.frame(gdp1)
    gdp.all <- cbind(gdp.all,gdp1[,2:6])

    # repeat that another 4 times

    gdp6 <- gdp6[[1]]
    gdp6 <- as.data.frame(gdp6)
    gdp.all <- cbind(gdp.all,gdp6[,2:5])

    # uncomment on the first run so that load() in the next step finds the file
    #save(gdp.all,file="gdp_all.R")
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
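The repeated download-and-`cbind` steps above could be compressed into a loop, as hinted at earlier. The following is only a rough sketch of that idea, not the code used for this post: `gdp_page_url` and `merge_gdp_tables` are hypothetical helper names, and the merge assumes that every table lists the countries in the same order, with the country name in column 1 and nothing but year columns after it. The live download is left commented out; the merge is demonstrated on two made-up tables.

```r
# Hypothetical helper: build the URL for a given page of the indicator table.
# Page 0 is the default view without a "page" parameter, as in the calls above.
gdp_page_url <- function(page) {
  base <- "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD/countries/1W"
  if (page == 0) {
    paste0(base, "?display=default")
  } else {
    paste0(base, "?page=", page, "&display=default")
  }
}

# Hypothetical helper: keep the first table completely and append only the
# year columns (everything except column 1) of the remaining tables.
merge_gdp_tables <- function(tables) {
  merged <- tables[[1]]
  for (tbl in tables[-1]) {
    merged <- cbind(merged, tbl[, -1, drop = FALSE])
  }
  merged
}

# The oldest years sit on the highest page number, so go from page 6 down to 0.
# Uncomment to run against the live site (requires the XML package):
# urls    <- vapply(6:0, gdp_page_url, character(1))
# tables  <- lapply(urls, function(u) as.data.frame(readHTMLTable(u)[[1]]))
# gdp.all <- merge_gdp_tables(tables)

# Offline demonstration with two made-up tables
t1 <- data.frame(Country = c("A", "B"), `1980` = c("1", "2"), check.names = FALSE)
t2 <- data.frame(Country = c("A", "B"), `1981` = c("3", "4"), check.names = FALSE)
toy <- merge_gdp_tables(list(t1, t2))
print(toy)
```

If the real tables carry extra non-year columns, the explicit column ranges used above (for example `2:6`) remain the safer choice.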
Once all the GDP data is gathered in one data frame, some of it has to be transformed and cleaned. Some cells in the data frame have no values (i.e. NA values in R), and others are in the wrong data format (e.g. character instead of numeric).
Basically we replace all NA values with a numerical 0, convert all entries to numeric values, and add the column names, which are the respective years of the data series, to the data frame.
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
    # Data cleaning
    load("gdp_all.R")

    gdp.all.new <- data.frame()

    # go through the table row by row (one row per country)
    for(e in 1:length(gdp.all[,1])){
      p<-NULL
      # strip the thousands separators and convert each year value to numeric
      for(i in 2:length(gdp.all[1,])){
        p[i-1]<-as.numeric(gsub(",","",gdp.all[e,i]))
      }
      # missing values become 0
      p[is.na(p)]<-0
      gdp.all.new<-rbind(gdp.all.new,p)
    }

    gdp.all.new <- cbind(as.character(gdp.all[,1]),gdp.all.new)
    colnames(gdp.all.new)<- c("Country","1980","1981","1982","1983","1984","1985","1986","1987","1988","1989","1990"
                              ,"1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002"
                              ,"2003","2004","2005","2006","2007","2008","2009","2010","2011","2012")

    # use the country names as row names and drop the country column
    rownames(gdp.all.new) <- as.character(gdp.all.new[,1])
    gdp.all.new <- gdp.all.new[,2:34]

    jpeg(filename = "gdp_1980-2012_data.jpg", width=1280,height=280,res=100)
    grid.table(head(gdp.all.new[,1:10]))
    dev.off()

    #save(gdp.all.new,file="gdp_all_new.R")
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
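The cleaning rule (strip the thousands separators, convert to numeric, set missing values to 0) can also be written column by column instead of cell by cell. This is just a sketch on made-up data, not a replacement tested against the real tables; `clean_gdp_column` is a hypothetical helper name and the `raw` table only mimics the shape of `gdp.all`.

```r
# Hypothetical helper: turn "1,234" style entries into numbers;
# anything that cannot be parsed becomes NA first and is then set to 0.
clean_gdp_column <- function(x) {
  num <- suppressWarnings(as.numeric(gsub(",", "", as.character(x))))
  num[is.na(num)] <- 0
  num
}

# Made-up raw table in the shape of gdp.all: country first, then year columns
raw <- data.frame(
  Country = c("A", "B"),
  `1980`  = c("1,000", ""),
  `1981`  = c("2,500", "3"),
  check.names = FALSE,
  stringsAsFactors = TRUE  # the downloaded columns may be factors, so cover that case
)

# Clean every year column at once and keep the original column names
cleaned <- raw[, -1, drop = FALSE]
cleaned[] <- lapply(cleaned, clean_gdp_column)
rownames(cleaned) <- as.character(raw$Country)
print(cleaned)
```

Because the year columns keep their names from the download, the long hand-written vector of column names would not be needed in this variant, provided the downloaded headers are already the plain years.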
The result of this process can be seen in the plot below, which shows a part of the whole compiled table.
Output 1:
The next step is the graphical representation of the data, which is the main goal of this short study. For this we transpose the data frame and use the “xts” framework for time series handling and graphical output.
For the first plot (the cumulative world GDP) one more step is needed before we can present the data: the single values of all countries have to be summed up for each year to get the world GDP of that specific year.
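As a small illustration of this step, here is a base-R sketch on made-up numbers that does not depend on xts. The matrix `gdp_toy` and its values are purely hypothetical; the point is only to show the transpose, the per-year sum, and how a date index can be derived from the year labels.

```r
# Made-up GDP values: one row per country, one column per year
gdp_toy <- matrix(c(10, 20,
                    30, 40,
                     5,  5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), c("1980", "1981")))

# Transpose so that the years become rows, as needed for a time series
by_year <- t(gdp_toy)

# World total per year: sum over all countries in each row
world_total <- rowSums(by_year)

# One date per year (1 January), built from the row names
year_index <- as.Date(paste0(rownames(by_year), "-01-01"))

print(world_total)
print(year_index)
```

`rowSums(by_year)` computes the same per-year totals as `apply(by_year, 1, sum)`.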
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
    # Graphical representations
    # install.packages("xts")
    library("xts")   # needed for as.xts()

    load("gdp_all_new.R")

    # transpose: years become rows, countries become columns
    gdp.all.new <- t(gdp.all.new)
    gdp.all.new.xts <-as.xts(gdp.all.new, order.by = as.Date(gsub(" ","-",paste(rownames(gdp.all.new),"01-01")), "%Y-%m-%d"))
    # world GDP: sum over all countries for each year
    gdp.sum.new.xts <-as.xts(apply(gdp.all.new.xts,1,sum), order.by = as.Date(gsub(" ","-",paste(rownames(gdp.all.new),"01-01")), "%Y-%m-%d"))

    # avoid scientific notation on the axis labels
    options(scipen=100)

    # Plot of the complete world GDP
    jpeg(filename = "gdp_1980-2012.jpg", width=880,height=880,res=100)
    plot(gdp.sum.new.xts/1000000000,type="l",main="World GDP 1980 - 2012 (in billion $)")
    dev.off()
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
Output 2:
Plots for the top 7 countries measured by GDP value:
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
    # Order countries by GDP size, top 7

    # sort the columns by their value in the last year (row 33 = 2012)
    gdp.all.new.xts.order <- gdp.all.new.xts[,order(gdp.all.new.xts[33,], decreasing = TRUE)]

    jpeg(filename = "gdp_1980-2012_top7.jpg", width=880,height=880,res=100)
    plot(gdp.all.new.xts.order[,1]/1000000000,type="l",main="GDP 1980 - 2012 (in billion $)",ylim=c(0,16000))

    myColors <- c("black","red", "darkgreen", "goldenrod", "darkblue", "darkviolet","blue")

    # add the countries ranked 2 to 7 to the existing plot
    for(i in 2:7){
      lines(gdp.all.new.xts.order[,i]/1000000000, col = myColors[i])
    }

    legend("topleft", legend = colnames(gdp.all.new.xts.order[,1:7]), lty = 1, col = myColors)
    dev.off()
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
Output 3:
Plots for the countries ranked 8-14 measured by GDP value:
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
    # Order countries by GDP size, ranks 8-14
    # (uses gdp.all.new.xts.order from the previous block)

    jpeg(filename = "gdp_1980-2012_top8-14.jpg", width=880,height=880,res=100)
    plot(gdp.all.new.xts.order[,8]/1000000000,type="l",main="GDP 1980 - 2012 (in billion $)",ylim=c(0,2500))

    myColors <- c("black","red", "darkgreen", "goldenrod", "darkblue", "darkviolet","blue")

    # add the countries ranked 9 to 14; i-7 maps them to colors 2 to 7
    for(i in 9:14){
      lines(gdp.all.new.xts.order[,i]/1000000000, col = myColors[i-7])
    }

    legend("topleft", legend = colnames(gdp.all.new.xts.order[,8:14]), lty = 1, col = myColors)
    dev.off()
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
Output 4:
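Both ranking plots rely on the same idea: order the countries by their value in the most recent year and pick a range of ranks. A minimal base-R sketch of that selection, on a made-up matrix with years as rows and countries as columns, could look as follows. `rank_countries` is a hypothetical helper, not part of the code above, and it works on a plain matrix rather than an xts object.

```r
# Hypothetical helper: names of the countries ranked `from`..`to`
# by their value in the last row (the most recent year).
rank_countries <- function(by_year, from, to) {
  last <- by_year[nrow(by_year), ]
  ranked <- names(sort(last, decreasing = TRUE))
  ranked[from:to]
}

# Made-up values: years as rows, countries as columns
by_year <- rbind("2011" = c(A = 5, B = 9, C = 1, D = 7),
                 "2012" = c(A = 6, B = 8, C = 2, D = 9))

top2  <- rank_countries(by_year, 1, 2)
next2 <- rank_countries(by_year, 3, 4)
print(top2)
print(next2)

# The selected columns could then be drawn in one call, for example:
# matplot(as.numeric(rownames(by_year)), by_year[, top2], type = "l", lty = 1)
```

Using `nrow(by_year)` instead of a fixed row number keeps the ranking tied to the latest year if more years are added later.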
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Please note that I do not always try to produce perfectly efficient code from a programming point of view, but rather focus on the results and on the sociological, econometric, …, insights gathered through the process of data analytics.
If my focus is specifically on coding, I will try to cover that in a separate post dedicated to that topic. Nevertheless I strongly appreciate suggestions, additions and/or corrections regarding the analytical insights, the data analysis process and the programming code.
– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – –
Data Source: data.worldbank.org/
An easier way to handle the download would be to use the WDI package:
http://cran.r-project.org/web/packages/WDI/index.html
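As a rough idea of what that could look like, the sketch below wraps the download in a small function and reshapes the result into one row per country and one column per year. It was not used for this post. `gdp_wide` and `fake_fetch` are hypothetical names, and the sketch assumes that `WDI::WDI` returns a long table with `country` and `year` columns plus a column named after the indicator. To keep the example runnable without network access, the download function can be swapped for a stand-in that returns made-up values.

```r
# Hypothetical wrapper: `fetch` defaults to WDI::WDI but can be replaced by a
# stand-in function, so the reshaping step can be tried without a download.
gdp_wide <- function(start = 1980, end = 2012, fetch = WDI::WDI) {
  long <- fetch(country = "all", indicator = "NY.GDP.MKTP.CD",
                start = start, end = end)
  # One row per country, one column per year
  tapply(long[["NY.GDP.MKTP.CD"]], list(long$country, long$year), sum)
}

# Stand-in for WDI::WDI that returns made-up values in the assumed long layout
fake_fetch <- function(country, indicator, start, end) {
  data.frame(country = c("A", "A", "B", "B"),
             year    = c(start, end, start, end),
             NY.GDP.MKTP.CD = c(1, 2, 3, 4))
}

wide <- gdp_wide(1980, 1981, fetch = fake_fetch)
print(wide)

# With the WDI package installed and a network connection:
# gdp.all.wdi <- gdp_wide()
```

The result is a numeric matrix, so the comma stripping and type conversion from the cleaning step above would not be needed in this variant.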
Regarding the growth of China, Xiaodong Zhu provides a good overview of China's economic development, published in the Journal of Economic Perspectives, Volume 26.
http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.26.4.103