{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The atmospheric CO2 content in the year 2030\n",
    "--------------------------------------------------------------------------------\n",
    "\n",
    "Use Gaussian process regression to predict the global atmospheric CO2 concentration in the year 2030. Obviously, this is a question of great relevance. To make your prediction, you have available measurements taken monthly on the vulcano Mauna Loa on Hawai'i as shown below.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(mvtnorm)                                 # multivariate Gaussian\n",
    "library(repr)\n",
    "options(repr.plot.width=8, repr.plot.height=5.5) # set plot size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "options(repr.plot.width=15, repr.plot.height=5.5) \n",
    "d    <- read.table(\"monthly_in_situ_co2_mlo.csv\", skip=57, sep=\",\")\n",
    "mask <- d$V5 > 0\n",
    "year <- d$V4[mask]\n",
    "co2  <- d$V5[mask]\n",
    "plot(year,co2, main=\"Mauna Loa CO2 measurements\", xlab=\"year\", ylab=\"CO2 concentration [ppm]\", type='l')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Here comes an array of kernel functions to work with**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "k1 <- function(sigf,l,dx) sigf^2*exp(-0.5*(dx/l)^2)        # squared exponential, very often first and only trial \n",
    "kn <- function(sigy,dx) sigy^2*diag(1.0,nrow(dx),ncol(dx)) # for noise contribution\n",
    "k2 <- function(sigf,l1,l2,dx) sigf^2*exp(-0.5*(dx/l1)^2-2.0*(sin(pi*dx)/l2)^2)   # damped periodic kernel \n",
    "                                                                                 # with period 1\n",
    "k4 <- function(sigf,l,dx) sigf^2*exp(-2.0*(sin(pi*dx)/l)^2)                      # periodic kernel with period 1\n",
    "k3 <- function(sigf,l,alpha,dx) sigf^2*(1+0.5*(dx/l)^2/alpha)^alpha              # rational quadratic kernel\n",
    "k5 <- function(sigf,l,dx) {                                                      # Matérn 3/2\n",
    "  d <- sqrt(3)*abs(dx)/l\n",
    "  return(sigf*(1+d)*exp(-d))\n",
    "}\n",
    "k6 <- function(sigf,l,dx) {                                                      # Matérn 5/2\n",
    "  d <- sqrt(5)*abs(dx)/l\n",
    "  return(sigf*(1+d+(d^2)/3)*exp(-d))\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Here an implementation of Gaussian process regression**\n",
    " \n",
    "- GPtrain is an optimizer for the hyperparameters\n",
    "\n",
    "- GPpred produces the regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "GPtrain <- function(kernel,theta,x,y,noise=0) {\n",
    "  dx <- outer(x,x,'-')\n",
    "  nx <- length(x)\n",
    "  yq <- y-mean(y)\n",
    "  logl <- function(theta) { \n",
    "    K     <- kernel(theta,noise,dx)\n",
    "    Lup   <- chol(K)\n",
    "    invKy <- backsolve(Lup,forwardsolve(t(Lup),yq))\n",
    "    return(0.5* (t(yq) %*% invKy + 2.0*sum(log(diag(Lup))) + nx*log(2*pi)))\n",
    "  }\n",
    "  return(optim(theta,logl,method=\"Nelder-Mead\"))\n",
    "} \n",
    "\n",
    "GPpred <- function(kernel,theta,x,y,xs,noise=0) { \n",
    "  ymean <- mean(y)\n",
    "  K     <- kernel(theta, noise, dx=outer(x,x,'-'))\n",
    "  Ks    <- kernel(theta, 0,     dx=outer(xs,x,'-'))\n",
    "  Kss   <- kernel(theta, 0,     dx=outer(xs,xs,'-'))\n",
    "  Lup   <- chol(K)\n",
    "  invKy <- backsolve(Lup,forwardsolve(t(Lup),(y-ymean)))\n",
    "  ys    <- drop(Ks %*% invKy) + ymean\n",
    "  v     <- apply(Ks, MARGIN=1, FUN=forwardsolve, l=t(Lup))\n",
    "  covys <- Kss - t(v) %*% v\n",
    "  return(list(ys=ys,covys=covys))\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Here comes an example solution that helps but is not the real stuff:** "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "co2_with_noise <- function(theta, noise, dx) {\n",
    "  sigf  <- theta[1]\n",
    "  l     <- theta[2]\n",
    "  k    <- k1(sigf,l,dx)\n",
    "    \n",
    "  if (noise > 0) return(k+kn(noise,dx))\n",
    "  else           return(k)\n",
    "}\n",
    "\n",
    "x    <- year[550:699]\n",
    "y    <- co2[550:699]\n",
    "xs   <- 2015.0 + seq(0,15,0.1)\n",
    "sigy <- 1.0\n",
    "\n",
    "o <- GPtrain(co2_with_noise,c(50.0, 35.0),x,y,noise=sigy)\n",
    "cat(\"Optimal hyperparameters: \", o$par, \"\\n\")\n",
    "cat(\"Counts and convergence:\"  , o$counts, o$convergence, \"\\n\")\n",
    "p <- GPpred(co2_with_noise,o$par,x,y, xs,noise=sigy)\n",
    "plot(year,co2, main=\"Mauna Loa CO2 measurements\", xlab=\"year\", ylab=\"CO2 concentration [ppm]\", type='l', \n",
    "     xlim=c(2010,2030), ylim=c(380,440))\n",
    "lines(xs,p$ys,type=\"l\", col=\"green\")\n",
    "lines(x,y,type=\"l\",col=\"red\")\n",
    "d <- 1.96*sqrt(abs(diag(p$covys))) # 2*sigma pointwise uncertainties\n",
    "polygon(c(xs,rev(xs)), c(p$ys-d,rev(p$ys+d)), border=NA, \n",
    "        col = rgb(red=0.2,green=0.8,blue=0,alpha=0.3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Predicting the CO2 content of the atmosphere in the year 2030\n",
    "+ periodic signal needs to be factored in (use k4?, or k2?)\n",
    "+ work incrementally when increasing kernel complexity\n",
    "  `optim()` with Nelder-Mead fairly simple optimizer\n",
    "+ could a Markov sampler put to use?\n",
    "+ taking all data for training the GP produces large matrices slowing   down the calculations;\n",
    "  are old data particularly relevant?\n",
    "+ instead of assuming a noise level introducing the noise as another free parameter\n",
    "+ first using training data upto - say - 2016 and check for \"prediction\" is perhaps a good way to estimate\n",
    "  the actual capability to make predictions\n",
    "  \n",
    "In any case, Master Yoda states correctly: \"Hard to see the future is!\""
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.5.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}