{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W1D5_IntroductiontoClimateModeling/instructor/W1D5_Tutorial7.ipynb)   <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/climate-course-content/main/tutorials/W1D5_IntroductiontoClimateModeling/instructor/W1D5_Tutorial7.ipynb\" target=\"_blank\"><img alt=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial 7: Introduction to CMIP6 Earth System Models\n",
    "\n",
    "**Week 1, Day 5, Introduction to Climate Modeling**\n",
    "\n",
    "**Content creators:** Julius Busecke, Robert Ford, Tom Nicholas, Brodie Pearson, and Brian E. J. Rose\n",
    "\n",
    "**Content reviewers:** Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. Aryee, Younkap Nina Duplex, Sloane Garelick, Paul Heubel, Zahra Khodakaramimaghsoud, Peter Ohue, Jenna Pearson, Agustina Pesce, Abel Shibu, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Content editors:** Paul Heubel, Jenna Pearson, Ohad Zivan, Chi Zhang\n",
    "\n",
    "**Production editors:** Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Our 2024 Sponsors:** CMIP, NFDI4Earth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial Objectives\n",
    "\n",
    "*Estimated timing of tutorial:* 30 minutes\n",
    "\n",
    "Earth System Models (ESMs) provide physically-based projections of how Earth's climate could change in the coming years, decades, and centuries at both global and local scales. In the following tutorial, you will:\n",
    "\n",
    "- Learn how to load, visualize, and manipulate ESM data from the [Coupled Model Intercomparison Project (CMIP6)](https://wcrp-cmip.org/cmip-phase-6-cmip6/)\n",
    "- Create maps showing projected future changes in sea surface temperature (SST)\n",
    "- Regrid SST data from a model-native grid to a regular latitude-longitude grid."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Setup\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": [
     "colab"
    ]
   },
   "outputs": [],
   "source": [
    "# installations ( uncomment and run this cell ONLY when using google colab or kaggle )\n",
    "\n",
    "# !pip install condacolab &> /dev/null\n",
    "# import condacolab\n",
    "# condacolab.install()\n",
    "\n",
    "# # Install all packages in one call (+ use mamba instead of conda), this must in one line or code will fail\n",
    "# !mamba install xarray-datatree intake-esm gcsfs xmip aiohttp cartopy nc-time-axis cf_xarray xarrayutils \"esmf<=8.3.1\" xesmf &> /dev/null\n",
    "# # For xesmf install we need to pin \"esmf<=8.3.1\". More context here: https://github.com/pangeo-data/xESMF/issues/246"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 2,
     "status": "ok",
     "timestamp": 1683928271799,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import intake\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import xarray as xr\n",
    "import xesmf as xe\n",
    "\n",
    "from xmip.preprocessing import combined_preprocessing\n",
    "\n",
    "from datatree import DataTree\n",
    "\n",
    "import cartopy.crs as ccrs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Install and import feedback gadget\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Install and import feedback gadget\n",
    "\n",
    "!pip3 install vibecheck datatops --quiet\n",
    "\n",
    "from vibecheck import DatatopsContentReviewContainer\n",
    "def content_review(notebook_section: str):\n",
    "    return DatatopsContentReviewContainer(\n",
    "        \"\",  # No text prompt\n",
    "        notebook_section,\n",
    "        {\n",
    "            \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
    "            \"name\": \"comptools_4clim\",\n",
    "            \"user_key\": \"l5jpxuee\",\n",
    "        },\n",
    "    ).render()\n",
    "\n",
    "\n",
    "feedback_prefix = \"W1D5_T7\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Figure settings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Figure settings\n",
    "import ipywidgets as widgets  # interactive display\n",
    "\n",
    "plt.style.use(\n",
    "    \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n",
    ")\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 1: Introduction to Earth System Models\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Introduction to Earth System Models\n",
    "\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', '1ErpqVCb2wQ'), ('Bilibili', 'BV1YgGDepEL6')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Recap_Earth_System_Models_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import IFrame\n",
    "\n",
    "link_id = \"x3kwy\"\n",
    "\n",
    "print(f\"If you want to download the slides: https://osf.io/download/{link_id}/\")\n",
    "IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\", width=854, height=480)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Recap_Earth_System_Models_Slides\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Section 1: Accessing Earth System Model data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "In the previous tutorials we developed some simple conceptual climate models. Here we will jump to the most complex type of climate model, an **Earth System Model (ESM)**. \n",
    "\n",
    "ESMs include the physical processes typical of General Circulation Models (GCMs), but also include chemical and biological changes within the climate system (e.g. changes in vegetation, biomes, atmospheric CO$_2$)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "The several systems simulated in an ESM (ocean, atmosphere, cryosphere, land) are coupled to each other, and each system has its own variables, physics, and discretizations -- both of the spatial grid and the timestep.\n",
    "\n",
    "<img src=\"https://upload.wikimedia.org/wikipedia/commons/7/73/AtmosphericModelSchematic.png\" alt= “EarthSystemModel” width=\"420\" height=\"400\">\n",
    "\n",
    "Atmospheric Model Schematic (Credit: [Wikipedia](https://upload.wikimedia.org/wikipedia/commons/7/73/AtmosphericModelSchematic.png))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "The one specific ESM we will analyze here is the [**Taiwan Earth System Model version 1** (**TaiESM1**)](https://gmd.copernicus.org/articles/13/3887/2020/). \n",
    "\n",
    "TaiESM1 was developed by modifying an earlier version of CESM2, [the Community Earth System Model, version 2](https://www.cesm.ucar.edu/models/cesm2), to include different parameterizations (i.e., physics). As a result, the two models are distinct from each other."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 1.1: Finding & Opening CMIP6 Data with Xarray\n",
    "\n",
    "Massive projects like [CMIP6](https://wcrp-cmip.org/cmip-phase-6-cmip6/) can contain millions of datasets. For most practical applications we only need a subset of the data, which we can select by specifying exactly which data sets we need. \n",
    "\n",
    " **Although we will only work with monthly SST (ocean) data today, the methods introduced can easily be applied/extended to load and analyze other CMIP6 variables, including from other components of the Earth system.**\n",
    " \n",
    "There are many ways to access the CMIP6 data, but here we will show a workflow using an [intake-esm](https://intake-esm.readthedocs.io/en/stable/) catalog object based on a CSV file that is maintained by the [pangeo community](https://pangeo.io/). Additional methods to access CMIP data are discussed in our [CMIP Resource Bank](https://github.com/neuromatch/climate-course-content/blob/main/tutorials/CMIP/CMIP_resource_bank.md)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "col = intake.open_esm_datastore(\n",
    "    \"https://storage.googleapis.com/cmip6/pangeo-cmip6.json\"\n",
    ")  # open an intake catalog containing the Pangeo CMIP cloud data\n",
    "col"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "We just loaded the full collection of Pangeo cloud datasets into an [intake catalog](https://intake-esm.readthedocs.io/en/stable/reference/esm-catalog-spec.html). The naming conventions of CMIP6 data sets are standardized across all models and experiments, which allows us to access multiple related data sets with efficient code.\n",
    "\n",
    "In the intake catalog above, we can see several different aspects of the CMIP6 naming conventions, including the following:\n",
    "\n",
    "1. ***variable_id***: The variable(s) of interest  \n",
    "   * Here we'll be working with SST, which in CMIP6 SST is called *tos*  \n",
    "2. ***source_id***: The CMIP6 model(s) that we want data from.\n",
    "3. ***table_id***: The origin system and output frequency desired of the variable(s) \n",
    "   * Here we use *Omon* - data from the ocean model at monthly resolution.\n",
    "4. ***grid_id***: The grid that we want the data to be on.\n",
    "   * Here we use *gn*  which is data on the model's *native* grid. Some models also provide *gr* (*regridded* data) and other grid options.\n",
    "5. ***experiment_id***: The CMIP6 experiments that we want to analyze\n",
    "   * We will load one experiment: *ssp585*. We'll discuss scenarios more in the next tutorial.\n",
    "6. ***member_id***: this distinguishes simulations if the same model is run repeatedly for an experiment\n",
    "   * We use *r1i1p1f1* for now, but will explore this in a later tutorial\n",
    "\n",
    "Each of these terms is called a *facet* in CMIP vocabulary. To learn more about CMIP and the possible facets please see our [CMIP Resource Bank](https://github.com/neuromatch/climate-course-content/blob/main/tutorials/CMIP/CMIP_resource_bank.md) and the [CMIP website](https://wcrp-cmip.org/).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Try running \n",
    "\n",
    "```\n",
    "col.df['source_id'].unique()\n",
    "```\n",
    "in the next cell to get a list of all available models!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now we will create a subset according to the provided facets using the `.search()` method, and finally open the cloud-stored [zarr stores](https://pangeo-data.github.io/pangeo-cmip6-cloud/overview.html) into Xarray datasets. \n",
    "\n",
    "The data returned are Xarray datasets that contain [dask arrays](https://docs.dask.org/en/stable/array.html). These are 'lazy', meaning the actual data will only be loaded when a computation is performed. What is loaded here is only the metadata, which enables us to inspect the data (e.g. the dimensionality/variable units) without loading in GBs or TBs of data!\n",
    "\n",
    "A subtle but important step in the opening stage is the use of a preprocessing function! By passing `preprocess=combined_preprocessing` we apply crowdsourced fixes from the [xMIP](https://github.com/jbusecke/xmip) package to each dataset. This ensures consistent naming of dimensions (and other convenient things - see [here](https://cmip6-preprocessing.readthedocs.io/en/latest/tutorial.html) for more)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    " **Although we will only work with monthly SST (ocean) data today, the methods introduced can easily be applied/extended to load and analyze other CMIP6 variables, including from other components of the Earth system.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 6561,
     "status": "ok",
     "timestamp": 1683910891406,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# from the full `col` object, create a subset using facet search\n",
    "cat = col.search(\n",
    "    source_id=[\"TaiESM1\"\n",
    "               #,\"MPI-ESM1-2-LR\" # alternative model specification\n",
    "              ],\n",
    "    variable_id=\"tos\",\n",
    "    member_id=\"r1i1p1f1\",\n",
    "    table_id=\"Omon\",\n",
    "    grid_label=\"gn\",\n",
    "    experiment_id=[\"ssp585\",\n",
    "                   #\"ssp245\",\n",
    "                   \"historical\"],\n",
    "    require_all_on=[\n",
    "        \"source_id\"\n",
    "    ],  # make sure that we only get models which have all of the above experiments\n",
    ")\n",
    "\n",
    "# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)\n",
    "kwargs = dict(\n",
    "    preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset\n",
    "    xarray_open_kwargs=dict(\n",
    "        use_cftime=True\n",
    "    ),  # ensure all datasets use the same time index\n",
    "    storage_options={\n",
    "        \"token\": \"anon\"\n",
    "    },  # anonymous/public authentication to google cloud storage\n",
    ")\n",
    "\n",
    "cat.esmcat.aggregation_control.groupby_attrs = [\"source_id\", \"experiment_id\"]\n",
    "dt = cat.to_datatree(**kwargs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "cat.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 1.2: Checking the CMIP6 DataTree\n",
    "\n",
    "We now have a \"[datatree](https://xarray-datatree.readthedocs.io/en/latest/quick-overview.html)\" containing the data we searched for. A datatree is a high-level container of Xarray data, useful for organizing many related datasets together. You can think of a single `DataTree` object as being like a (nested) dictionary of `xarray.Dataset` objects. Each dataset in the tree is known as a \"node\" or \"group\", and we can also have empty nodes. \n",
    "\n",
    "*This `DataTree` object may seem overly complicated with just a couple of datasets, but it will prove to be very useful in later tutorials where you will work with multiple models, experiments, and ensemble members.* \n",
    "\n",
    "You can explore the nodes of the tree and its contents interactively in a similar way to how you can explore the contents of an `xarray.Dataset`. Click on the arrows to expand the information about the datatree below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 1011,
     "status": "ok",
     "timestamp": 1683910899038,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "dt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Each group in the tree is stored under a corresponding `name`, and we can select nodes via their name. The real usefulness of a datatree comes from having many groups at different depths, analogous to how one might store files in nested directories (e.g. `day1/experiment1/data.txt`, `day1/experiment2/data.txt` etc.). \n",
    "\n",
    "In our case, the particular datatree object has different CMIP models and different experiments stored at distinct levels of the tree. This is useful because we can select just one experiment for one model, or all experiments for one model, or all experiments for all models!\n",
    "\n",
    "We can also apply Xarray operations (e.g. taking the average using the `.mean()` method) over all the data in a tree at once, just by calling that same method on the `DataTree` object. We can even map custom functions over all nodes in the tree using [`dt.map_over_subtree(my_function)`](https://xarray-datatree.readthedocs.io/en/latest/generated/datatree.map_over_subtree.html#datatree-map-over-subtree).\n",
    "\n",
    "All the operations below can be accomplished without using datatrees, but it saves us many lines of code as we don't have to use `for` loops over all our the different datasets. For more information about datatree see the [documentation here](https://xarray-datatree.readthedocs.io/en/latest/index.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now, let's pull out a single model (**_TaiESM1_**) and experiment (**_ssp585_**) from our datatree:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "ssp585 = dt[\"TaiESM1\"][\"ssp585\"].ds\n",
    "ssp585"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "We now have a more familiar single Xarray dataset containing a single Data variable `tos`. We can access the DataArray for our `tos` variable as usual, and inspect its attributes like `long_name` and `units`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "ssp585.tos"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Section 2: Plotting maps of Sea Surface Temperature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now that we have the model dataset organized within this datatree `dt` we can plot the datasets. Let's start by plotting a map of SST from TaiESM in July 2024. \n",
    "\n",
    "*Note that CMIP6 experiments were run several years ago, so the cut-off between **past** (observed forcing) and **future** (scenario-based/projected forcing) was at the start of 2015. This means that July 2024 is about 9 years into the CMIP6 **future** and so it is unlikely to look exactly like Earth's current SST state.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 11488,
     "status": "ok",
     "timestamp": 1683910911990,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Set up our figure with a Cartopy map projection\n",
    "fig, (ax_present) = plt.subplots(subplot_kw={\"projection\": ccrs.Robinson()}\n",
    ")\n",
    "\n",
    "# select the model data for July 2024\n",
    "sst_present = ssp585.tos.sel(time=\"2024-07\").squeeze()\n",
    "# note that .squeeze() just removes singleton dimensions\n",
    "\n",
    "# plot the model data\n",
    "sst_present.plot(\n",
    "    ax=ax_present,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-2,\n",
    "    vmax=30,\n",
    "    cmap=\"magma\",\n",
    "    robust=True,\n",
    ")\n",
    "ax_present.coastlines()\n",
    "ax_present.set_title(\"July 2024\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Coding Exercises 2\n",
    "\n",
    "Now that we can plot maps of CMIP6 data, let's look at some projected future changes using this data!\n",
    "\n",
    "In this coding exercise your goals are to: \n",
    "1. Create a map of the projected sea surface temperature in July 2100 under the SSP5-8.5 high-emissions scenario (we'll discuss scenarios in the next mini-lecture) using data from the *TaiESM1* CMIP6 model.\n",
    "2. Create a map showing how this sea surface temperature (SST, tos) projection is different from the current (July 2024) sea surface temperature in this model\n",
    "3. Plot a similar map for this model that shows how *January* 2100 is different from *January* 2024\n",
    "\n",
    "To get you started, we have provided code to load the required data set into a variable called *sst_ssp585*, and we have plotted the current (July 2024) sea surface temperature from this data set.\n",
    "\n",
    "*Note: differences between two snapshots of SST are not the same as the **anomalies** that you encountered earlier in the course, which were the difference relative to the average during a reference period.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "executionInfo": {
     "elapsed": 163,
     "status": "error",
     "timestamp": 1683910918443,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    }
   },
   "source": [
    "```python\n",
    "# select just a single model and experiment\n",
    "sst_ssp585 = dt[\"TaiESM1\"][\"ssp585\"].ds.tos\n",
    "\n",
    "fig, ([ax_present, ax_future], [ax_diff_july, ax_diff_jan]) = plt.subplots(\n",
    "    ncols=2, nrows=2, figsize=[12, 6], subplot_kw={\"projection\": ccrs.Robinson()}\n",
    ")\n",
    "\n",
    "# plot a timestep for 2024\n",
    "sst_present = sst_ssp585.sel(time=\"2024-07\").squeeze()\n",
    "sst_present.plot(\n",
    "    ax=ax_present,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-2,\n",
    "    vmax=30,\n",
    "    cmap=\"magma\",\n",
    "    robust=True,\n",
    ")\n",
    "ax_present.coastlines()\n",
    "ax_present.set_title(\"July 2024\")\n",
    "\n",
    "# repeat for 2100\n",
    "# complete the following line to extract data for July 2100\n",
    "sst_future = ...\n",
    "_ = ...\n",
    "ax_future.coastlines()\n",
    "ax_future.set_title(\"July 2100\")\n",
    "\n",
    "# now find the difference between July 2100 and July 2024\n",
    "# complete the following line to extract the July difference\n",
    "sst_difference_july = ...\n",
    "_ = ...\n",
    "ax_diff_july.coastlines()\n",
    "ax_diff_july.set_title(\"2100 vs. 2024 Difference (July)\")\n",
    "\n",
    "# finally, find the difference between January of the two years used above\n",
    "# complete the following line to extract the January difference\n",
    "sst_difference_jan = ...\n",
    "_ = ...\n",
    "ax_diff_jan.coastlines()\n",
    "ax_diff_jan.set_title(\"2100 vs. 2024 Difference (January)\")\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 36973,
     "status": "ok",
     "timestamp": 1683910956791,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "# select just a single model and experiment\n",
    "sst_ssp585 = dt[\"TaiESM1\"][\"ssp585\"].ds.tos\n",
    "\n",
    "fig, ([ax_present, ax_future], [ax_diff_july, ax_diff_jan]) = plt.subplots(\n",
    "    ncols=2, nrows=2, figsize=[12, 6], subplot_kw={\"projection\": ccrs.Robinson()}\n",
    ")\n",
    "\n",
    "# plot a timestep for 2024\n",
    "sst_present = sst_ssp585.sel(time=\"2024-07\").squeeze()\n",
    "sst_present.plot(\n",
    "    ax=ax_present,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-2,\n",
    "    vmax=30,\n",
    "    cmap=\"magma\",\n",
    "    robust=True,\n",
    ")\n",
    "ax_present.coastlines()\n",
    "ax_present.set_title(\"July 2024\")\n",
    "\n",
    "# repeat for 2100\n",
    "# complete the following line to extract data for July 2100\n",
    "sst_future = sst_ssp585.sel(time=\"2100-07\").squeeze()\n",
    "_ = sst_future.plot(\n",
    "    ax=ax_future,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-10,\n",
    "    vmax=30,\n",
    "    cmap=\"magma\",\n",
    "    robust=True,\n",
    ")\n",
    "ax_future.coastlines()\n",
    "ax_future.set_title(\"July 2100\")\n",
    "\n",
    "# now find the difference between July 2100 and July 2024\n",
    "# complete the following line to extract the July difference\n",
    "sst_difference_july = sst_future - sst_present\n",
    "_ = sst_difference_july.plot(\n",
    "    ax=ax_diff_july,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-7.5,\n",
    "    vmax=7.5,\n",
    "    cmap=\"coolwarm\",\n",
    ")\n",
    "ax_diff_july.coastlines()\n",
    "ax_diff_july.set_title(\"2100 vs. 2024 Difference (July)\")\n",
    "\n",
    "# finally, find the difference between January of the two years used above\n",
    "# complete the following line to extract the January difference\n",
    "sst_difference_jan = (\n",
    "    sst_ssp585.sel(time=\"2100-01\").squeeze() - sst_ssp585.sel(time=\"2024-01\").squeeze()\n",
    ")\n",
    "_ = sst_difference_jan.plot(\n",
    "    ax=ax_diff_jan,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-7.5,\n",
    "    vmax=7.5,\n",
    "    cmap=\"coolwarm\",\n",
    ")\n",
    "ax_diff_jan.coastlines()\n",
    "ax_diff_jan.set_title(\"2100 vs. 2024 Difference (January)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Coding_Exercises_2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {},
    "tags": []
   },
   "source": [
    "## Questions 2: Climate Connection\n",
    "\n",
    "1.   *Comparing only the top two panels*, how is the July SST projected to change in this particular model simulation? Do these changes agree with the map of July change that you plotted in the bottom left, and are these changes easier to see in this bottom map? \n",
    "2.   In what ways are the July and January maps similar or dissimilar, and can you think of any physical explanations for these (dis)similarities? \n",
    "3. Why do you think the color bar axes vary? (i.e., the top panels say \"*Sea Surface Temperature [$^oC$]*\" while the bottom panels say \"*tos*\")\n",
    "\n",
    "Many of the changes seen in the maps are a result of a changing climate under this high-emissions scenarios. However, keep in mind that these are differences between two months that are almost 80 years apart, so some of the changes are due to weather/synoptic differences between these particular months.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "# to_remove explanation\n",
    "\n",
    "\"\"\"\n",
    "1. Based on the top maps, it looks like the Equator and low latitudes warm significantly, and the higher latitudes also warm. The northern hemisphere warms more than the southern hemisphere. These changes agree qualitatively with the \"change map\" (bottom left), although the change map makes it clear that the Arctic surface waters are warming faster than the rest of the planet and that the warming is not spatially uniform anywhere (in fact parts of the North Atlantic cool slightly!). The warming in the low latitudes and Southern hemisphere is still significant, and shows interesting spatial patterns.\n",
    "2. There are various things you might notice. For example, the January maps show more warming in the Southern hemisphere than the July maps, consistent with the Southern hemisphere summer creating a warmer baseline and more potential for extreme heat. We also see warming in the Equatorial Pacific region for January, which was not present for July, which could be due to different ENSO phases across the two months and the two years. A final example is that the North Atlantic shows even stronger cooling in January than in July, this is a common signal in many climate models. This cooling can result from melting ice sheets and glaciers creating colder, fresher surface water, which increases stratification. This can reduce the amount of deep convection in the North Atlantic region (by trapping fresh cold water at the surface), weakening the thermohaline circulation.\n",
    "3. The metadata of the CMIP6 dataset we are using in the first two plots contains a long-name for the variable and its units, which are automatically used for the axis labels. When we perform a mathematical operation (subtraction) on the dataset to create a new DataArray, the long-name metadata is not transferred to the new array to avoid confusion in case the operation creates a new variable (that could also have different units). This leads to the plot using the variable name (tos) for the x-axis instead of the long name.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Questions_2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Section 3: Horizontal Regridding\n",
    "\n",
    "Many CMIP6 models use distinct spatial grids, we call this the model's *native grid*. \n",
    "\n",
    "You are likely familiar with the *regular latitude-longitude* grid where we separate the planet into boxes that have a fixed latitude and longitude span like this image we saw in the tutorial:\n",
    "\n",
    "<img src=\"https://upload.wikimedia.org/wikipedia/commons/4/4c/Azimutalprojektion-schief_kl-cropped.png\" alt= \"Lat_Lon_Grid\" width=\"250\" height=\"250\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.1: A Rotated Pole grid\n",
    "\n",
    "Let's look at the grid used for the ocean component of the *TaiESM1* CMIP6 model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 1822,
     "status": "ok",
     "timestamp": 1683910973589,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# create a scatter plot with a symbol at the center of each ocean grid cell in TaiESM1\n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(x=sst_ssp585.lon, y=sst_ssp585.lat, s=0.1)\n",
    "ax.set_ylabel(\"Latitude\")\n",
    "ax.set_xlabel(\"Longitude\")\n",
    "ax.set_title(\"Grid cell locations in TaiESM1\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Questions 3.1\n",
    "\n",
    "1. How would this plot look for a *regular latitude-longitude* grid like the globe image shown above and in the slides? In what ways is the TaiESM1 grid different from this regular grid? \n",
    "2. Can you think of a reason the Northern and Southern Hemisphere ocean grids differ?*\n",
    "\n",
    "**Hint: from an oceanographic context, how are the North and South poles different from each other?*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "# to_remove explanation\n",
    "\n",
    "\"\"\"\n",
    "1. For a regular latitude-longitude grid the plot should consist of straight lines from top to bottom, and straight lines from left to right, that are evenly spaced in each of those directions. The grid of TaiESM1 looks like a regular latitude-longitude grid in the Southern Hemisphere, but is quite different in the Northern Hemisphere, with the grid cells getting small (converging) at a \"grid North pole\" which is actually placed at ~75 degrees North and 40 degrees West. a large part of this \"grid North pole\" doesn't contain any grid points (the white hole).\n",
    "2. On a regular latitude-longitude grid, the grid cells rapidly get very small as you approach the pole which causes numerical issues for the ocean model. For example, the time step has to be reduced to physically capture the movement between the smallest cells, leading to many more computations required to evolve the model. This is not a problem for ocean models at the South Pole because the pole is on land! In the Northern hemisphere, it is common to move the \"grid North pole\" of ocean models to occur in a land region (e.g., Asian and/or North American continents), and sometimes there are poles in both these land masses!\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Questions_3_1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.2: Regridding to a regular grid\n",
    "\n",
    "If you want to compare spatial maps from different models/observations, e.g. plot a map averaged over several models or the bias of this map relative to observations, you must first ensure the data from all the models (and observations) is on the same spatial grid. This is where regridding becomes essential!\n",
    "\n",
    "> Regridding is applied lazily, but it is still taking time to compute *when* it is applied. So if you want to compare for example the mean over time of several models it is often much quicker to compute the mean in time over the native grid and then regrid the result of that, instead of regridding each timestep and then calculating the mean!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 190,
     "status": "ok",
     "timestamp": 1683912424678,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# define a 'target' grid. This is simply a regular lon/lat grid that we will interpolate our data on\n",
    "ds_target = xr.Dataset(\n",
    "    {\n",
    "        \"lat\": ([\"lat\"], np.arange(-90, 90, 1.0), {\"units\": \"degrees_north\"}),\n",
    "        \"lon\": ([\"lon\"], np.arange(0, 360, 1.0), {\"units\": \"degrees_east\"}),\n",
    "    }\n",
    ")  # you can try to modify the parameters above to e.g. just regrid onto a region or make the resolution coarser etc\n",
    "ds_target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 9588,
     "status": "ok",
     "timestamp": 1683912434512,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# define the regridder object (from our source dataarray to the target)\n",
    "regridder = xe.Regridder(\n",
    "    sst_ssp585, ds_target, \"bilinear\", periodic=True\n",
    ")  # this takes some time to calculate a weight matrix for the regridding\n",
    "regridder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 3184,
     "status": "ok",
     "timestamp": 1683912437684,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# now we can apply the regridder to our data\n",
    "sst_ssp585_regridded = regridder(sst_ssp585)  # this is a lazy operation!\n",
    "# so it does not slow us down significantly to apply it to the full data!\n",
    "# we can work with this array just like before and the regridding will only be\n",
    "# applied to the parts that we later load into memory or plot.\n",
    "sst_ssp585_regridded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 191,
     "status": "ok",
     "timestamp": 1683912440448,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# compare the shape to the original array\n",
    "sst_ssp585"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.3: Visually Comparing Data with Different Map Projections\n",
    "\n",
    "Let's use the code from above to plot a map of the model data on its original (*native*) grid, and a map of the model data after it is regridded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 21337,
     "status": "ok",
     "timestamp": 1683912463177,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "fig, ([ax_regridded, ax_native]) = plt.subplots(\n",
    "    ncols=2, figsize=[12, 3], subplot_kw={\"projection\": ccrs.Robinson()}\n",
    ")\n",
    "\n",
    "# Native grid data\n",
    "sst_future = sst_ssp585.sel(time=\"2100-07\").squeeze()\n",
    "sst_future.plot(\n",
    "    ax=ax_native,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-2,\n",
    "    vmax=30,\n",
    "    cmap=\"magma\",\n",
    "    robust=True,\n",
    ")\n",
    "ax_native.coastlines()\n",
    "ax_native.set_title(\"July 2100 Native Grid\")\n",
    "\n",
    "# Regridded data\n",
    "sst_future_regridded = sst_ssp585_regridded.sel(time=\"2100-07\").squeeze()\n",
    "sst_future_regridded.plot(\n",
    "    ax=ax_regridded,\n",
    "    x=\"lon\",\n",
    "    y=\"lat\",\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    vmin=-2,\n",
    "    vmax=30,\n",
    "    cmap=\"magma\",\n",
    "    robust=True,\n",
    ")\n",
    "ax_regridded.coastlines()\n",
    "ax_regridded.set_title(\"July 2100 Regridded\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Questions 3.3\n",
    "\n",
    "1. Is this what you expected to see after regridding the data?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "# to_remove explanation\n",
    "\n",
    "\"\"\"\n",
    "1. They look similar, which is what we expect from the regridding operation. It should not significantly change the underlying spatial information (i.e., the data), it should just adjust the locations at which that information is provided.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Questions_3_3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Summary\n",
    "\n",
    "In this tutorial you have: \n",
    "\n",
    "*   Loaded and manipulated data from a CMIP6 model under a high-emissions future scenario experiment\n",
    "*   Created maps of future projected changes in the Earth system using CMIP6 data\n",
    "*   Converted/regridded CMIP6 model data onto a desired grid. This is a critical processing step that allows us to directly compare data from different models and/or observations "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Resources\n",
    "\n",
    "This tutorial uses data from the simulations conducted as part of the [CMIP6](https://wcrp-cmip.org/) multi-model ensemble. \n",
    "\n",
    "For examples on how to access and analyze data, please visit the [Pangeo Cloud CMIP6 Gallery](https://gallery.pangeo.io/repos/pangeo-gallery/cmip6/index.html) \n",
    "\n",
    "For more information on what CMIP is and how to access the data, please see this [page](https://github.com/neuromatch/climate-course-content/blob/main/tutorials/CMIP/CMIP_resource_bank.md)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "machine_shape": "hm",
   "name": "W1D5_Tutorial7",
   "provenance": [
    {
     "file_id": "1WfT8oN22xywtecNriLptqi1SuGUSoIlc",
     "timestamp": 1680298239014
    }
   ],
   "toc_visible": true
  },
  "gpuClass": "standard",
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}