{ "cells": [ { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W1D1_ClimateSystemOverview/student/W1D1_Tutorial2.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 2: Selection, Interpolation and Slicing\n", "\n", "**Week 1, Day 1, Climate System Overview**\n", "\n", "**Content creators:** Sloane Garelick, Julia Kent\n", "\n", "**Content reviewers:** Katrina Dobson, Younkap Nina Duplex, Danika Gupta, Maria Gonzalez, Will Gregory, Nahid Hasan, Paul Heubel, Sherry Mi, Beatriz Cosenza Muralles, Jenna Pearson, Agustina Pesce, Chi Zhang, Ohad Zivan\n", "\n", "**Content editors:** Paul Heubel, Jenna Pearson, Chi Zhang, Ohad Zivan\n", "\n", "**Production editors:** Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan\n", "\n", "**Our 2024 Sponsors:** CMIP, NFDI4Earth" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## ![project pythia](https://projectpythia.org/_static/images/logos/pythia_logo-blue-rtext.svg)\n", "\n", "Pythia credit: Rose, B. E. J., Kent, J., Tyle, K., Clyne, J., Banihirwe, A., Camron, D., May, R., Grover, M., Ford, R. R., Paul, K., Morley, J., Eroglu, O., Kailyn, L., & Zacharias, A. (2023). Pythia Foundations (Version v2023.05.01) https://zenodo.org/record/8065851\n", "\n", "## ![CMIP.png](https://github.com/ClimateMatchAcademy/course-content/blob/main/tutorials/Art/CMIP.png?raw=true)\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial Objectives\n", "\n", "*Estimated timing of tutorial:* 25 minutes \n", "\n", "In the previous tutorial, we learned how to use `Xarray` to create `DataArray` and `Dataset` objects. 
Global climate datasets can be very large and contain many variables, and DataArrays and Datasets are useful tools for organizing, comparing, and interpreting such data. However, we are sometimes interested not in a *global* dataset but in a specific time or location. For example, we might want to look at climate variables in a particular region of Earth and compare them with those of another region. To carry out such analyses, it’s useful to be able to extract and compare subsets of data from a global dataset. \n", "\n", "In this tutorial, you will explore several `Xarray` tools that allow you to select data from a specific spatial and temporal range. In particular, you will practice using:\n", "\n", "\n", "* [**`.sel()`:**](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.sel.html) select data based on coordinate values or dates\n", "* [**`.interp()`:**](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.interp.html) interpolate to any latitude/longitude location to extract data\n", "* [**`slice()`:**](https://docs.xarray.dev/en/latest/user-guide/indexing.html#indexing-with-dimension-names) select a range (or slice) along one or more coordinates by passing a Python slice object to `.sel()`\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# imports\n", "from datetime import timedelta\n", "import numpy as np\n", "import pandas as pd\n", "import xarray as xr\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck 
datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"comptools_4clim\",\n", " \"user_key\": \"l5jpxuee\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W1D1_T2\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure Settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure Settings\n", "import ipywidgets as widgets # interactive display\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\n", " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Solar Radiation and Earth's Energy Budget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: Solar Radiation and Earth's Energy Budget\n", "\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = 
[]\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'EX4BMd3ZItQ'), ('Bilibili', 'BV1rg4y1w7Yq')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Solar_Radiation_Video\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "pycharm": { "name": "#%%\n" }, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from ipywidgets import widgets\n", "from IPython.display import IFrame\n", "\n", "link_id = \"gh5us\"\n", "\n", "print(f\"If you want to download the slides: https://osf.io/download/{link_id}/\")\n", "IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\", width=854, 
height=480)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Solar_Radiation_Slides\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Section 1: Subsetting and Selection by Coordinate Values\n", "\n", "Because Xarray labels coordinates, you can select data by coordinate names and values rather than by array indices. We'll explore this briefly here. First, recreate the synthetic temperature and pressure DataArrays you generated in Tutorial 1 and combine them into a Dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# temperature data\n", "rand_data = 283 + 5 * np.random.randn(5, 3, 4)\n", "times_index = pd.date_range(\"2018-01-01\", periods=5)\n", "lons = np.linspace(-120, -60, 4)\n", "lats = np.linspace(25, 55, 3)\n", "temperature = xr.DataArray(\n", " rand_data, coords=[times_index, lats, lons], dims=[\"time\", \"lat\", \"lon\"]\n", ")\n", "temperature.attrs[\"units\"] = \"Kelvin\"\n", "temperature.attrs[\"standard_name\"] = \"air_temperature\"\n", "\n", "# pressure data\n", "pressure_data = 1000.0 + 5 * np.random.randn(5, 3, 4)\n", "pressure = xr.DataArray(\n", " pressure_data, coords=[times_index, lats, lons], dims=[\"time\", \"lat\", \"lon\"]\n", ")\n", "pressure.attrs[\"units\"] = \"hPa\"\n", "pressure.attrs[\"standard_name\"] = \"air_pressure\"\n", "\n", "# combine temperature and pressure DataArrays into a Dataset called 'ds'\n", "ds = xr.Dataset(data_vars={\"Temperature\": temperature, \"Pressure\": pressure})\n", "ds" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": 
[ "To refresh your memory from the previous tutorial, take a look at the DataArrays you created for temperature and pressure by clicking on those variables in the dataset above." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "## Section 1.1: NumPy-like Selection\n", "\n", "Suppose you want to extract all the spatial data for one single date: *January 2, 2018*. It's possible to achieve that with NumPy-like index selection:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 369, "status": "ok", "timestamp": 1681570689311, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "indexed_selection = temperature[\n", " 1, :, :\n", "] # index 1 along axis 0 is the time slice we want...\n", "indexed_selection" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "However, notice that this requires us (the user) to have detailed knowledge of the order of the axes and the meaning of the indices along those axes. By having named coordinates in Xarray, we can avoid this issue." 
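 ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "To see why positional indexing is fragile, here is a short sketch added for illustration (it is not part of the original Pythia material). It reorders the dimensions with `.transpose()`, which leaves the data values unchanged, and then applies the same index `[1, :, :]`. On the reordered array, axis 0 is latitude, so the index now selects the second *latitude* (40°N) rather than the second day. The variable names `transposed` and `positional` are hypothetical." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Hypothetical example: reorder the dimensions; the data values are unchanged\n", "transposed = temperature.transpose('lat', 'lon', 'time')\n", "\n", "# The positional selection above picked out one day, leaving lat and lon\n", "print(indexed_selection.dims)  # ('lat', 'lon')\n", "\n", "# The same index on the reordered array picks out one latitude (40°N) instead\n", "positional = transposed[1, :, :]\n", "print(positional.dims)  # ('lon', 'time')\n", "print(float(positional['lat']))  # 40.0"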
 ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.2: `.sel()`\n", "\n", "Instead of NumPy-like index selection, Xarray lets us select data by coordinate value with the `.sel()` method, which takes one or more named coordinates as keyword arguments:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 14, "status": "ok", "timestamp": 1681570692916, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "named_selection = temperature.sel(time=\"2018-01-02\")\n", "named_selection" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We got the same result as with the NumPy-like index selection, but:\n", "- we didn't have to know anything about how the array was created or stored\n", "- our code is agnostic about how many dimensions we are dealing with\n", "- the intended meaning of our code is much clearer!\n", "\n", "With `.sel()`, we can easily isolate data from a specific time, and in the same way we can isolate data at a specific spatial coordinate." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 1.2" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "1. Write a line of code to select the temperature data from the grid point at 25°N, 120°W (i.e., `lat=25`, `lon=-120`)."
 ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "tags": [] }, "source": [ "```python\n", "coordinate_selection = ...\n", "coordinate_selection\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "# lat and lon are float coordinates, so select them with numbers rather than strings\n", "coordinate_selection = temperature.sel(lat=25.0, lon=-120.0)\n", "coordinate_selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Coding_Exercise_1_2\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.3: Approximate Selection and Interpolation\n", "\n", "The spatial and temporal resolution of climate data often differs between datasets, or a dataset may be incomplete. Therefore, with time and space data, we frequently want to sample \"near\" the coordinate points in our dataset. For example, we may want to analyze data at a specific location or time for which our dataset has no value. In that case, we would use the data from the closest coordinate or time step. Here are a few simple ways to achieve that.\n", "\n", "### Section 1.3.1: Nearest-neighbor Sampling\n", "\n", "Suppose we want to know the temperature on `2018-01-07`. However, the last day on our `time` axis is `2018-01-05`, so instead we accept the nearest available date, provided it lies within two days of `2018-01-07`. We can do this with the `.sel()` method used earlier, adding [nearest-neighbor sampling](https://docs.xarray.dev/en/stable/user-guide/indexing.html#nearest-neighbor-lookups) and an optional tolerance. 
This is called an **inexact lookup** because we are not searching for a perfect match, although there may be one. Here, the **tolerance** is the maximum distance from the desired point within which Xarray will accept a nearest neighbor; if no coordinate value lies within that distance, the selection fails with an error." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 321, "status": "ok", "timestamp": 1681570696404, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "temperature.sel(time=\"2018-01-07\", method=\"nearest\", tolerance=timedelta(days=2))" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Notice that the resulting data is from `2018-01-05`, exactly two days earlier and therefore within the tolerance." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Section 1.3.2: Interpolation\n", "\n", "The latitude values of our dataset are 25°N, 40°N, and 55°N, and the longitude values are 120°W, 100°W, 80°W, and 60°W. Suppose, however, that we want to extract a time series for Boulder, Colorado, USA (40°N, 105°W). Since `lon=-105` is _not_ a point on our longitude axis, this requires interpolation between data points.\n", "\n", "We can do this with the `.interp()` method (see the docs [here](http://xarray.pydata.org/en/stable/interpolation.html)), which works similarly to `.sel()` and lets us interpolate to any latitude/longitude location with an interpolation method of our choice. In the example below, you will linearly interpolate between known points."
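 ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As a hand check, here is a short sketch added for illustration (it is not part of the original Pythia material and, like `.interp()` itself, assumes `scipy` is installed). Boulder's latitude, 40°N, is already on the grid, so only longitude needs interpolating. 105°W lies between the grid points 120°W and 100°W, 15° from the former and 5° from the latter. Linear interpolation should therefore weight the 100°W value by 15/20 = 0.75 and the 120°W value by 0.25. The variable names below are hypothetical." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Temperatures at 40°N on the two grid longitudes that bracket 105°W\n", "t_120w = temperature.sel(lat=40, lon=-120).values\n", "t_100w = temperature.sel(lat=40, lon=-100).values\n", "\n", "# Linear weights: each point's weight is the target's distance to the *other* point over the spacing\n", "w_100w = (-105 - (-120)) / (-100 - (-120))  # 0.75\n", "w_120w = 1 - w_100w  # 0.25\n", "manual = w_120w * t_120w + w_100w * t_100w\n", "\n", "# Compare the hand-computed time series with Xarray's linear interpolation\n", "interpolated = temperature.interp(lon=-105, lat=40, method='linear')\n", "print(np.allclose(manual, interpolated.values))  # True"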
 ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 646, "status": "ok", "timestamp": 1681570700081, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "temperature.interp(lon=-105, lat=40, method=\"linear\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Here we specified linear interpolation, but other methods are available as well (e.g., `\"nearest\"`, `\"cubic\"`, `\"quadratic\"`). Note that the temperature values extracted in the code cell above are not values stored in the dataset; they are calculated by linear interpolation between values that are in the dataset." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.4: Slicing Along Coordinates\n", "\n", "Frequently we want to select a range (or _slice_) along one or more coordinates. For example, you may wish to assess average annual temperatures only in equatorial regions. We can achieve this by passing a Python [slice](https://docs.python.org/3/library/functions.html#slice) object to `.sel()`. A slice is written as `slice(start, stop[, step])`, where `step` is optional. Unlike positional slicing in Python, label-based slicing with `.sel()` includes both the start and the stop value. 
In this case, let's look only at the first three days (2018-01-01 to 2018-01-03), at longitudes between 110°W and 70°W, and at latitudes between 25°N and 45°N:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 21, "status": "ok", "timestamp": 1681570702748, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "temperature.sel(\n", " time=slice(\"2018-01-01\", \"2018-01-03\"), lon=slice(-110, -70), lat=slice(25, 45)\n", ")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.5: One More Selection Method: `.loc`\n", "\n", "All of these operations can also be done within square brackets on the `.loc` attribute of the `DataArray`:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 14, "status": "ok", "timestamp": 1681570706874, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "temperature.loc['2018-01-02']" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "This sits in between the NumPy-style selection\n", "\n", "```python\n", "temperature[1, :, :]\n", "```\n", "\n", "and the fully label-based selection using `.sel()`.\n", "\n", "With `.loc`, we use the coordinate _values_ but cannot specify the _names_ of the dimensions. 
Instead, the slices must be given in the same order as the dimensions (`time`, `lat`, `lon`):\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 315, "status": "ok", "timestamp": 1681570712906, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "temperature.loc['2018-01-01':'2018-01-03', 25:45, -110:-70]" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "One advantage of using `.loc` is that we can use NumPy-style slice notation like `25:45` rather than the more verbose `slice(25, 45)`. But of course the latter also works:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 365, "status": "ok", "timestamp": 1681570719989, "user": { "displayName": "Sloane Garelick", "userId": "04706287370408131987" }, "user_tz": 240 } }, "outputs": [], "source": [ "temperature.loc[\"2018-01-01\":\"2018-01-03\", slice(25, 45), -110:-70]" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "What _doesn't_ work is passing the slices in an order different from that of the dataset's dimensions:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# This will generate an error: the first slice is applied to the time dimension, not to lon\n", "# temperature.loc[-110:-70, 25:45,'2018-01-01':'2018-01-03']" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Summary\n", "\n", "In this tutorial, we have explored the practical use of **`.sel()`**, **`.interp()`**, **`.loc`**, and **slicing** techniques to extract data from specific spatial and temporal ranges. These methods are valuable when we are interested in only certain pieces of large datasets."
 ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Resources\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {}, "tags": [] }, "source": [ "Code and data for this tutorial are based on existing content from [Project Pythia](https://foundations.projectpythia.org/core/xarray/xarray-intro.html)." ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W1D1_Tutorial2", "provenance": [ { "file_id": "1f2uyMuRNCH2LLG5u4Z4Tdb_OHLHB9saW", "timestamp": 1679941598643 } ], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 4 }