{ "cells": [ { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D3_ExtremesandVariability/instructor/W2D3_Tutorial7.ipynb)   \"Open" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Bonus Tutorial 7: Non-Stationarity in the EVT-framework\n", "\n", "**Week 2, Day 3, Extremes & Variability**\n", "\n", "**Content creators:** Matthias Aengenheyster, Joeri Reinders\n", "\n", "**Content reviewers:** Younkap Nina Duplex, Sloane Garelick, Paul Heubel, Zahra Khodakaramimaghsoud, Peter Ohue, Laura Paccini, Jenna Pearson, Agustina Pesce, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan\n", "\n", "**Content editors:** Paul Heubel, Jenna Pearson, Chi Zhang, Ohad Zivan\n", "\n", "**Production editors:** Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan\n", "\n", "**Our 2024 Sponsors:** CMIP, NFDI4Earth" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial Objectives" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "*Estimated timing of tutorial:* 30 minutes\n", "\n", "In this tutorial, we will enhance our GEV model by incorporating non-stationarity to achieve a better fit for the sea surface height data from Washington DC we analyzed previously.\n", "\n", "By the end of the tutorial you will be able to:\n", "- Apply a GEV distribution to a non-stationary record by including a time-dependent parameter.\n", "- Compare and analyze the fits of various models with different time-dependent parameters (e.g., location, scale, or shape).\n", "- Calculate effective return levels using our non-stationary GEV model.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# installations ( uncomment and run this cell ONLY when using google colab or kaggle )\n", "\n", "# !pip install -q condacolab\n", "# import condacolab\n", "# condacolab.install()\n", "# !mamba install eigen numpy matplotlib seaborn pandas cartopy scipy texttable intake xarrayutils xmip cf_xarray intake-esm\n", "# !pip install https://github.com/yrobink/SDFC-python/archive/master.zip\n", "\n", "# # get some accessory functions (you will need to install wget)\n", "!wget https://raw.githubusercontent.com/neuromatch/climate-course-content/main/tutorials/W2D3_ExtremesandVariability/extremes_functions.py\n", "!wget https://raw.githubusercontent.com/neuromatch/climate-course-content/main/tutorials/W2D3_ExtremesandVariability/gev_functions.py\n", "!wget https://raw.githubusercontent.com/neuromatch/climate-course-content/main/tutorials/W2D3_ExtremesandVariability/sdfc_classes.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# imports\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import pandas as pd\n", "from scipy import stats\n", "from scipy.stats import genextreme as gev\n", "import pooch\n", "import os\n", "import tempfile\n", "import gev_functions as gf\n", "import extremes_functions as ef\n", "import sdfc_classes as sd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"comptools_4clim\",\n", " \"user_key\": \"l5jpxuee\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W2D3_T7\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure Settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure Settings\n", "import ipywidgets as widgets # interactive display\n", "\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\n", " \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Helper functions\n", "\n", "\n", "def pooch_load(filelocation=None, filename=None, processor=None):\n", " shared_location = \"/home/jovyan/shared/Data/tutorials/W2D3_ExtremesandVariability\" # this is different for each day\n", " user_temp_cache = tempfile.gettempdir()\n", "\n", " if os.path.exists(os.path.join(shared_location, filename)):\n", " file = os.path.join(shared_location, filename)\n", " else:\n", " file = pooch.retrieve(\n", " filelocation,\n", " known_hash=None,\n", " fname=os.path.join(user_temp_cache, filename),\n", " processor=processor,\n", " )\n", "\n", " return file\n", "\n", "def estimate_return_level(quantile,model):\n", " loc, scale, shape = model.loc, model.scale, model.shape\n", " level = loc - scale / shape * (1 - (-np.log(quantile))**(-shape))\n", " return level" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Non-Stationarity and EVT\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: Non-Stationarity and EVT\n", "\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'A-ANgs-HfPY'), ('Bilibili', 'BV1y14y1R7iA')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Non_Stationarity_and_EVT_Video\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "pycharm": { "name": "#%%\n" }, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from ipywidgets import widgets\n", "from IPython.display import IFrame\n", "\n", "link_id = \"w9zcu\"\n", "\n", "print(f\"If you want to download the slides: https://osf.io/download/{link_id}/\")\n", "IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\", width=854, height=480)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Non_Stationarity_and_EVT_Slides\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Section 0: Definitions and Concepts from Previous Tutorials\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Glossary widget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Glossary widget\n", "\n", "import ipywidgets as widgets\n", "from IPython.display import display, clear_output\n", "from ipywidgets import interact\n", "\n", "# Create a function that will be called when the dropdown value changes\n", "def handle_dropdown_change(selected_value):\n", " clear_output(wait=True) # Clear previous output\n", " print(f'\\nDefinition: {selected_value}')\n", "\n", "descr_gev = '\\n\\nThe Generalized Value Distribution is a generalized probability distribution'\\\n", " 'with 3 parameters:\\n1. Location parameter - tells you where the distribution is located\\n2.'\\\n", " 'Scale parameter - determines the width of the distribution\\n3.'\\\n", " 'Shape parameter - affects the tails.\\n\\nIt is more suitable for data with skewed histograms. '\n", "descr_boot = '\\n\\nBootstrapping'\n", "descr_pdf = '\\n\\nThe Probability density function (PDF)'\n", "descr_xyrevent = '\\n\\nAn X-year event here means a given level of a variable that we expect to see every X many years.'\\\n", " '\\n\\nExample:\\nThe 100-year event has a 1% change of happening each year...'\\\n", " 'meaning you expect to see at least one 100-year event every 100 years.'\n", "\n", "# Create the dropdown widget\n", "dropdown = widgets.Dropdown(\n", " options=[('GEV', f'Generalized Extreme Value Distribution {descr_gev}'),\n", " ('Bootstrapping', f'Bootstrapping {descr_boot}'),\n", " ('PDF', f'Probability Density Function {descr_pdf}'),\n", " ('X-year event', f'{descr_xyrevent}')\n", " ],\n", " value=f'Generalized Extreme Value Distribution {descr_gev}',\n", " description='Choose abbreviation or concept:',\n", " style={'description_width': 'initial'}\n", ")\n", "\n", "# Attach the function to the dropdown's value change event using interact\n", "interact(handle_dropdown_change, selected_value=dropdown);" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Section 1: Download and Explore the Data\n", "As before, we start by visually inspecting the data. Create a plot of the recorded data over time." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# download file: 'WashingtonDCSSH1930-2022.csv'\n", "filename_WashingtonDCSSH1 = \"WashingtonDCSSH1930-2022.csv\"\n", "url_WashingtonDCSSH1 = \"https://osf.io/4zynp/download\"\n", "data = pd.read_csv(\n", " pooch_load(url_WashingtonDCSSH1, filename_WashingtonDCSSH1), index_col=0\n", ").set_index(\"years\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "data.ssh.plot(\n", " linestyle=\"-\",\n", " marker=\".\",\n", " ylabel=\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\",\n", " xlabel=\"Time (years)\",\n", " grid=True\n", ")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As before, we fit a stationary Generalized Extreme Value (GEV) distribution to the data and print the resulting fit parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# fit once\n", "shape, loc, scale = gev.fit(data.ssh.values, 0)\n", "print(f\"Fitted parameters:\\nLocation: {loc:.5f}, Scale: {scale:.5f}, Shape: {shape:.5f}\\n\")\n", "\n", "# bootstrap approach with 100 repetitions (N_boot) to quantify uncertainty range\n", "fit = gf.fit_return_levels(data.ssh.values, years=np.arange(1.1, 1000), N_boot=100)" ] }, { "cell_type": "markdown", "metadata": { "execution": {}, "tags": [] }, "source": [ "The former `gev.fit()` is also implemented in the `fit_return_levels()` function from our helper script `gev_functions.py`, hence this resulted in similar parameter estimates. The latter list shows the resulting ranges of the bootstrap approach that helps to quantify the uncertainty range." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The following figure integrates multiple elements, including the QQ-plot (as discussed in previous tutorials), the comparison between modeled and empirical probability density functions, and the fitted return level plot as well as the uncertainty estimate from the bootstrapping approach." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# create figures\n", "fig, axs = plt.subplots(2, 2, constrained_layout=True)\n", "# reduce dimension of axes to have only one index\n", "ax = axs.flatten()\n", "\n", "\n", "# create x-data for QQ-plot (top left)\n", "x = np.linspace(0, 1, 100)\n", "\n", "# calculate quantiles by sorting from min to max\n", "# and plot vs. return levels calculated with ppf (percent point function) and GEV fit parameters,\n", "# cf. e.g. Tut4)\n", "ax[0].plot(gev.ppf(x, shape, loc=loc, scale=scale), np.quantile(data.ssh, x), \"o\")\n", "\n", "# plot identity line\n", "xlim = ax[0].get_xlim()\n", "ylim = ax[0].get_ylim()\n", "ax[0].plot(\n", " [min(xlim[0], ylim[0]), max(xlim[1], ylim[1])],\n", " [min(xlim[0], ylim[0]), max(xlim[1], ylim[1])],\n", " \"k\",\n", ")\n", "#ax[0].set_xlim(xlim)\n", "#ax[0].set_ylim(ylim)\n", "\n", "# create x-data in SSH-data range for a methodology comparison (bottom left)\n", "x = np.linspace(data.ssh.min() - 200, data.ssh.max() + 200, 1000)\n", "\n", "# plot PDF of modeled GEV and empirical PDF estimate (kernel density estimation)\n", "ax[2].plot(x, gev.pdf(x, shape, loc=loc, scale=scale), label=\"Modeled\")\n", "sns.kdeplot(data.ssh, ax=ax[2], label=\"Empirical\")\n", "\n", "# add legend\n", "ax[2].legend()\n", "\n", "\n", "# plot return levels (bottom right) using functions also used in Tutorial 6\n", "gf.plot_return_levels(fit, ax=ax[3])\n", "ax[3].set_xlim(1.5, 1000)\n", "ax[3].set_ylim(0, None)\n", "\n", "# aesthetics\n", "ax[0].set_title(\"QQ-plot\")\n", "ax[2].set_title(\"PDF\")\n", "ax[3].set_title(\"Return Levels\")\n", "\n", "ax[0].set_xlabel(\"SSH Anomalies (mm)\")\n", "ax[0].set_ylabel(\"Return Levels (mm)\")\n", "\n", "ax[2].set_xlabel(\"SSH Anomalies (mm)\")\n", "ax[3].set_xlabel(\"Return Periods (years)\")\n", "ax[3].set_ylabel(\"Return Levels (mm)\")\n", "\n", "ax[1].remove() # remove top right axis" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "
\n", " Click here for a detailed description of the plot.\n", "\n", "*Top left*: \n", "The QQ-plot shows the quantiles versus the return levels both in millimeters, calculated from the sea surface height (SSH) anomaly data and emphasized as blue dots. For small quantiles, the return levels follow the identity line (black), in contrast to larger quantiles, where return levels increase with a smaller slope than the id line. This indicates a difference in the distributions of the two properties.\n", " \n", "*Bottom left*: \n", "A comparison between modeled and empirical probability density functions (PDFs) shows different distributions. The empirical PDF weights the density of larger anomalies more heavily, so it appears that the GEV PDF underestimates large SSH anomalies. In addition, the modeled scale parameter yields a higher maximum than the empirical estimate. The tails are well comparable.\n", "\n", "*Bottom right* \n", "The fitted return levels of the SSH anomalies in millimeters versus return periods log-scaled in years. Blue dots highlight the empirical calculations (using the algorithm introduced in Tutorial 2), while the blue line emphasizes the fitted GEV corresponding relationship. Shaded areas represent the uncertainty ranges that were calculated with a bootstrap approach before. The larger the return periods, the less data on such events is given, hence the uncertainty increases. \n", "
" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We use the function `estimate_return_level_period()` to compute the X-year return level:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# X-year of interest\n", "x = 100\n", "print(\n", " \"{}-year return level: {:.2f} mm\".format(x,\n", " gf.estimate_return_level_period(x, loc, scale, shape).mean()\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We have the option to make our GEV (Generalized Extreme Value) model non-stationary by introducing a time component to the GEV parameters. This means that the parameters will vary with time or can be dependent on other variables such as global air temperature. We call this **adding covariates** to parameters. \n", "\n", "The simplest way to implement a non-stationary GEV model is by adding a linear time component to the location parameter. Instead of a fixed location parameter like \"100\", it becomes a linear function of time (e.g., $\\text{time} * 1.05 + 80$). This implies that the location parameter increases with time. We can incorporate this into our fitting process. The [scipy GEV](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.genextreme.html#scipy.stats.genextreme) implementation does not support covariates. Therefore for this part, we will use classes and functions from the package [SDFC](https://github.com/yrobink/SDFC-python) (cf. ```sdfc_classes.py```) and are particularly using the `c_loc` option - to add a covariate (`c`) to the location parameter (`loc`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# instantiate a GEV distribution\n", "law_ns = sd.GEV()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Due to our small data sample, our fitting method is not always converging to a non-trivial (non-zero) solution. To find adequate parameters of the GEV distribution, we therefore repeat the fitting procedure until we find non-zero coefficients (`coef_`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# fit the GEV to the data, while specifying that the location parameter ('loc') is meant\n", "# to be a covariate ('c_') of the time axis (data.index)\n", "for i in range(250):\n", " law_ns.fit(data.ssh.values, c_loc=np.arange(data.index.size))\n", " #print(law_ns.coef_)\n", " # if the first coefficient is not zero, we stop fitting\n", " if law_ns.coef_[0] != 0:\n", " print(f'Found non-trivial solution after {i} fitting iterations.')\n", " print(law_ns.coef_)\n", " break" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We can use a helper function named `print_law()` from the `extreme_functions.py` script to create a prettified and readable table that contains all fitted parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "ef.print_law(law_ns)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now the location parameter is of type **Covariate** (that is, it *covaries* with the array we provided to `c_loc`), and it now consists of *two* coefficients (intercept and slope) of the linear relationship between the location parameter and the provided array." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As observed, we established a linear relationship by associating the location parameter with a simple array of sequential numbers (1, 2, 3, 4, 5, 6, etc.). Alternatively, if we had used an exponential function of time (e.g., 1, 2, 4, 8, 16, by setting `c_loc = np.exp(np.arange(data.index.size)/data.index.size)`), we would have created an exponential relationship.\n", "\n", "Similarly, squaring the array (`c_loc = np.arange(data.index.size)**2`) would have resulted in a quadratic relationship. Note that our subset of SDFC classes is not tested with these exponential and squared relationships, hence a separate installation might be necessary to apply them.\n", "\n", "It's worth noting that various types of relationships can be employed, and this approach can be extended to multiple parameters by utilizing `c_scale` and `c_shape`. For instance, it's possible to establish a relationship with the global mean CO2 concentration rather than solely relying on time, which could potentially enhance the fitting accuracy." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Since the location parameter is now time-dependent, we are unable to create a return level/return period plot as we did in the previous tutorials. This would require 93 separate plots, one for each year. However, there is an alternative approach that is visually appealing. We can calculate the level of the X-year event over time and overlay it onto our SSH (Sea Surface Height) anomaly record. This is referred to as the \"**effective return levels**\".\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "# plot SSH time series\n", "data.ssh.plot(c=\"k\", label=\"sea level anomaly\", ax=ax)\n", "\n", "# years of interest\n", "x_years = [2, 10, 50, 100, 500]\n", "\n", "# repeat plotting of effective return levels for all years of interest\n", "for x in x_years:\n", " ax.plot(\n", " data.index, estimate_return_level(1 - 1 / x, law_ns), label=f\"{x}-year return level\"\n", " )\n", "\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_xlabel(\"Time (years)\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You can evaluate the quality of your model fit by examining the AIC value. \n", "The [Akaike Information Criterion (AIC)]( https://en.wikipedia.org/wiki/Akaike_information_criterion) is useful to estimate how well the model fits the data, while preferring simpler models with fewer free parameters. A lower AIC value indicates a better fit. In the code provided below, a simple function is defined to calculate the AIC for a given fitted *model*." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "def compute_aic(model):\n", " return 2 * len(model.coef_) + 2 * model.info_.mle_optim_result.fun" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "compute_aic(law_ns)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "For comparison, let's repeat the computation without the covariate by dropping the c_loc option. Then we can compute the AIC for the stationary model, and compare how well the model fits the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "law_stat = sd.GEV()\n", "law_stat.fit(data.ssh.values)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "compute_aic(law_stat)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You can see that AIC is lower for the model that has the location parameter as a *covariate* - this suggests this model fits the data better." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercises 1\n", "\n", "**Scale and/or shape as a function of time**\n", "\n", "You have seen above how to make the location parameter a function of time, by providing the keyword `c_loc` to the `fit()` function.\n", "1. Now repeat this procedure, making scale, shape, or both parameters a function of time. To do this, you need to instantiate a new GEV object and call it e.g. `law_ns_scale = sd.GEV()`. Then call the `.fit()` function similarly to before, but replace `c_loc` with `c_scale` and/or `c_shape`. Plot the effective return levels by providing your fitted model to the `estimate_return_level()` function, as done previously. Observe if the model fits the data better, worse, or remains the same.\n", "2. Lastly, compute the AIC for your new fits. Compare the AIC values to determine if the expectation from the previous step is met." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "# instantiate a GEV distribution\n", "law_ns_scale = ...\n", "\n", "# fit the GEV to the data, while specifying that the scale parameter ('scale') is meant to be a covariate ('_c') of the time axis (data.index)\n", "_ = ...\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "\n", "# years of interest\n", "x_years = [2, 10, 50, 100, 500]\n", "\n", "# repeat plotting of effective return levels for all years of interest\n", "for x in x_years:\n", " _ = ...\n", "\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_xlabel(\"Time (years)\")\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_title(\"Scale as Function of Time\")\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "# instantiate a GEV distribution\n", "law_ns_scale = sd.GEV()\n", "\n", "# fit the GEV to the data, while specifying that the scale parameter ('scale') is meant to be a covariate ('_c') of the time axis (data.index)\n", "_ = law_ns_scale.fit(data.ssh.values, c_scale=np.arange(data.index.size))\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "\n", "# years of interest\n", "x_years = [2, 10, 50, 100, 500]\n", "\n", "# repeat plotting of effective return levels for all years of interest\n", "for x in x_years:\n", " _ = ax.plot(\n", " data.index, estimate_return_level(1 - 1 / x, law_ns), label=f\"{x}-year return level\"\n", " )\n", "\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_xlabel(\"Time (years)\")\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_title(\"Scale as Function of Time\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "# instantiate a GEV distribution\n", "law_ns_shape = ...\n", "\n", "# fit the GEV to the data, while specifying that the shape parameter ('shape') is meant to be a covariate ('_c') of the time axis (data.index)\n", "_ = ...\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "\n", "# years of interest\n", "x_years = [2, 10, 50, 100, 500]\n", "\n", "# repeat plotting of effective return levels for all years of interest\n", "for x in x_years:\n", " _ = ...\n", "\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_xlabel(\"Time (years)\")\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_title(\"Shape as Function of Time\")\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "# instantiate a GEV distribution\n", "law_ns_shape = sd.GEV()\n", "\n", "# fit the GEV to the data, while specifying that the shape parameter ('shape') is meant to be a covariate ('_c') of the time axis (data.index)\n", "_ = law_ns_shape.fit(data.ssh.values, c_shape=np.arange(data.index.size))\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "\n", "# years of interest\n", "x_years = [2, 10, 50, 100, 500]\n", "\n", "# repeat plotting of effective return levels for all years of interest\n", "for x in x_years:\n", " _ = ax.plot(\n", " data.index, estimate_return_level(1 - 1 / x, law_ns), label=f\"{x}-year return level\"\n", " )\n", "\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_xlabel(\"Time (years)\")\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_title(\"Shape as Function of Time\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "# initialize a GEV distribution\n", "law_ns_loc_scale = ...\n", "\n", "# fit the GEV to the data using c_loc and c_scale\n", "_ = ...\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "\n", "# years of interest\n", "x_years = [2, 10, 50, 100, 500]\n", "\n", "# repeat plotting of effective return levels for all years of interest\n", "for x in x_years:\n", " _ = ...\n", "\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_xlabel(\"Time (years)\")\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_title(\"Scale and Location as Functions of Time\")\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "# initialize a GEV distribution\n", "law_ns_loc_scale = sd.GEV()\n", "\n", "# fit the GEV to the data using c_loc and c_scale\n", "_ = law_ns_loc_scale.fit(\n", " data.ssh.values,\n", " c_loc=np.arange(data.index.size),\n", " c_scale=np.arange(data.index.size),\n", ")\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "\n", "# years of interest\n", "x_years = [2, 10, 50, 100, 500]\n", "\n", "# repeat plotting of effective return levels for all years of interest\n", "for x in x_years:\n", " _ = ax.plot(\n", " data.index, estimate_return_level(1 - 1 / x, law_ns), label=f\"{x}-year return level\"\n", " )\n", "\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_xlabel(\"Time (years)\")\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_title(\"Scale and Location as Functions of Time\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "#################################################\n", "## TODO for students: ##\n", "## Add the respective Akaike Information Criterion (AIC) computation to the data list. ##\n", "# Remove or comment the following line of code once you have completed the exercise:\n", "raise NotImplementedError(\"Student exercise: Add the respective Akaike Information Criterion (AIC) computation to the data list.\")\n", "#################################################\n", "\n", "aics = pd.Series(\n", " index=[\"Location\", \"Scale\", \"Shape\", \"Location and Scale\"],\n", " data=[\n", " ...,\n", " ...,\n", " ...,\n", " ...\n", " ],\n", ")\n", "\n", "aics.sort_values()\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "aics = pd.Series(\n", " index=[\"Location\", \"Scale\", \"Shape\", \"Location and Scale\"],\n", " data=[\n", " compute_aic(law_ns),\n", " compute_aic(law_ns_scale),\n", " compute_aic(law_ns_shape),\n", " compute_aic(law_ns_loc_scale),\n", " ],\n", ")\n", "\n", "aics.sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Coding_Exercise_1\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Summary\n", "\n", "In this tutorial, you adapted the GEV model to better fit non-stationary sea surface height (SSH) data from Washington DC. You learned how to:\n", "\n", "- Modify the GEV distribution to accommodate a time-dependent parameter, enabling non-stationarity.\n", "- Compare different models based on their time-dependent parameters.\n", "- Calculate effective return levels with the non-stationary GEV model.\n", "- Assessed the fit of the model using the Akaike Information Criterion (AIC)." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Resources\n", "\n", "Original data from this tutorial can be found [here](https://climexp.knmi.nl/getsealev.cgi?id=someone@somewhere&WMO=360&STATION=WASHINGTON_DC&extraargs=). Note the data used in this tutorial were preprocessed to extract the annual maximum values." ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W2D3_Tutorial7", "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 4 }