{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 7 - Data Manipulation with pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![View notebook on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/avakanski/Fall-2024-Applied-Data-Science-with-Python/blob/main/docs/Lectures/Theme_2-Data_Engineering/Lecture_7-Pandas/Lecture_7-Pandas.ipynb)\n",
    "[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/avakanski/Fall-2024-Applied-Data-Science-with-Python/blob/main/docs/Lectures/Theme_2-Data_Engineering/Lecture_7-Pandas/Lecture_7-Pandas.ipynb) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='top'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [7.1 Introduction to pandas](#7.1-introduction-to-pandas)\n",
    "- [7.2 Importing Data and Summary Statistics](#7.2-importing-data-and-summary-statistics)\n",
    "- [7.3 Rename, Index, and Slice](#7.3-rename,-index,-and-slice)\n",
    "- [7.4 Creating New Columns, Reordering](#7.4-creating-new-columns,-reordering)\n",
    "- [7.5 Removing Columns and Rows](#7.5-removing-columns-and-rows)\n",
    "- [7.6 Merging DataFrames](#7.6-merging-dataframes)\n",
    "- [7.7 Calculating Unique and Missing Values](#7.7-calculating-unique-and-missing-values)\n",
    "- [7.8 Dealing With Missing Values: Boolean Indexing](#7.8-dealing-with-missing-values:-boolean-indexing)\n",
    "- [7.9 Exporting A DataFrame to csv](#7.9-exporting-a-dataFrame-to-csv)\n",
    "- [References](#references)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.1 Introduction to `pandas` <a id=\"7.1-introduction-to-pandas\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pandas` is a library designed for working with structured data, such as tabular data (e.g., from .csv files, Excel files) or relational databases (e.g., SQL). \n",
    "\n",
    "> The **DataFrame** object in `pandas` is a 2-dimensional tabular, column-oriented data structure. The DateFrame is similar to an Excel spreadsheet and can store data of different types (including text characters, integers, floating-point values, categorical data, and more)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"images/pic1.jpg\" width=\"350\">\n",
    "<em>Figure source: Reference [2].</em>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `pandas` name is derived from the term *panel data*, which is a term from economics for multi-dimensional structured data. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2 Importing Data and Summary Statistics <a id=\"7.2-importing-data-and-summary-statistics\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's begin by importing the `pandas` package using the common abbreviation `pd`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A wide range of input/output formats are supported by `pandas`:\n",
    "\n",
    "* CSV, text\n",
    "* SQL database\n",
    "* Excel\n",
    "* HDF5\n",
    "* json\n",
    "* html\n",
    "* pickle\n",
    "* sas, stata\n",
    "* ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For importing .csv files, the function `read_csv()` in `pandas` allows to easily import data. By default, it assumes that the data is comma-separated, but we can also specify the delimiter used in the data (e.g., tab, semicolon, etc.). There are several parameters that can be specified in `read_csv()`. See the documentation [here](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). \n",
    "\n",
    "Let's load the data in the file `country-total` file located in the folder `data` and save it under the name `unemployment`. This file contains unemployment information for several countries over a period of 30 years.\n",
    "\n",
    "The function `read_csv()` returns a `DataFrame`, as shown below. Note that the DataFrame has 20,796 rows, and the output of the cell displayed only the first 5 and last 5 rows (or the first and last 30, depending on your system), since displaying all 20,976 rows is probably not what we want at this point anyway."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20791</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.06</td>\n",
       "      <td>2429000</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20792</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.07</td>\n",
       "      <td>2422000</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20793</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.08</td>\n",
       "      <td>2429000</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20794</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.09</td>\n",
       "      <td>2447000</td>\n",
       "      <td>7.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20795</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.10</td>\n",
       "      <td>2455000</td>\n",
       "      <td>7.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>20796 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      country seasonality    month  unemployment  unemployment_rate\n",
       "0          at         nsa  1993.01        171000                4.5\n",
       "1          at         nsa  1993.02        175000                4.6\n",
       "2          at         nsa  1993.03        166000                4.4\n",
       "3          at         nsa  1993.04        157000                4.1\n",
       "4          at         nsa  1993.05        147000                3.9\n",
       "...       ...         ...      ...           ...                ...\n",
       "20791      uk       trend  2010.06       2429000                7.7\n",
       "20792      uk       trend  2010.07       2422000                7.7\n",
       "20793      uk       trend  2010.08       2429000                7.7\n",
       "20794      uk       trend  2010.09       2447000                7.8\n",
       "20795      uk       trend  2010.10       2455000                7.8\n",
       "\n",
       "[20796 rows x 5 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Import data\n",
    "unemployment = pd.read_csv('data/country_total.csv')\n",
    "# Show the DataFrame\n",
    "unemployment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We could have also used `print` to display the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      country seasonality    month  unemployment  unemployment_rate\n",
      "0          at         nsa  1993.01        171000                4.5\n",
      "1          at         nsa  1993.02        175000                4.6\n",
      "2          at         nsa  1993.03        166000                4.4\n",
      "3          at         nsa  1993.04        157000                4.1\n",
      "4          at         nsa  1993.05        147000                3.9\n",
      "...       ...         ...      ...           ...                ...\n",
      "20791      uk       trend  2010.06       2429000                7.7\n",
      "20792      uk       trend  2010.07       2422000                7.7\n",
      "20793      uk       trend  2010.08       2429000                7.7\n",
      "20794      uk       trend  2010.09       2447000                7.8\n",
      "20795      uk       trend  2010.10       2455000                7.8\n",
      "\n",
      "[20796 rows x 5 columns]\n"
     ]
    }
   ],
   "source": [
    "print(unemployment)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the DataFrames have a large number of rows that take a large portion of the screen, we can inspect the data by using the `.head()` method. By default, this shows the **header** (names of the columns, commonly referred to as **column labels**) and the first five rows (having indices ranging from 0 to 4, in the first column in the table). The **indices** are also referred to as **row labels**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality    month  unemployment  unemployment_rate\n",
       "0      at         nsa  1993.01        171000                4.5\n",
       "1      at         nsa  1993.02        175000                4.6\n",
       "2      at         nsa  1993.03        166000                4.4\n",
       "3      at         nsa  1993.04        157000                4.1\n",
       "4      at         nsa  1993.05        147000                3.9"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Passing an integer number as an argument to `.head(n)` returns that number of rows. To see the last $n$ rows, use `.tail(n)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20788</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.03</td>\n",
       "      <td>2437000</td>\n",
       "      <td>7.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20789</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.04</td>\n",
       "      <td>2419000</td>\n",
       "      <td>7.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20790</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.05</td>\n",
       "      <td>2419000</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20791</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.06</td>\n",
       "      <td>2429000</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20792</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.07</td>\n",
       "      <td>2422000</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20793</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.08</td>\n",
       "      <td>2429000</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20794</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.09</td>\n",
       "      <td>2447000</td>\n",
       "      <td>7.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20795</th>\n",
       "      <td>uk</td>\n",
       "      <td>trend</td>\n",
       "      <td>2010.10</td>\n",
       "      <td>2455000</td>\n",
       "      <td>7.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      country seasonality    month  unemployment  unemployment_rate\n",
       "20788      uk       trend  2010.03       2437000                7.8\n",
       "20789      uk       trend  2010.04       2419000                7.8\n",
       "20790      uk       trend  2010.05       2419000                7.7\n",
       "20791      uk       trend  2010.06       2429000                7.7\n",
       "20792      uk       trend  2010.07       2422000                7.7\n",
       "20793      uk       trend  2010.08       2429000                7.7\n",
       "20794      uk       trend  2010.09       2447000                7.8\n",
       "20795      uk       trend  2010.10       2455000                7.8"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# show the last 8 rows\n",
    "unemployment.tail(8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To find the number of rows in a DataFrame, you can use the `len()` function, as with Python lists and other sequences."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20796"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(unemployment)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, we can use the `shape` attribute to find the numbers of rows and columns, as with NumPy arrays. The cell output is a tuple, showing that there are 20,796 rows and 5 columns. Note that the left-most column in the above table showing row indices is not part of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(20796, 5)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A useful method that generates various summary statistics of a DataFrame is `.describe()`, as shown below.\n",
    "\n",
    "Notice in the above cell that the DataFrame has 5 columns, but the first 2 columns (country and seasonality) have textual (strings) data, and therefore the summary statistics are shown only for the columns with numeric data (month, unemployment, and unemployment_rate). If `.describe()` is called on textual data only, it will return the count, number of unique values, and the most frequent value along with its count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>20796.000000</td>\n",
       "      <td>2.079600e+04</td>\n",
       "      <td>19851.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>1999.401290</td>\n",
       "      <td>7.900818e+05</td>\n",
       "      <td>8.179764</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>7.483751</td>\n",
       "      <td>1.015280e+06</td>\n",
       "      <td>3.922533</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1983.010000</td>\n",
       "      <td>2.000000e+03</td>\n",
       "      <td>1.100000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1994.090000</td>\n",
       "      <td>1.400000e+05</td>\n",
       "      <td>5.200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2001.010000</td>\n",
       "      <td>3.100000e+05</td>\n",
       "      <td>7.600000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>2006.010000</td>\n",
       "      <td>1.262250e+06</td>\n",
       "      <td>10.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>2010.120000</td>\n",
       "      <td>4.773000e+06</td>\n",
       "      <td>20.900000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              month  unemployment  unemployment_rate\n",
       "count  20796.000000  2.079600e+04       19851.000000\n",
       "mean    1999.401290  7.900818e+05           8.179764\n",
       "std        7.483751  1.015280e+06           3.922533\n",
       "min     1983.010000  2.000000e+03           1.100000\n",
       "25%     1994.090000  1.400000e+05           5.200000\n",
       "50%     2001.010000  3.100000e+05           7.600000\n",
       "75%     2006.010000  1.262250e+06          10.000000\n",
       "max     2010.120000  4.773000e+06          20.900000"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is also possible to calculate individual statistics, such as `.min()`, `.max ()`, or `.mean()`, instead of using summary statistics with `.describe()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "country                   at\n",
       "seasonality              nsa\n",
       "month                1983.01\n",
       "unemployment            2000\n",
       "unemployment_rate        1.1\n",
       "dtype: object"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.min()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To view the data types for each column, use the `dtypes` attribute of the *unemployment* DataFrame. The data types in this case are strings (`object` type), floats (`float64` type), and integers (`int64` type)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "country               object\n",
       "seasonality           object\n",
       "month                float64\n",
       "unemployment           int64\n",
       "unemployment_rate    float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And one more way to get a summary of a DataFrame is by using `info()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 20796 entries, 0 to 20795\n",
      "Data columns (total 5 columns):\n",
      " #   Column             Non-Null Count  Dtype  \n",
      "---  ------             --------------  -----  \n",
      " 0   country            20796 non-null  object \n",
      " 1   seasonality        20796 non-null  object \n",
      " 2   month              20796 non-null  float64\n",
      " 3   unemployment       20796 non-null  int64  \n",
      " 4   unemployment_rate  19851 non-null  float64\n",
      "dtypes: float64(2), int64(1), object(2)\n",
      "memory usage: 812.5+ KB\n"
     ]
    }
   ],
   "source": [
    "unemployment.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can retrieve the name of the columns in the `unemployment` DataFrame by using the attribute `columns`, as in the next cell. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['country', 'seasonality', 'month', 'unemployment', 'unemployment_rate'], dtype='object')"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the type of the data for the returned column names is an Index object. We can also easily convert the column names into a list by using the method `tolist()`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['country', 'seasonality', 'month', 'unemployment', 'unemployment_rate']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.columns.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, we can achieve the same by using the Python built-in function `list`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['country', 'seasonality', 'month', 'unemployment', 'unemployment_rate']"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(unemployment.columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Data From a URL\n",
    "\n",
    "Above, we imported the unemployment data using the function `read_csv` and a relative file path to the `data` directory. The function `read_csv` is very flexible and it also allows importing data using a URL as the file path. \n",
    "\n",
    "A csv file with data on world countries and their abbreviations is located at [https://raw.githubusercontent.com/dlab-berkeley/introduction-to-pandas/master/data/countries.csv](https://raw.githubusercontent.com/dlab-berkeley/introduction-to-pandas/master/data/countries.csv)\n",
    "\n",
    "Using `read_csv`, we can import the country data and save it to the variable `countries`. This DataFrame has 30 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>google_country_code</th>\n",
       "      <th>country_group</th>\n",
       "      <th>name_en</th>\n",
       "      <th>name_fr</th>\n",
       "      <th>name_de</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>AT</td>\n",
       "      <td>eu</td>\n",
       "      <td>Austria</td>\n",
       "      <td>Autriche</td>\n",
       "      <td>Österreich</td>\n",
       "      <td>47.696554</td>\n",
       "      <td>13.345980</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>be</td>\n",
       "      <td>BE</td>\n",
       "      <td>eu</td>\n",
       "      <td>Belgium</td>\n",
       "      <td>Belgique</td>\n",
       "      <td>Belgien</td>\n",
       "      <td>50.501045</td>\n",
       "      <td>4.476674</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bg</td>\n",
       "      <td>BG</td>\n",
       "      <td>eu</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>Bulgarie</td>\n",
       "      <td>Bulgarien</td>\n",
       "      <td>42.725674</td>\n",
       "      <td>25.482322</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>hr</td>\n",
       "      <td>HR</td>\n",
       "      <td>non-eu</td>\n",
       "      <td>Croatia</td>\n",
       "      <td>Croatie</td>\n",
       "      <td>Kroatien</td>\n",
       "      <td>44.746643</td>\n",
       "      <td>15.340844</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cy</td>\n",
       "      <td>CY</td>\n",
       "      <td>eu</td>\n",
       "      <td>Cyprus</td>\n",
       "      <td>Chypre</td>\n",
       "      <td>Zypern</td>\n",
       "      <td>35.129141</td>\n",
       "      <td>33.428682</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>cz</td>\n",
       "      <td>CZ</td>\n",
       "      <td>eu</td>\n",
       "      <td>Czech Republic</td>\n",
       "      <td>République tchèque</td>\n",
       "      <td>Tschechische Republik</td>\n",
       "      <td>49.803531</td>\n",
       "      <td>15.474998</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>dk</td>\n",
       "      <td>DK</td>\n",
       "      <td>eu</td>\n",
       "      <td>Denmark</td>\n",
       "      <td>Danemark</td>\n",
       "      <td>Dänemark</td>\n",
       "      <td>55.939684</td>\n",
       "      <td>9.516689</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>ee</td>\n",
       "      <td>EE</td>\n",
       "      <td>eu</td>\n",
       "      <td>Estonia</td>\n",
       "      <td>Estonie</td>\n",
       "      <td>Estland</td>\n",
       "      <td>58.592469</td>\n",
       "      <td>25.806950</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>fi</td>\n",
       "      <td>FI</td>\n",
       "      <td>eu</td>\n",
       "      <td>Finland</td>\n",
       "      <td>Finlande</td>\n",
       "      <td>Finnland</td>\n",
       "      <td>64.950159</td>\n",
       "      <td>26.067564</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>fr</td>\n",
       "      <td>FR</td>\n",
       "      <td>eu</td>\n",
       "      <td>France</td>\n",
       "      <td>France</td>\n",
       "      <td>Frankreich</td>\n",
       "      <td>46.710994</td>\n",
       "      <td>1.718561</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>de</td>\n",
       "      <td>DE</td>\n",
       "      <td>eu</td>\n",
       "      <td>Germany (including  former GDR from 1991)</td>\n",
       "      <td>Allemagne (incluant l'ancienne RDA à partir de...</td>\n",
       "      <td>Deutschland (einschließlich der ehemaligen DDR...</td>\n",
       "      <td>51.163825</td>\n",
       "      <td>10.454048</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>gr</td>\n",
       "      <td>GR</td>\n",
       "      <td>eu</td>\n",
       "      <td>Greece</td>\n",
       "      <td>Grèce</td>\n",
       "      <td>Griechenland</td>\n",
       "      <td>39.698467</td>\n",
       "      <td>21.577256</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>hu</td>\n",
       "      <td>HU</td>\n",
       "      <td>eu</td>\n",
       "      <td>Hungary</td>\n",
       "      <td>Hongrie</td>\n",
       "      <td>Ungarn</td>\n",
       "      <td>47.161163</td>\n",
       "      <td>19.504265</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>ie</td>\n",
       "      <td>IE</td>\n",
       "      <td>eu</td>\n",
       "      <td>Ireland</td>\n",
       "      <td>Irlande</td>\n",
       "      <td>Irland</td>\n",
       "      <td>53.415260</td>\n",
       "      <td>-8.239122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>it</td>\n",
       "      <td>IT</td>\n",
       "      <td>eu</td>\n",
       "      <td>Italy</td>\n",
       "      <td>Italie</td>\n",
       "      <td>Italien</td>\n",
       "      <td>42.504191</td>\n",
       "      <td>12.573787</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>lv</td>\n",
       "      <td>LV</td>\n",
       "      <td>eu</td>\n",
       "      <td>Latvia</td>\n",
       "      <td>Lettonie</td>\n",
       "      <td>Lettland</td>\n",
       "      <td>56.880117</td>\n",
       "      <td>24.606555</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>lt</td>\n",
       "      <td>LT</td>\n",
       "      <td>eu</td>\n",
       "      <td>Lithuania</td>\n",
       "      <td>Lituanie</td>\n",
       "      <td>Litauen</td>\n",
       "      <td>55.173687</td>\n",
       "      <td>23.943168</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>lu</td>\n",
       "      <td>LU</td>\n",
       "      <td>eu</td>\n",
       "      <td>Luxembourg</td>\n",
       "      <td>Luxembourg</td>\n",
       "      <td>Luxemburg</td>\n",
       "      <td>49.815319</td>\n",
       "      <td>6.133352</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>mt</td>\n",
       "      <td>MT</td>\n",
       "      <td>eu</td>\n",
       "      <td>Malta</td>\n",
       "      <td>Malte</td>\n",
       "      <td>Malta</td>\n",
       "      <td>35.902422</td>\n",
       "      <td>14.447461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>nl</td>\n",
       "      <td>NL</td>\n",
       "      <td>eu</td>\n",
       "      <td>Netherlands</td>\n",
       "      <td>Pays-Bas</td>\n",
       "      <td>Niederlande</td>\n",
       "      <td>52.108118</td>\n",
       "      <td>5.330198</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>no</td>\n",
       "      <td>NO</td>\n",
       "      <td>non-eu</td>\n",
       "      <td>Norway</td>\n",
       "      <td>Norvège</td>\n",
       "      <td>Norwegen</td>\n",
       "      <td>64.556460</td>\n",
       "      <td>12.665766</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>pl</td>\n",
       "      <td>PL</td>\n",
       "      <td>eu</td>\n",
       "      <td>Poland</td>\n",
       "      <td>Pologne</td>\n",
       "      <td>Polen</td>\n",
       "      <td>51.918907</td>\n",
       "      <td>19.134334</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>pt</td>\n",
       "      <td>PT</td>\n",
       "      <td>eu</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>39.558069</td>\n",
       "      <td>-7.844941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>ro</td>\n",
       "      <td>RO</td>\n",
       "      <td>eu</td>\n",
       "      <td>Romania</td>\n",
       "      <td>Roumanie</td>\n",
       "      <td>Rumänien</td>\n",
       "      <td>45.942611</td>\n",
       "      <td>24.990152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>sk</td>\n",
       "      <td>SK</td>\n",
       "      <td>eu</td>\n",
       "      <td>Slovakia</td>\n",
       "      <td>Slovaquie</td>\n",
       "      <td>Slowakei</td>\n",
       "      <td>48.672644</td>\n",
       "      <td>19.700032</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>si</td>\n",
       "      <td>SI</td>\n",
       "      <td>eu</td>\n",
       "      <td>Slovenia</td>\n",
       "      <td>Slovénie</td>\n",
       "      <td>Slowenien</td>\n",
       "      <td>46.149259</td>\n",
       "      <td>14.986617</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>es</td>\n",
       "      <td>ES</td>\n",
       "      <td>eu</td>\n",
       "      <td>Spain</td>\n",
       "      <td>Espagne</td>\n",
       "      <td>Spanien</td>\n",
       "      <td>39.895013</td>\n",
       "      <td>-2.988296</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>se</td>\n",
       "      <td>SE</td>\n",
       "      <td>eu</td>\n",
       "      <td>Sweden</td>\n",
       "      <td>Suède</td>\n",
       "      <td>Schweden</td>\n",
       "      <td>62.198467</td>\n",
       "      <td>14.896307</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>tr</td>\n",
       "      <td>TR</td>\n",
       "      <td>non-eu</td>\n",
       "      <td>Turkey</td>\n",
       "      <td>Turquie</td>\n",
       "      <td>Türkei</td>\n",
       "      <td>38.952942</td>\n",
       "      <td>35.439795</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>uk</td>\n",
       "      <td>GB</td>\n",
       "      <td>eu</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>Royaume-Uni</td>\n",
       "      <td>Vereinigtes Königreich</td>\n",
       "      <td>54.315447</td>\n",
       "      <td>-2.232612</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   country google_country_code country_group  \\\n",
       "0       at                  AT            eu   \n",
       "1       be                  BE            eu   \n",
       "2       bg                  BG            eu   \n",
       "3       hr                  HR        non-eu   \n",
       "4       cy                  CY            eu   \n",
       "5       cz                  CZ            eu   \n",
       "6       dk                  DK            eu   \n",
       "7       ee                  EE            eu   \n",
       "8       fi                  FI            eu   \n",
       "9       fr                  FR            eu   \n",
       "10      de                  DE            eu   \n",
       "11      gr                  GR            eu   \n",
       "12      hu                  HU            eu   \n",
       "13      ie                  IE            eu   \n",
       "14      it                  IT            eu   \n",
       "15      lv                  LV            eu   \n",
       "16      lt                  LT            eu   \n",
       "17      lu                  LU            eu   \n",
       "18      mt                  MT            eu   \n",
       "19      nl                  NL            eu   \n",
       "20      no                  NO        non-eu   \n",
       "21      pl                  PL            eu   \n",
       "22      pt                  PT            eu   \n",
       "23      ro                  RO            eu   \n",
       "24      sk                  SK            eu   \n",
       "25      si                  SI            eu   \n",
       "26      es                  ES            eu   \n",
       "27      se                  SE            eu   \n",
       "28      tr                  TR        non-eu   \n",
       "29      uk                  GB            eu   \n",
       "\n",
       "                                      name_en  \\\n",
       "0                                     Austria   \n",
       "1                                     Belgium   \n",
       "2                                    Bulgaria   \n",
       "3                                     Croatia   \n",
       "4                                      Cyprus   \n",
       "5                              Czech Republic   \n",
       "6                                     Denmark   \n",
       "7                                     Estonia   \n",
       "8                                     Finland   \n",
       "9                                      France   \n",
       "10  Germany (including  former GDR from 1991)   \n",
       "11                                     Greece   \n",
       "12                                    Hungary   \n",
       "13                                    Ireland   \n",
       "14                                      Italy   \n",
       "15                                     Latvia   \n",
       "16                                  Lithuania   \n",
       "17                                 Luxembourg   \n",
       "18                                      Malta   \n",
       "19                                Netherlands   \n",
       "20                                     Norway   \n",
       "21                                     Poland   \n",
       "22                                   Portugal   \n",
       "23                                    Romania   \n",
       "24                                   Slovakia   \n",
       "25                                   Slovenia   \n",
       "26                                      Spain   \n",
       "27                                     Sweden   \n",
       "28                                     Turkey   \n",
       "29                             United Kingdom   \n",
       "\n",
       "                                              name_fr  \\\n",
       "0                                            Autriche   \n",
       "1                                            Belgique   \n",
       "2                                            Bulgarie   \n",
       "3                                             Croatie   \n",
       "4                                              Chypre   \n",
       "5                                  République tchèque   \n",
       "6                                            Danemark   \n",
       "7                                             Estonie   \n",
       "8                                            Finlande   \n",
       "9                                              France   \n",
       "10  Allemagne (incluant l'ancienne RDA à partir de...   \n",
       "11                                              Grèce   \n",
       "12                                            Hongrie   \n",
       "13                                            Irlande   \n",
       "14                                             Italie   \n",
       "15                                           Lettonie   \n",
       "16                                           Lituanie   \n",
       "17                                         Luxembourg   \n",
       "18                                              Malte   \n",
       "19                                           Pays-Bas   \n",
       "20                                            Norvège   \n",
       "21                                            Pologne   \n",
       "22                                           Portugal   \n",
       "23                                           Roumanie   \n",
       "24                                          Slovaquie   \n",
       "25                                           Slovénie   \n",
       "26                                            Espagne   \n",
       "27                                              Suède   \n",
       "28                                            Turquie   \n",
       "29                                        Royaume-Uni   \n",
       "\n",
       "                                              name_de   latitude  longitude  \n",
       "0                                          Österreich  47.696554  13.345980  \n",
       "1                                             Belgien  50.501045   4.476674  \n",
       "2                                           Bulgarien  42.725674  25.482322  \n",
       "3                                            Kroatien  44.746643  15.340844  \n",
       "4                                              Zypern  35.129141  33.428682  \n",
       "5                               Tschechische Republik  49.803531  15.474998  \n",
       "6                                            Dänemark  55.939684   9.516689  \n",
       "7                                             Estland  58.592469  25.806950  \n",
       "8                                            Finnland  64.950159  26.067564  \n",
       "9                                          Frankreich  46.710994   1.718561  \n",
       "10  Deutschland (einschließlich der ehemaligen DDR...  51.163825  10.454048  \n",
       "11                                       Griechenland  39.698467  21.577256  \n",
       "12                                             Ungarn  47.161163  19.504265  \n",
       "13                                             Irland  53.415260  -8.239122  \n",
       "14                                            Italien  42.504191  12.573787  \n",
       "15                                           Lettland  56.880117  24.606555  \n",
       "16                                            Litauen  55.173687  23.943168  \n",
       "17                                          Luxemburg  49.815319   6.133352  \n",
       "18                                              Malta  35.902422  14.447461  \n",
       "19                                        Niederlande  52.108118   5.330198  \n",
       "20                                           Norwegen  64.556460  12.665766  \n",
       "21                                              Polen  51.918907  19.134334  \n",
       "22                                           Portugal  39.558069  -7.844941  \n",
       "23                                           Rumänien  45.942611  24.990152  \n",
       "24                                           Slowakei  48.672644  19.700032  \n",
       "25                                          Slowenien  46.149259  14.986617  \n",
       "26                                            Spanien  39.895013  -2.988296  \n",
       "27                                           Schweden  62.198467  14.896307  \n",
       "28                                             Türkei  38.952942  35.439795  \n",
       "29                             Vereinigtes Königreich  54.315447  -2.232612  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "countries = pd.read_csv('https://raw.githubusercontent.com/dlab-berkeley/introduction-to-pandas/master/data/countries.csv')\n",
    "countries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similar to the above example, we can use `shape` and `describe()` the understand the *countries* DataFrame. In this case `describe()` is not very useful, because only 2 of the columns have numeric values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(30, 8)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "countries.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>30.000000</td>\n",
       "      <td>30.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>49.092609</td>\n",
       "      <td>14.324579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>7.956624</td>\n",
       "      <td>11.257010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>35.129141</td>\n",
       "      <td>-8.239122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>43.230916</td>\n",
       "      <td>6.979186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>49.238087</td>\n",
       "      <td>14.941462</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>54.090400</td>\n",
       "      <td>23.351690</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>64.950159</td>\n",
       "      <td>35.439795</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        latitude  longitude\n",
       "count  30.000000  30.000000\n",
       "mean   49.092609  14.324579\n",
       "std     7.956624  11.257010\n",
       "min    35.129141  -8.239122\n",
       "25%    43.230916   6.979186\n",
       "50%    49.238087  14.941462\n",
       "75%    54.090400  23.351690\n",
       "max    64.950159  35.439795"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# explore the countries data\n",
    "countries.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The method `info()` provides helpful information for this DataFrame. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 30 entries, 0 to 29\n",
      "Data columns (total 8 columns):\n",
      " #   Column               Non-Null Count  Dtype  \n",
      "---  ------               --------------  -----  \n",
      " 0   country              30 non-null     object \n",
      " 1   google_country_code  30 non-null     object \n",
      " 2   country_group        30 non-null     object \n",
      " 3   name_en              30 non-null     object \n",
      " 4   name_fr              30 non-null     object \n",
      " 5   name_de              30 non-null     object \n",
      " 6   latitude             30 non-null     float64\n",
      " 7   longitude            30 non-null     float64\n",
      "dtypes: float64(2), object(6)\n",
      "memory usage: 2.0+ KB\n"
     ]
    }
   ],
   "source": [
    "# explore the countries data\n",
    "countries.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Data from Excel File"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a similar way, we can import data from an Excel file using the function `read_excel()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "pip install xlrd -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pclass</th>\n",
       "      <th>survived</th>\n",
       "      <th>name</th>\n",
       "      <th>sex</th>\n",
       "      <th>age</th>\n",
       "      <th>sibsp</th>\n",
       "      <th>parch</th>\n",
       "      <th>ticket</th>\n",
       "      <th>fare</th>\n",
       "      <th>cabin</th>\n",
       "      <th>embarked</th>\n",
       "      <th>boat</th>\n",
       "      <th>body</th>\n",
       "      <th>home.dest</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Allen, Miss. Elisabeth Walton</td>\n",
       "      <td>female</td>\n",
       "      <td>29</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>24160</td>\n",
       "      <td>211.3375</td>\n",
       "      <td>B5</td>\n",
       "      <td>S</td>\n",
       "      <td>2</td>\n",
       "      <td>None</td>\n",
       "      <td>St Louis, MO</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Allison, Master. Hudson Trevor</td>\n",
       "      <td>male</td>\n",
       "      <td>0.9167</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>113781</td>\n",
       "      <td>151.5500</td>\n",
       "      <td>C22 C26</td>\n",
       "      <td>S</td>\n",
       "      <td>11</td>\n",
       "      <td>None</td>\n",
       "      <td>Montreal, PQ / Chesterville, ON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Allison, Miss. Helen Loraine</td>\n",
       "      <td>female</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>113781</td>\n",
       "      <td>151.5500</td>\n",
       "      <td>C22 C26</td>\n",
       "      <td>S</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Montreal, PQ / Chesterville, ON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Allison, Mr. Hudson Joshua Creighton</td>\n",
       "      <td>male</td>\n",
       "      <td>30</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>113781</td>\n",
       "      <td>151.5500</td>\n",
       "      <td>C22 C26</td>\n",
       "      <td>S</td>\n",
       "      <td>None</td>\n",
       "      <td>135</td>\n",
       "      <td>Montreal, PQ / Chesterville, ON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Allison, Mrs. Hudson J C (Bessie Waldo Daniels)</td>\n",
       "      <td>female</td>\n",
       "      <td>25</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>113781</td>\n",
       "      <td>151.5500</td>\n",
       "      <td>C22 C26</td>\n",
       "      <td>S</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Montreal, PQ / Chesterville, ON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Anderson, Mr. Harry</td>\n",
       "      <td>male</td>\n",
       "      <td>48</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>19952</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>E12</td>\n",
       "      <td>S</td>\n",
       "      <td>3</td>\n",
       "      <td>None</td>\n",
       "      <td>New York, NY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Andrews, Miss. Kornelia Theodosia</td>\n",
       "      <td>female</td>\n",
       "      <td>63</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>13502</td>\n",
       "      <td>77.9583</td>\n",
       "      <td>D7</td>\n",
       "      <td>S</td>\n",
       "      <td>10</td>\n",
       "      <td>None</td>\n",
       "      <td>Hudson, NY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Andrews, Mr. Thomas Jr</td>\n",
       "      <td>male</td>\n",
       "      <td>39</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>112050</td>\n",
       "      <td>0.0000</td>\n",
       "      <td>A36</td>\n",
       "      <td>S</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Belfast, NI</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Appleton, Mrs. Edward Dale (Charlotte Lamson)</td>\n",
       "      <td>female</td>\n",
       "      <td>53</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>11769</td>\n",
       "      <td>51.4792</td>\n",
       "      <td>C101</td>\n",
       "      <td>S</td>\n",
       "      <td>D</td>\n",
       "      <td>None</td>\n",
       "      <td>Bayside, Queens, NY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Artagaveytia, Mr. Ramon</td>\n",
       "      <td>male</td>\n",
       "      <td>71</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17609</td>\n",
       "      <td>49.5042</td>\n",
       "      <td>None</td>\n",
       "      <td>C</td>\n",
       "      <td>None</td>\n",
       "      <td>22</td>\n",
       "      <td>Montevideo, Uruguay</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Astor, Col. John Jacob</td>\n",
       "      <td>male</td>\n",
       "      <td>47</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17757</td>\n",
       "      <td>227.5250</td>\n",
       "      <td>C62 C64</td>\n",
       "      <td>C</td>\n",
       "      <td>None</td>\n",
       "      <td>124</td>\n",
       "      <td>New York, NY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Astor, Mrs. John Jacob (Madeleine Talmadge Force)</td>\n",
       "      <td>female</td>\n",
       "      <td>18</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17757</td>\n",
       "      <td>227.5250</td>\n",
       "      <td>C62 C64</td>\n",
       "      <td>C</td>\n",
       "      <td>4</td>\n",
       "      <td>None</td>\n",
       "      <td>New York, NY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Aubart, Mme. Leontine Pauline</td>\n",
       "      <td>female</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17477</td>\n",
       "      <td>69.3000</td>\n",
       "      <td>B35</td>\n",
       "      <td>C</td>\n",
       "      <td>9</td>\n",
       "      <td>None</td>\n",
       "      <td>Paris, France</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Barber, Miss. Ellen \"Nellie\"</td>\n",
       "      <td>female</td>\n",
       "      <td>26</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>19877</td>\n",
       "      <td>78.8500</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>6</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Barkworth, Mr. Algernon Henry Wilson</td>\n",
       "      <td>male</td>\n",
       "      <td>80</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>27042</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>A23</td>\n",
       "      <td>S</td>\n",
       "      <td>B</td>\n",
       "      <td>None</td>\n",
       "      <td>Hessle, Yorks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Baumann, Mr. John D</td>\n",
       "      <td>male</td>\n",
       "      <td>None</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17318</td>\n",
       "      <td>25.9250</td>\n",
       "      <td>None</td>\n",
       "      <td>S</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>New York, NY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Baxter, Mr. Quigg Edmond</td>\n",
       "      <td>male</td>\n",
       "      <td>24</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>PC 17558</td>\n",
       "      <td>247.5208</td>\n",
       "      <td>B58 B60</td>\n",
       "      <td>C</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Montreal, PQ</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Baxter, Mrs. James (Helene DeLaudeniere Chaput)</td>\n",
       "      <td>female</td>\n",
       "      <td>50</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>PC 17558</td>\n",
       "      <td>247.5208</td>\n",
       "      <td>B58 B60</td>\n",
       "      <td>C</td>\n",
       "      <td>6</td>\n",
       "      <td>None</td>\n",
       "      <td>Montreal, PQ</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bazzani, Miss. Albina</td>\n",
       "      <td>female</td>\n",
       "      <td>32</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>11813</td>\n",
       "      <td>76.2917</td>\n",
       "      <td>D15</td>\n",
       "      <td>C</td>\n",
       "      <td>8</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>Beattie, Mr. Thomson</td>\n",
       "      <td>male</td>\n",
       "      <td>36</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>13050</td>\n",
       "      <td>75.2417</td>\n",
       "      <td>C6</td>\n",
       "      <td>C</td>\n",
       "      <td>A</td>\n",
       "      <td>None</td>\n",
       "      <td>Winnipeg, MN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    pclass  survived                                               name  \\\n",
       "0        1         1                      Allen, Miss. Elisabeth Walton   \n",
       "1        1         1                     Allison, Master. Hudson Trevor   \n",
       "2        1         0                       Allison, Miss. Helen Loraine   \n",
       "3        1         0               Allison, Mr. Hudson Joshua Creighton   \n",
       "4        1         0    Allison, Mrs. Hudson J C (Bessie Waldo Daniels)   \n",
       "5        1         1                                Anderson, Mr. Harry   \n",
       "6        1         1                  Andrews, Miss. Kornelia Theodosia   \n",
       "7        1         0                             Andrews, Mr. Thomas Jr   \n",
       "8        1         1      Appleton, Mrs. Edward Dale (Charlotte Lamson)   \n",
       "9        1         0                            Artagaveytia, Mr. Ramon   \n",
       "10       1         0                             Astor, Col. John Jacob   \n",
       "11       1         1  Astor, Mrs. John Jacob (Madeleine Talmadge Force)   \n",
       "12       1         1                      Aubart, Mme. Leontine Pauline   \n",
       "13       1         1                       Barber, Miss. Ellen \"Nellie\"   \n",
       "14       1         1               Barkworth, Mr. Algernon Henry Wilson   \n",
       "15       1         0                                Baumann, Mr. John D   \n",
       "16       1         0                           Baxter, Mr. Quigg Edmond   \n",
       "17       1         1    Baxter, Mrs. James (Helene DeLaudeniere Chaput)   \n",
       "18       1         1                              Bazzani, Miss. Albina   \n",
       "19       1         0                               Beattie, Mr. Thomson   \n",
       "\n",
       "       sex     age  sibsp  parch    ticket      fare    cabin embarked  boat  \\\n",
       "0   female      29      0      0     24160  211.3375       B5        S     2   \n",
       "1     male  0.9167      1      2    113781  151.5500  C22 C26        S    11   \n",
       "2   female       2      1      2    113781  151.5500  C22 C26        S  None   \n",
       "3     male      30      1      2    113781  151.5500  C22 C26        S  None   \n",
       "4   female      25      1      2    113781  151.5500  C22 C26        S  None   \n",
       "5     male      48      0      0     19952   26.5500      E12        S     3   \n",
       "6   female      63      1      0     13502   77.9583       D7        S    10   \n",
       "7     male      39      0      0    112050    0.0000      A36        S  None   \n",
       "8   female      53      2      0     11769   51.4792     C101        S     D   \n",
       "9     male      71      0      0  PC 17609   49.5042     None        C  None   \n",
       "10    male      47      1      0  PC 17757  227.5250  C62 C64        C  None   \n",
       "11  female      18      1      0  PC 17757  227.5250  C62 C64        C     4   \n",
       "12  female      24      0      0  PC 17477   69.3000      B35        C     9   \n",
       "13  female      26      0      0     19877   78.8500     None        S     6   \n",
       "14    male      80      0      0     27042   30.0000      A23        S     B   \n",
       "15    male    None      0      0  PC 17318   25.9250     None        S  None   \n",
       "16    male      24      0      1  PC 17558  247.5208  B58 B60        C  None   \n",
       "17  female      50      0      1  PC 17558  247.5208  B58 B60        C     6   \n",
       "18  female      32      0      0     11813   76.2917      D15        C     8   \n",
       "19    male      36      0      0     13050   75.2417       C6        C     A   \n",
       "\n",
       "    body                        home.dest  \n",
       "0   None                     St Louis, MO  \n",
       "1   None  Montreal, PQ / Chesterville, ON  \n",
       "2   None  Montreal, PQ / Chesterville, ON  \n",
       "3    135  Montreal, PQ / Chesterville, ON  \n",
       "4   None  Montreal, PQ / Chesterville, ON  \n",
       "5   None                     New York, NY  \n",
       "6   None                       Hudson, NY  \n",
       "7   None                      Belfast, NI  \n",
       "8   None              Bayside, Queens, NY  \n",
       "9     22              Montevideo, Uruguay  \n",
       "10   124                     New York, NY  \n",
       "11  None                     New York, NY  \n",
       "12  None                    Paris, France  \n",
       "13  None                             None  \n",
       "14  None                    Hessle, Yorks  \n",
       "15  None                     New York, NY  \n",
       "16  None                     Montreal, PQ  \n",
       "17  None                     Montreal, PQ  \n",
       "18  None                             None  \n",
       "19  None                     Winnipeg, MN  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic = pd.read_excel('data/titanic.xls')\n",
    "titanic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that all DataFrames that we loaded so far in this Jupyter notebook, i.e., `unemployment`, `countries`, and `titanic`, are stored in the memory, and we can access them when needed once they are loaded. For example, we can check the shape of the `titanic` DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(20, 14)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, we can manually create DataFrames, instead of importing from a file. The following DataFrame called `simple_table` contains information from the `titanic` data. You can notice that the DataFrame is created similarly to creating a dictionary, where the column headers represent keys, and the data in each column are lists of values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>22</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>35</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Bonnell, Miss. Elizabeth</td>\n",
       "      <td>58</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       Name  Age     Sex\n",
       "0   Braund, Mr. Owen Harris   22    male\n",
       "1  Allen, Mr. William Henry   35    male\n",
       "2  Bonnell, Miss. Elizabeth   58  female"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "simple_table = pd.DataFrame({\n",
    "        \"Name\": [\"Braund, Mr. Owen Harris\",\n",
    "                 \"Allen, Mr. William Henry\",\n",
    "                 \"Bonnell, Miss. Elizabeth\"],\n",
    "        \"Age\": [22, 35, 58],\n",
    "        \"Sex\": [\"male\", \"male\", \"female\"]})\n",
    "simple_table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3, 3)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "simple_table.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use again the `tolist()` method to convert the values in a `pandas` DataFrame into a list, if we needed the data into a list format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['Braund, Mr. Owen Harris', 22, 'male'],\n",
       " ['Allen, Mr. William Henry', 35, 'male'],\n",
       " ['Bonnell, Miss. Elizabeth', 58, 'female']]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "simple_table.values.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.3 Rename, Index, and Slice <a id=\"7.3-rename,-index,-and-slice\"/>\n",
    "\n",
    "Let's look again at the *unemployment* DataFrame. You may have noticed that the `month` column actually includes the year and the month added as decimals (e.g., 1993.01 should be year 1993 and month 01)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality    month  unemployment  unemployment_rate\n",
       "0      at         nsa  1993.01        171000                4.5\n",
       "1      at         nsa  1993.02        175000                4.6\n",
       "2      at         nsa  1993.03        166000                4.4"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's rename the column into *year_month*. The `.rename()` method allows modifying column and/or row names. As you can see in the cell below, we passed a dictionary to the `columns` parameter, with the original name `month` as the key and the new name `year_month` as the value. With this approach, we can rename several columns at the same time by providing the old and new names as keys and values in the dictionary. Importantly, we also set the `inplace` parameter to `True`, which indicates that we want to modify the *original* DataFrame, and not to create a new DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  unemployment  unemployment_rate\n",
       "0      at         nsa     1993.01        171000                4.5\n",
       "1      at         nsa     1993.02        175000                4.6\n",
       "2      at         nsa     1993.03        166000                4.4"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.rename(columns={'month' : 'year_month'}, inplace=True)\n",
    "unemployment.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To observe the effect of `inplace=True`, let's run in the next cell another `.rename()` method to change the column `country` to `year`, but this time by omitting `inplace=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  year seasonality  year_month  unemployment  unemployment_rate\n",
       "0   at         nsa     1993.01        171000                4.5\n",
       "1   at         nsa     1993.02        175000                4.6\n",
       "2   at         nsa     1993.03        166000                4.4"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.rename(columns={'country' : 'year'}).head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above code didn't change the original `unemployment` DataFrame, as we can check that in the following cell. Instead, it created a copy of the `unemployment` DataFrame in which it changed the name of the column `country`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  unemployment  unemployment_rate\n",
       "0      at         nsa     1993.01        171000                4.5\n",
       "1      at         nsa     1993.02        175000                4.6\n",
       "2      at         nsa     1993.03        166000                4.4"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An alternative way to rename the column is shown in the following cell. This code does not use `inplace=True` to modify the original DataFrame, but instead it creates a new DataFrame object with a renamed column and assigns it to the name `unemployment`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>temporary_name</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  temporary_name  unemployment_rate\n",
       "0      at         nsa     1993.01          171000                4.5\n",
       "1      at         nsa     1993.02          175000                4.6\n",
       "2      at         nsa     1993.03          166000                4.4"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment = unemployment.rename(columns={'unemployment' : 'temporary_name'})\n",
    "unemployment.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's change it back to the original column name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  unemployment  unemployment_rate\n",
       "0      at         nsa     1993.01        171000                4.5\n",
       "1      at         nsa     1993.02        175000                4.6\n",
       "2      at         nsa     1993.03        166000                4.4"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment = unemployment.rename(columns={'temporary_name': 'unemployment'})\n",
    "unemployment.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting Columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To select a single column of the DataFrame, we can either use the name of the column enclosed in square brackets `[]` or the dot notation `.` (i.e., via *attribute access*). It is preferable to use the square brackets notation, since a column name might inadvertently have the same name as a built-in `pandas` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1993.01\n",
       "1    1993.02\n",
       "2    1993.03\n",
       "3    1993.04\n",
       "4    1993.05\n",
       "Name: year_month, dtype: float64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# access with square brackets\n",
    "unemployment['year_month'].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1993.01\n",
       "1    1993.02\n",
       "2    1993.03\n",
       "3    1993.04\n",
       "4    1993.05\n",
       "Name: year_month, dtype: float64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# access with dot notation\n",
    "unemployment.year_month.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When selecting a single column, we obtain a `pandas` **Series** object, which is a single vector of data with an associated array of index row labels shown in the left-most column.\n",
    "\n",
    "> A **Series** object in `pandas` represents a 1-dimensional vector of data (i.e., a column of data)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we check the type of the `unemployment` object and `unemployment['year_month']` object, we can see that the first one is DataFrame and the second one is Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(unemployment)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(unemployment['year_month'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pandas` provide many methods that can be applied to `Series` objects. A few examples are shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "minimum is  1983.01\n",
      "maximum is  2010.12\n",
      "mean value is  1999.4012896711013\n"
     ]
    }
   ],
   "source": [
    "print('minimum is ', unemployment['year_month'].min())\n",
    "print('maximum is ', unemployment['year_month'].max())\n",
    "print('mean value is ', unemployment['year_month'].mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To select multiple columns in `pandas`, use a list of column names within the selection brackets `[]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>year_month</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country  year_month\n",
       "0      at     1993.01\n",
       "1      at     1993.02\n",
       "2      at     1993.03\n",
       "3      at     1993.04\n",
       "4      at     1993.05"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment[['country','year_month']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting Rows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to select rows is by using the `[]` operator, similar to indexing and slicing in Python lists and other sequence data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  unemployment  unemployment_rate\n",
       "0      at         nsa     1993.01        171000                4.5\n",
       "1      at         nsa     1993.02        175000                4.6\n",
       "2      at         nsa     1993.03        166000                4.4\n",
       "3      at         nsa     1993.04        157000                4.1"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment[0:4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another graphical representation of a DataFrame is shown in the figure below.\n",
    "\n",
    "<img src=\"images/dataframe.png\" width=\"520\">\n",
    "<em>Figure source: Reference [2].</em>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first column with the indices in `pandas` DataFrames does not need to be a sequence of integers, but it can also contain strings or other numeric data (e.g., dates, years).\n",
    "\n",
    "For instance, let's create a `DataFrame` called *bacteria* to see how indexing with string indices works. We again pass in a dictionary, with the keys corresponding to column names and the values to the data, and in addition we pass a list of strings called `index`. (Compare to the *simple_table* above, which does not use `index` for creating the DataFrame, and in that case, the indices were automatically set to integer numbers.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>bacteria_counts</th>\n",
       "      <th>other_feature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Firmicutes</th>\n",
       "      <td>632</td>\n",
       "      <td>438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Proteobacteria</th>\n",
       "      <td>1638</td>\n",
       "      <td>833</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Actinobacteria</th>\n",
       "      <td>569</td>\n",
       "      <td>234</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bacteroidetes</th>\n",
       "      <td>115</td>\n",
       "      <td>298</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                bacteria_counts  other_feature\n",
       "Firmicutes                  632            438\n",
       "Proteobacteria             1638            833\n",
       "Actinobacteria              569            234\n",
       "Bacteroidetes               115            298"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria = pd.DataFrame({'bacteria_counts' : [632, 1638, 569, 115],\n",
    "                         'other_feature' : [438, 833, 234, 298]},\n",
    "                         index=['Firmicutes', 'Proteobacteria', 'Actinobacteria', 'Bacteroidetes'])\n",
    "bacteria"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To return the labels for the columns and indices of a DataFrame in `pandas`, we can use the methods shown in the following celss. Note again that the type of the returned objects is `Index` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['bacteria_counts', 'other_feature'], dtype='object')"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Firmicutes', 'Proteobacteria', 'Actinobacteria', 'Bacteroidetes'], dtype='object')"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For selecting rows and/or columns in `pandas`, beside the use of square brackets, two other operators are used: `.loc` and `.iloc`. \n",
    "\n",
    "The operator `.loc` accesses rows by using string indices (i.e., string locations), that is, it uses the labels of the rows for indexing.\n",
    "\n",
    "The operator `.iloc`  accesses rows by using integer indices (i.e., integer locations), that is, it uses the integer positions of the rows for indexing. \n",
    "\n",
    "The operators `.loc` and `.iloc` can accept either a single index (e.g., `'f'` or `5`), a list of indices (e.g., `['a','f']` or `[2,5]`), or a slice of indices (e.g., `'a:f'` or `2:5`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For instance, if we are interested in accessing the row *Actinobacteria*, we can use `.loc` and the index name. This returns the column values for the specified row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bacteria_counts    569\n",
       "other_feature      234\n",
       "Name: Actinobacteria, dtype: int64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria.loc['Actinobacteria']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The operator `.iloc` does not work with string indices, and it returns an error in the following cell. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "Cannot index by location index with a non-integer key",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[1;32mIn[42], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[43mbacteria\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43miloc\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mActinobacteria\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\n",
      "File \u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\indexing.py:967\u001b[0m, in \u001b[0;36m_LocationIndexer.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m    964\u001b[0m axis \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39maxis \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;241m0\u001b[39m\n\u001b[0;32m    966\u001b[0m maybe_callable \u001b[38;5;241m=\u001b[39m com\u001b[38;5;241m.\u001b[39mapply_if_callable(key, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mobj)\n\u001b[1;32m--> 967\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_getitem_axis\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmaybe_callable\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43maxis\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\indexing.py:1520\u001b[0m, in \u001b[0;36m_iLocIndexer._getitem_axis\u001b[1;34m(self, key, axis)\u001b[0m\n\u001b[0;32m   1518\u001b[0m key \u001b[38;5;241m=\u001b[39m item_from_zerodim(key)\n\u001b[0;32m   1519\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m is_integer(key):\n\u001b[1;32m-> 1520\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCannot index by location index with a non-integer key\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m   1522\u001b[0m \u001b[38;5;66;03m# validate the location\u001b[39;00m\n\u001b[0;32m   1523\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_integer(key, axis)\n",
      "\u001b[1;31mTypeError\u001b[0m: Cannot index by location index with a non-integer key"
     ]
    }
   ],
   "source": [
    "bacteria.iloc['Actinobacteria']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access rows with `.iloc`, we need to provide integer indices. Note that we can still access the row with `iloc`, even though the indices are strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "bacteria_counts    569\n",
       "other_feature      234\n",
       "Name: Actinobacteria, dtype: int64"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria.iloc[2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition, we can also use \"positional indexing\" with square brackets `[]`, as in slicing operations with list and other sequence objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>bacteria_counts</th>\n",
       "      <th>other_feature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Actinobacteria</th>\n",
       "      <td>569</td>\n",
       "      <td>234</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                bacteria_counts  other_feature\n",
       "Actinobacteria              569            234"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria[2:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, `pandas` doesn't support indexing for accessing individual rows, and with positional indexing we need to use a slice (as in the above example `bacteria[2:3]`), otherwise, we will get an error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "ename": "KeyError",
     "evalue": "2",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "File \u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\indexes\\base.py:3629\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[1;34m(self, key, method, tolerance)\u001b[0m\n\u001b[0;32m   3628\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m-> 3629\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_engine\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcasted_key\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m   3630\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n",
      "File \u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\_libs\\index.pyx:136\u001b[0m, in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
      "File \u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\_libs\\index.pyx:163\u001b[0m, in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[1;34m()\u001b[0m\n",
      "File \u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi:5198\u001b[0m, in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
      "File \u001b[1;32mpandas\\_libs\\hashtable_class_helper.pxi:5206\u001b[0m, in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[1;34m()\u001b[0m\n",
      "\u001b[1;31mKeyError\u001b[0m: 2",
      "\nThe above exception was the direct cause of the following exception:\n",
      "\u001b[1;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "Cell \u001b[1;32mIn[45], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[43mbacteria\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m]\u001b[49m\n",
      "File \u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\frame.py:3505\u001b[0m, in \u001b[0;36mDataFrame.__getitem__\u001b[1;34m(self, key)\u001b[0m\n\u001b[0;32m   3503\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumns\u001b[38;5;241m.\u001b[39mnlevels \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[0;32m   3504\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_getitem_multilevel(key)\n\u001b[1;32m-> 3505\u001b[0m indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcolumns\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_loc\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m   3506\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m is_integer(indexer):\n\u001b[0;32m   3507\u001b[0m     indexer \u001b[38;5;241m=\u001b[39m [indexer]\n",
      "File \u001b[1;32m~\\anaconda3\\lib\\site-packages\\pandas\\core\\indexes\\base.py:3631\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[1;34m(self, key, method, tolerance)\u001b[0m\n\u001b[0;32m   3629\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_engine\u001b[38;5;241m.\u001b[39mget_loc(casted_key)\n\u001b[0;32m   3630\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n\u001b[1;32m-> 3631\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01merr\u001b[39;00m\n\u001b[0;32m   3632\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m:\n\u001b[0;32m   3633\u001b[0m     \u001b[38;5;66;03m# If we have a listlike key, _check_indexing_error will raise\u001b[39;00m\n\u001b[0;32m   3634\u001b[0m     \u001b[38;5;66;03m#  InvalidIndexError. Otherwise we fall through and re-raise\u001b[39;00m\n\u001b[0;32m   3635\u001b[0m     \u001b[38;5;66;03m#  the TypeError.\u001b[39;00m\n\u001b[0;32m   3636\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_indexing_error(key)\n",
      "\u001b[1;31mKeyError\u001b[0m: 2"
     ]
    }
   ],
   "source": [
    "bacteria[2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is another important difference between the above two selections, as `.loc` and `.iloc` return a `Series` object because we selected a single label, while `[2:3]` returns a `DataFrame` because we selected a range of positions. Let's check this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(bacteria.loc['Actinobacteria'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(bacteria.iloc[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(bacteria[2:3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's return to the *unemployment* data to show how `.iloc` is used, since *unemployment* has integer indices. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  unemployment  unemployment_rate\n",
       "0      at         nsa     1993.01        171000                4.5\n",
       "1      at         nsa     1993.02        175000                4.6\n",
       "2      at         nsa     1993.03        166000                4.4\n",
       "3      at         nsa     1993.04        157000                4.1"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.iloc[0:4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note also that we can use `loc` with integer indices as well, however the output is different than `.iloc`. The difference is discussed in the section below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  unemployment  unemployment_rate\n",
       "0      at         nsa     1993.01        171000                4.5\n",
       "1      at         nsa     1993.02        175000                4.6\n",
       "2      at         nsa     1993.03        166000                4.4\n",
       "3      at         nsa     1993.04        157000                4.1\n",
       "4      at         nsa     1993.05        147000                3.9"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.loc[0:4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting a Specific Value"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both `.loc` and `.iloc` can be used to select a particular value if they are given two arguments. The first argument is the row name (when using `.loc`) or the row index number (when using `.iloc`), while the second argument is the column name or index number.\n",
    "\n",
    "Using `.loc`, we can select \"Bacteroidetes\" and \"bacteria_counts\" to get the count of Bacteroidetes, as in the next cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>bacteria_counts</th>\n",
       "      <th>other_feature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Firmicutes</th>\n",
       "      <td>632</td>\n",
       "      <td>438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Proteobacteria</th>\n",
       "      <td>1638</td>\n",
       "      <td>833</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Actinobacteria</th>\n",
       "      <td>569</td>\n",
       "      <td>234</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bacteroidetes</th>\n",
       "      <td>115</td>\n",
       "      <td>298</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                bacteria_counts  other_feature\n",
       "Firmicutes                  632            438\n",
       "Proteobacteria             1638            833\n",
       "Actinobacteria              569            234\n",
       "Bacteroidetes               115            298"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "115"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria.loc['Bacteroidetes']['bacteria_counts']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "115"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# This is the same as above\n",
    "bacteria.iloc[3][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "115"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# This the same as above\n",
    "bacteria.iloc[3]['bacteria_counts']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, for the *unemployment* data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1993.01"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The year_month in the first row\n",
    "unemployment.iloc[0,2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1993.01"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# This the same as above\n",
    "unemployment.iloc[0][2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting Multiple Rows and Columns\n",
    "\n",
    "Both `.loc` and `.iloc` can be used to select subsets of rows and columns at the same time if they are given lists as arguments, or slices for `.iloc`.\n",
    "\n",
    "The following example uses a list with `.iloc` to select specific rows, similar to fancy indexing in NumPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.07</td>\n",
       "      <td>128000</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1994.11</td>\n",
       "      <td>148000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   country seasonality  year_month  unemployment  unemployment_rate\n",
       "1       at         nsa     1993.02        175000                4.6\n",
       "5       at         nsa     1993.06        134000                3.5\n",
       "6       at         nsa     1993.07        128000                3.4\n",
       "22      at         nsa     1994.11        148000                3.9"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.iloc[[1, 5, 6, 22]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also select a range of rows and specify the step value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.02</td>\n",
       "      <td>174000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.07</td>\n",
       "      <td>123000</td>\n",
       "      <td>3.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.12</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1996.05</td>\n",
       "      <td>159000</td>\n",
       "      <td>4.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1996.10</td>\n",
       "      <td>146000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   country seasonality  year_month  unemployment  unemployment_rate\n",
       "25      at         nsa     1995.02        174000                4.5\n",
       "30      at         nsa     1995.07        123000                3.3\n",
       "35      at         nsa     1995.12        175000                4.7\n",
       "40      at         nsa     1996.05        159000                4.3\n",
       "45      at         nsa     1996.10        146000                3.9"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.iloc[25:50:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can apply slicing over rows and columns. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality\n",
       "2      at         nsa\n",
       "3      at         nsa\n",
       "4      at         nsa\n",
       "5      at         nsa"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.iloc[2:6,0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following example selects multiple rows with `.loc` based on string indices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>bacteria_counts</th>\n",
       "      <th>other_feature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Firmicutes</th>\n",
       "      <td>632</td>\n",
       "      <td>438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Actinobacteria</th>\n",
       "      <td>569</td>\n",
       "      <td>234</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                bacteria_counts  other_feature\n",
       "Firmicutes                  632            438\n",
       "Actinobacteria              569            234"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria.loc[['Firmicutes', 'Actinobacteria']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also select a subset of rows and columns with `.loc`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>bacteria_counts</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bacteroidetes</th>\n",
       "      <td>115</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Actinobacteria</th>\n",
       "      <td>569</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                bacteria_counts\n",
       "Bacteroidetes               115\n",
       "Actinobacteria              569"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bacteria.loc[['Bacteroidetes', 'Actinobacteria'], ['bacteria_counts']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using `.iloc` on the *unemployment* DataFrame, select the rows starting at row 2 and ending at row 5, and the 0th, 2nd, and 3rd columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>147000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>134000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country  year_month  unemployment\n",
       "2      at     1993.03        166000\n",
       "3      at     1993.04        157000\n",
       "4      at     1993.05        147000\n",
       "5      at     1993.06        134000"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.iloc[2:6,[0,2,3]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same selection can be achieved by using the `.loc` operator and listing the column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>147000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>134000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>at</td>\n",
       "      <td>1993.07</td>\n",
       "      <td>128000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country  year_month  unemployment\n",
       "2      at     1993.03        166000\n",
       "3      at     1993.04        157000\n",
       "4      at     1993.05        147000\n",
       "5      at     1993.06        134000\n",
       "6      at     1993.07        128000"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The same as above\n",
    "unemployment.loc[2:6,['country', 'year_month', 'unemployment']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting Multiple Rows and Columns Using Conditional Expressions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also display values from a DataFrame that satisfy certain criteria using conditional expressions, such as `<`, `>`, `==`, `!=`, etc. \n",
    "\n",
    "One example is shown below where only the rows that have an unemployment rate greater than 15 are shown. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1717</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.02</td>\n",
       "      <td>523000</td>\n",
       "      <td>15.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1718</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.03</td>\n",
       "      <td>547000</td>\n",
       "      <td>16.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1719</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.04</td>\n",
       "      <td>560000</td>\n",
       "      <td>16.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1720</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.05</td>\n",
       "      <td>561000</td>\n",
       "      <td>16.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1721</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.06</td>\n",
       "      <td>554000</td>\n",
       "      <td>16.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1722</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.07</td>\n",
       "      <td>558000</td>\n",
       "      <td>16.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1723</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.08</td>\n",
       "      <td>569000</td>\n",
       "      <td>16.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1724</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.09</td>\n",
       "      <td>574000</td>\n",
       "      <td>16.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1725</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.10</td>\n",
       "      <td>583000</td>\n",
       "      <td>17.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1726</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2000.11</td>\n",
       "      <td>597000</td>\n",
       "      <td>17.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     country seasonality  year_month  unemployment  unemployment_rate\n",
       "1717      bg         nsa     2000.02        523000               15.4\n",
       "1718      bg         nsa     2000.03        547000               16.0\n",
       "1719      bg         nsa     2000.04        560000               16.3\n",
       "1720      bg         nsa     2000.05        561000               16.3\n",
       "1721      bg         nsa     2000.06        554000               16.2\n",
       "1722      bg         nsa     2000.07        558000               16.3\n",
       "1723      bg         nsa     2000.08        569000               16.7\n",
       "1724      bg         nsa     2000.09        574000               16.8\n",
       "1725      bg         nsa     2000.10        583000               17.1\n",
       "1726      bg         nsa     2000.11        597000               17.5"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment[unemployment['unemployment_rate'] > 15].head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following cells present one more example of selecting rows based on conditional statements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>12</td>\n",
       "      <td>3.552</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>19</td>\n",
       "      <td>10</td>\n",
       "      <td>1.624</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18</td>\n",
       "      <td>16</td>\n",
       "      <td>2.925</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>19</td>\n",
       "      <td>14</td>\n",
       "      <td>4.662</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>47</td>\n",
       "      <td>25</td>\n",
       "      <td>3.634</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>18</td>\n",
       "      <td>21</td>\n",
       "      <td>3.635</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>19</td>\n",
       "      <td>29</td>\n",
       "      <td>5.234</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt    group\n",
       "0   17     12  3.552     test\n",
       "1   19     10  1.624     test\n",
       "2   21     11  6.431     test\n",
       "3   37     15  7.132     test\n",
       "4   18     16  2.925     test\n",
       "5   19     14  4.662  control\n",
       "6   47     25  3.634  control\n",
       "7   18     21  3.635  control\n",
       "8   19     29  5.234  control"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_df = pd.DataFrame(\n",
    "    {'age': [17, 19, 21, 37, 18, 19, 47, 18, 19],\n",
    "     'score': [12, 10, 11, 15, 16, 14, 25, 21, 29],\n",
    "     'rt': [3.552, 1.624, 6.431, 7.132, 2.925, 4.662, 3.634, 3.635, 5.234],\n",
    "     'group': [\"test\", \"test\", \"test\", \"test\", \"test\", \"control\", \"control\", \"control\", \"control\"]\n",
    "    })\n",
    "\n",
    "score_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>12</td>\n",
       "      <td>3.552</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>19</td>\n",
       "      <td>10</td>\n",
       "      <td>1.624</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18</td>\n",
       "      <td>16</td>\n",
       "      <td>2.925</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt group\n",
       "0   17     12  3.552  test\n",
       "1   19     10  1.624  test\n",
       "2   21     11  6.431  test\n",
       "3   37     15  7.132  test\n",
       "4   18     16  2.925  test"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select only rows from the 'test' group\n",
    "df_test = score_df[score_df['group'] == 'test']\n",
    "df_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>47</td>\n",
       "      <td>25</td>\n",
       "      <td>3.634</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt    group\n",
       "2   21     11  6.431     test\n",
       "3   37     15  7.132     test\n",
       "6   47     25  3.634  control"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select only rows for age > 19\n",
    "df_adult = score_df[score_df['age']> 19] \n",
    "df_adult"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>47</td>\n",
       "      <td>25</td>\n",
       "      <td>3.634</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt    group\n",
       "2   21     11  6.431     test\n",
       "3   37     15  7.132     test\n",
       "6   47     25  3.634  control"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select only rows for age > 19 and rt \n",
    "adult_and_rt = score_df[(score_df['age'] > 19) & (score_df['rt'] > 3)]\n",
    "adult_and_rt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt group\n",
       "2   21     11  6.431  test\n",
       "3   37     15  7.132  test"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select only rows for age and rt in test group\n",
    "adult_and_rt_test = score_df[(score_df['age'] > 19) & (score_df['rt'] > 3) & (score_df['group'] == 'test')]\n",
    "adult_and_rt_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Differences between `loc` and `iloc`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To show the differences between `loc` and `iloc` let's consider the following example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "      <th>z</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10</td>\n",
       "      <td>20</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>20</td>\n",
       "      <td>30</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>30</td>\n",
       "      <td>40</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>40</td>\n",
       "      <td>50</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     x   y   z\n",
       "10  10  20  30\n",
       "11  20  30  40\n",
       "12  30  40  50\n",
       "0   40  50  60\n",
       "1   50  60  70"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = pd.DataFrame({'x':[10, 20, 30, 40 ,50],  \n",
    "                    'y':[20, 30, 40, 50, 60],\n",
    "                    'z':[30, 40, 50, 60, 70]},\n",
    "                      index=[10, 11,12, 0, 1])\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note in the following cells that `df1.iloc[0] ` selects the row with index location `0`, whereas `loc` selects the row with index label `0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "x    10\n",
       "y    20\n",
       "z    30\n",
       "Name: 10, dtype: int64"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    " # value at index location 0\n",
    "df1.iloc[0]   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "x    40\n",
       "y    50\n",
       "z    60\n",
       "Name: 0, dtype: int64"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    " # value at index label 0\n",
    "df1.loc[0]   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also,  there is a difference in the selected rows when using slicing operations with `iloc` and `loc`. One must be careful when using these operators, and always check the output to ensure it is as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "      <th>z</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10</td>\n",
       "      <td>20</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     x   y   z\n",
       "10  10  20  30"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# rows at index location between 0 and 1 (exclusive)\n",
    "df1.iloc[0:1]  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "      <th>z</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>40</td>\n",
       "      <td>50</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>50</td>\n",
       "      <td>60</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    x   y   z\n",
       "0  40  50  60\n",
       "1  50  60  70"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# rows at index labels between 0 and 1 (inclusive)\n",
    "df1.loc[0:1] "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.4 Creating New Columns, Reordering <a id=\"7.4-creating-new-columns,-reordering\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create a new column in a DataFrame, we simply assign values to the new column, as in: \n",
    "\n",
    "```\n",
    "df['New_Column'] = ...\n",
    "```\n",
    "\n",
    "Often, the new columns are created from existing columns based on certain conditions, or by applying functions to existing columns.\n",
    "\n",
    "```\n",
    "Condition-Based: df['New_Column'] = df['Existing_Column'] > value\n",
    "Function-Based: df['New_Column'] = df['Existing_Column'].apply(function)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the `year_month` column is not shown correctly in the `unemployment` DataFrame, let's try to split it into two separate columns for years and months. \n",
    "\n",
    "In the previous section, we saw that the data type in this column is `float64`. We will first extract the year using the `.astype()` method. This allows for type casting, i.e., using `.astype(int)` we will convert the floating point values into integer numbers (by truncating the decimals).\n",
    "\n",
    "The new column *year* will be added on the right of the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "      <td>1993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "      <td>1993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "      <td>1993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "      <td>1993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "      <td>1993</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  unemployment  unemployment_rate  year\n",
       "0      at         nsa     1993.01        171000                4.5  1993\n",
       "1      at         nsa     1993.02        175000                4.6  1993\n",
       "2      at         nsa     1993.03        166000                4.4  1993\n",
       "3      at         nsa     1993.04        157000                4.1  1993\n",
       "4      at         nsa     1993.05        147000                3.9  1993"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['year'] = unemployment['year_month'].astype(int)\n",
    "unemployment.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's create a new column *month*. We will subtract the *year* value from *year_month* to get the decimal portion of the value, and multiply the result by 100 and convert to `int`. Because of the truncating that occurs when casting to `int`, we first need to round the values to the nearest whole number using `round()`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "      <td>1993</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.07</td>\n",
       "      <td>128000</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1993</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.08</td>\n",
       "      <td>130000</td>\n",
       "      <td>3.4</td>\n",
       "      <td>1993</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.09</td>\n",
       "      <td>132000</td>\n",
       "      <td>3.5</td>\n",
       "      <td>1993</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.10</td>\n",
       "      <td>141000</td>\n",
       "      <td>3.7</td>\n",
       "      <td>1993</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.11</td>\n",
       "      <td>156000</td>\n",
       "      <td>4.1</td>\n",
       "      <td>1993</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.12</td>\n",
       "      <td>169000</td>\n",
       "      <td>4.4</td>\n",
       "      <td>1993</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   country seasonality  year_month  unemployment  unemployment_rate  year  \\\n",
       "0       at         nsa     1993.01        171000                4.5  1993   \n",
       "1       at         nsa     1993.02        175000                4.6  1993   \n",
       "2       at         nsa     1993.03        166000                4.4  1993   \n",
       "3       at         nsa     1993.04        157000                4.1  1993   \n",
       "4       at         nsa     1993.05        147000                3.9  1993   \n",
       "5       at         nsa     1993.06        134000                3.5  1993   \n",
       "6       at         nsa     1993.07        128000                3.4  1993   \n",
       "7       at         nsa     1993.08        130000                3.4  1993   \n",
       "8       at         nsa     1993.09        132000                3.5  1993   \n",
       "9       at         nsa     1993.10        141000                3.7  1993   \n",
       "10      at         nsa     1993.11        156000                4.1  1993   \n",
       "11      at         nsa     1993.12        169000                4.4  1993   \n",
       "\n",
       "    month  \n",
       "0       1  \n",
       "1       2  \n",
       "2       3  \n",
       "3       4  \n",
       "4       5  \n",
       "5       6  \n",
       "6       7  \n",
       "7       8  \n",
       "8       9  \n",
       "9      10  \n",
       "10     11  \n",
       "11     12  "
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['month'] = ((unemployment['year_month'] - unemployment['year']) * 100).round().astype(int)\n",
    "unemployment.head(12)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's try to reorder the newly created *year* and *month* columns in the DataFrame. For this, we will use the square brackets notation again, passing in a list of column names in the order we would like to see them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.04</td>\n",
       "      <td>1993</td>\n",
       "      <td>4</td>\n",
       "      <td>157000</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.07</td>\n",
       "      <td>1993</td>\n",
       "      <td>7</td>\n",
       "      <td>128000</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.08</td>\n",
       "      <td>1993</td>\n",
       "      <td>8</td>\n",
       "      <td>130000</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.09</td>\n",
       "      <td>1993</td>\n",
       "      <td>9</td>\n",
       "      <td>132000</td>\n",
       "      <td>3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.10</td>\n",
       "      <td>1993</td>\n",
       "      <td>10</td>\n",
       "      <td>141000</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  year  month  unemployment  \\\n",
       "0      at         nsa     1993.01  1993      1        171000   \n",
       "1      at         nsa     1993.02  1993      2        175000   \n",
       "2      at         nsa     1993.03  1993      3        166000   \n",
       "3      at         nsa     1993.04  1993      4        157000   \n",
       "4      at         nsa     1993.05  1993      5        147000   \n",
       "5      at         nsa     1993.06  1993      6        134000   \n",
       "6      at         nsa     1993.07  1993      7        128000   \n",
       "7      at         nsa     1993.08  1993      8        130000   \n",
       "8      at         nsa     1993.09  1993      9        132000   \n",
       "9      at         nsa     1993.10  1993     10        141000   \n",
       "\n",
       "   unemployment_rate  \n",
       "0                4.5  \n",
       "1                4.6  \n",
       "2                4.4  \n",
       "3                4.1  \n",
       "4                3.9  \n",
       "5                3.5  \n",
       "6                3.4  \n",
       "7                3.4  \n",
       "8                3.5  \n",
       "9                3.7  "
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment = unemployment[['country', 'seasonality',\n",
    "                             'year_month', 'year', 'month',\n",
    "                             'unemployment', 'unemployment_rate']]\n",
    "unemployment.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is one more example of creating new columns by applying functions to existing columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score\n",
       "0    Alice     85\n",
       "1      Bob     80\n",
       "2  Charlie     78\n",
       "3    David     92\n",
       "4      Eve     68"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "student_df = pd.DataFrame({\n",
    "    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],\n",
    "    'Score': [85, 80, 78, 92, 68]})\n",
    "student_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score\n",
       "0    Alice     85             95\n",
       "1      Bob     80             90\n",
       "2  Charlie     78             88\n",
       "3    David     92            102\n",
       "4      Eve     68             78"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# add 10 marks\n",
    "student_df['Updated Score'] = student_df['Score'] + 10\n",
    "student_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score  Final Score\n",
       "0    Alice     85             95           95\n",
       "1      Bob     80             90           90\n",
       "2  Charlie     78             88           88\n",
       "3    David     92            102          100\n",
       "4      Eve     68             78           78"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# esnure scores are not greater than 100\n",
    "student_df['Final Score'] = student_df['Updated Score'].clip(upper=100)\n",
    "student_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "      <th>Mean Score</th>\n",
       "      <th>Standard Deviation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95</td>\n",
       "      <td>95</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88</td>\n",
       "      <td>88</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102</td>\n",
       "      <td>100</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "      <td>78</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score  Final Score  Mean Score  Standard Deviation\n",
       "0    Alice     85             95           95        80.6                8.26\n",
       "1      Bob     80             90           90        80.6                8.26\n",
       "2  Charlie     78             88           88        80.6                8.26\n",
       "3    David     92            102          100        80.6                8.26\n",
       "4      Eve     68             78           78        80.6                8.26"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# calculate mean and standard deviation (rounded to 2 decimal places)\n",
    "student_df['Mean Score'] = student_df['Score'].mean()\n",
    "student_df['Standard Deviation'] = round(student_df['Final Score'].std(),2)\n",
    "student_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.5 Removing Columns and Rows<a id=\"7.5-removing-columns-and-rows\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To delete a column from a DataFrame, we can use the the `drop()` method. For instance, in the `countries` DataFrame, we will drop the column `country_group`. With the `drop()` method it is important to specify the `axis` parameter, where `axis=1` refers to columns (`axis=0` refers to rows)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>google_country_code</th>\n",
       "      <th>country_group</th>\n",
       "      <th>name_en</th>\n",
       "      <th>name_fr</th>\n",
       "      <th>name_de</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>AT</td>\n",
       "      <td>eu</td>\n",
       "      <td>Austria</td>\n",
       "      <td>Autriche</td>\n",
       "      <td>Österreich</td>\n",
       "      <td>47.696554</td>\n",
       "      <td>13.345980</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>be</td>\n",
       "      <td>BE</td>\n",
       "      <td>eu</td>\n",
       "      <td>Belgium</td>\n",
       "      <td>Belgique</td>\n",
       "      <td>Belgien</td>\n",
       "      <td>50.501045</td>\n",
       "      <td>4.476674</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bg</td>\n",
       "      <td>BG</td>\n",
       "      <td>eu</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>Bulgarie</td>\n",
       "      <td>Bulgarien</td>\n",
       "      <td>42.725674</td>\n",
       "      <td>25.482322</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>hr</td>\n",
       "      <td>HR</td>\n",
       "      <td>non-eu</td>\n",
       "      <td>Croatia</td>\n",
       "      <td>Croatie</td>\n",
       "      <td>Kroatien</td>\n",
       "      <td>44.746643</td>\n",
       "      <td>15.340844</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cy</td>\n",
       "      <td>CY</td>\n",
       "      <td>eu</td>\n",
       "      <td>Cyprus</td>\n",
       "      <td>Chypre</td>\n",
       "      <td>Zypern</td>\n",
       "      <td>35.129141</td>\n",
       "      <td>33.428682</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country google_country_code country_group   name_en   name_fr     name_de  \\\n",
       "0      at                  AT            eu   Austria  Autriche  Österreich   \n",
       "1      be                  BE            eu   Belgium  Belgique     Belgien   \n",
       "2      bg                  BG            eu  Bulgaria  Bulgarie   Bulgarien   \n",
       "3      hr                  HR        non-eu   Croatia   Croatie    Kroatien   \n",
       "4      cy                  CY            eu    Cyprus    Chypre      Zypern   \n",
       "\n",
       "    latitude  longitude  \n",
       "0  47.696554  13.345980  \n",
       "1  50.501045   4.476674  \n",
       "2  42.725674  25.482322  \n",
       "3  44.746643  15.340844  \n",
       "4  35.129141  33.428682  "
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "countries.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>google_country_code</th>\n",
       "      <th>name_en</th>\n",
       "      <th>name_fr</th>\n",
       "      <th>name_de</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>AT</td>\n",
       "      <td>Austria</td>\n",
       "      <td>Autriche</td>\n",
       "      <td>Österreich</td>\n",
       "      <td>47.696554</td>\n",
       "      <td>13.345980</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>be</td>\n",
       "      <td>BE</td>\n",
       "      <td>Belgium</td>\n",
       "      <td>Belgique</td>\n",
       "      <td>Belgien</td>\n",
       "      <td>50.501045</td>\n",
       "      <td>4.476674</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bg</td>\n",
       "      <td>BG</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>Bulgarie</td>\n",
       "      <td>Bulgarien</td>\n",
       "      <td>42.725674</td>\n",
       "      <td>25.482322</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>hr</td>\n",
       "      <td>HR</td>\n",
       "      <td>Croatia</td>\n",
       "      <td>Croatie</td>\n",
       "      <td>Kroatien</td>\n",
       "      <td>44.746643</td>\n",
       "      <td>15.340844</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cy</td>\n",
       "      <td>CY</td>\n",
       "      <td>Cyprus</td>\n",
       "      <td>Chypre</td>\n",
       "      <td>Zypern</td>\n",
       "      <td>35.129141</td>\n",
       "      <td>33.428682</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country google_country_code   name_en   name_fr     name_de   latitude  \\\n",
       "0      at                  AT   Austria  Autriche  Österreich  47.696554   \n",
       "1      be                  BE   Belgium  Belgique     Belgien  50.501045   \n",
       "2      bg                  BG  Bulgaria  Bulgarie   Bulgarien  42.725674   \n",
       "3      hr                  HR   Croatia   Croatie    Kroatien  44.746643   \n",
       "4      cy                  CY    Cyprus    Chypre      Zypern  35.129141   \n",
       "\n",
       "   longitude  \n",
       "0  13.345980  \n",
       "1   4.476674  \n",
       "2  25.482322  \n",
       "3  15.340844  \n",
       "4  33.428682  "
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "countries.drop('country_group', axis=1, inplace=True)\n",
    "countries.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By using the `drop()` method we can remove multiple columns, by listing their labels within a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>google_country_code</th>\n",
       "      <th>name_en</th>\n",
       "      <th>name_fr</th>\n",
       "      <th>name_de</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>AT</td>\n",
       "      <td>Austria</td>\n",
       "      <td>Autriche</td>\n",
       "      <td>Österreich</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>be</td>\n",
       "      <td>BE</td>\n",
       "      <td>Belgium</td>\n",
       "      <td>Belgique</td>\n",
       "      <td>Belgien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bg</td>\n",
       "      <td>BG</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>Bulgarie</td>\n",
       "      <td>Bulgarien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>hr</td>\n",
       "      <td>HR</td>\n",
       "      <td>Croatia</td>\n",
       "      <td>Croatie</td>\n",
       "      <td>Kroatien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cy</td>\n",
       "      <td>CY</td>\n",
       "      <td>Cyprus</td>\n",
       "      <td>Chypre</td>\n",
       "      <td>Zypern</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country google_country_code   name_en   name_fr     name_de\n",
       "0      at                  AT   Austria  Autriche  Österreich\n",
       "1      be                  BE   Belgium  Belgique     Belgien\n",
       "2      bg                  BG  Bulgaria  Bulgarie   Bulgarien\n",
       "3      hr                  HR   Croatia   Croatie    Kroatien\n",
       "4      cy                  CY    Cyprus    Chypre      Zypern"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "countries.drop(['latitude', 'longitude'], axis=1, inplace=True)\n",
    "countries.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to remove columns in `pandas` is by using the `del` keyword."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>name_en</th>\n",
       "      <th>name_fr</th>\n",
       "      <th>name_de</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>Austria</td>\n",
       "      <td>Autriche</td>\n",
       "      <td>Österreich</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>be</td>\n",
       "      <td>Belgium</td>\n",
       "      <td>Belgique</td>\n",
       "      <td>Belgien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bg</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>Bulgarie</td>\n",
       "      <td>Bulgarien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>hr</td>\n",
       "      <td>Croatia</td>\n",
       "      <td>Croatie</td>\n",
       "      <td>Kroatien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cy</td>\n",
       "      <td>Cyprus</td>\n",
       "      <td>Chypre</td>\n",
       "      <td>Zypern</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country   name_en   name_fr     name_de\n",
       "0      at   Austria  Autriche  Österreich\n",
       "1      be   Belgium  Belgique     Belgien\n",
       "2      bg  Bulgaria  Bulgarie   Bulgarien\n",
       "3      hr   Croatia   Croatie    Kroatien\n",
       "4      cy    Cyprus    Chypre      Zypern"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "del countries['google_country_code']\n",
    "countries.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To remove rows from a DataFrame, we can use the `drop()` method and set the `axis` parameter to 0.\n",
    "\n",
    "Similarly to columns, we can delete a single row, or multiple rows as in the examples below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  year  month  unemployment  \\\n",
       "0      at         nsa     1993.01  1993      1        171000   \n",
       "1      at         nsa     1993.02  1993      2        175000   \n",
       "2      at         nsa     1993.03  1993      3        166000   \n",
       "4      at         nsa     1993.05  1993      5        147000   \n",
       "5      at         nsa     1993.06  1993      6        134000   \n",
       "\n",
       "   unemployment_rate  \n",
       "0                4.5  \n",
       "1                4.6  \n",
       "2                4.4  \n",
       "4                3.9  \n",
       "5                3.5  "
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.drop(3, axis=0, inplace=True)\n",
    "unemployment.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.08</td>\n",
       "      <td>1993</td>\n",
       "      <td>8</td>\n",
       "      <td>130000</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.10</td>\n",
       "      <td>1993</td>\n",
       "      <td>10</td>\n",
       "      <td>141000</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.11</td>\n",
       "      <td>1993</td>\n",
       "      <td>11</td>\n",
       "      <td>156000</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.12</td>\n",
       "      <td>1993</td>\n",
       "      <td>12</td>\n",
       "      <td>169000</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1994.01</td>\n",
       "      <td>1994</td>\n",
       "      <td>1</td>\n",
       "      <td>180000</td>\n",
       "      <td>4.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   country seasonality  year_month  year  month  unemployment  \\\n",
       "0       at         nsa     1993.01  1993      1        171000   \n",
       "1       at         nsa     1993.02  1993      2        175000   \n",
       "2       at         nsa     1993.03  1993      3        166000   \n",
       "4       at         nsa     1993.05  1993      5        147000   \n",
       "5       at         nsa     1993.06  1993      6        134000   \n",
       "7       at         nsa     1993.08  1993      8        130000   \n",
       "9       at         nsa     1993.10  1993     10        141000   \n",
       "10      at         nsa     1993.11  1993     11        156000   \n",
       "11      at         nsa     1993.12  1993     12        169000   \n",
       "12      at         nsa     1994.01  1994      1        180000   \n",
       "\n",
       "    unemployment_rate  \n",
       "0                 4.5  \n",
       "1                 4.6  \n",
       "2                 4.4  \n",
       "4                 3.9  \n",
       "5                 3.5  \n",
       "7                 3.4  \n",
       "9                 3.7  \n",
       "10                4.1  \n",
       "11                4.4  \n",
       "12                4.7  "
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.drop([6,8], axis=0, inplace=True)\n",
    "unemployment.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(20793, 7)"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check the shape of the modified DataFrame\n",
    "unemployment.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another common way to remove columns or rows is based on a certain condition. For example, the following condition removes all rows corresponding to the data after 2006."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [],
   "source": [
    "unemployment = unemployment[unemployment['year'] < 2006]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(15528, 7)"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check the shape again\n",
    "unemployment.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.6 Merging DataFrames<a id=\"7.6-merging-dataframes\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Merging DataFrames in `pandas` is a common operation for combining data from multiple DataFrames based on a common key or index.\n",
    "\n",
    "For instance, if we examine the `unemployment` DataFrame we can notice that we don't exactly know what the values in the `country` column refer to. We can correct that by obtaining the country names from the `countries` DataFrame that we imported earlier. \n",
    "\n",
    "We can see in the `countries` data that *at* stands for Austria. This DataFrame even provides the country names in three different languages (name_en, name_fr, name_de)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>name_en</th>\n",
       "      <th>name_fr</th>\n",
       "      <th>name_de</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>Austria</td>\n",
       "      <td>Autriche</td>\n",
       "      <td>Österreich</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>be</td>\n",
       "      <td>Belgium</td>\n",
       "      <td>Belgique</td>\n",
       "      <td>Belgien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bg</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>Bulgarie</td>\n",
       "      <td>Bulgarien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>hr</td>\n",
       "      <td>Croatia</td>\n",
       "      <td>Croatie</td>\n",
       "      <td>Kroatien</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cy</td>\n",
       "      <td>Cyprus</td>\n",
       "      <td>Chypre</td>\n",
       "      <td>Zypern</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country   name_en   name_fr     name_de\n",
       "0      at   Austria  Autriche  Österreich\n",
       "1      be   Belgium  Belgique     Belgien\n",
       "2      bg  Bulgaria  Bulgarie   Bulgarien\n",
       "3      hr   Croatia   Croatie    Kroatien\n",
       "4      cy    Cyprus    Chypre      Zypern"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "countries.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the data we need is stored in two separate files, we will merge the two DataFrames. The `country` column is shown in both DataFrames, so it is a good option for joining the data. However, we don't need all columns in the `countries` DataFrame, and therefore, we will create a new DataFrame that contains only the columns that we need. To select certain columns to retain, we can use the bracket notation that we used earlier to reorder the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [],
   "source": [
    "country_names = countries[['country', 'name_en']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>name_en</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>be</td>\n",
       "      <td>Belgium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>bg</td>\n",
       "      <td>Bulgaria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>hr</td>\n",
       "      <td>Croatia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>cy</td>\n",
       "      <td>Cyprus</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country   name_en\n",
       "0      at   Austria\n",
       "1      be   Belgium\n",
       "2      bg  Bulgaria\n",
       "3      hr   Croatia\n",
       "4      cy    Cyprus"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "country_names.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For merging DataFrames, `pandas` include the `merge` method, which has the following syntax, where the parameter `on` lists the column for matching the DataFrames. This operation is similar to inner join in SQL and it combines rows which have matching keys in both DataFrames. We can also specify the type of join (e.g., inner, left, right, outer) by providing value for the optional `how` parameter.\n",
    "\n",
    "```\n",
    "pd.merge(first_file, second_file, on=['column_name'], how=['join type (default is 'inner')])\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>name_en</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  year  month  unemployment  \\\n",
       "0      at         nsa     1993.01  1993      1        171000   \n",
       "1      at         nsa     1993.02  1993      2        175000   \n",
       "2      at         nsa     1993.03  1993      3        166000   \n",
       "3      at         nsa     1993.05  1993      5        147000   \n",
       "4      at         nsa     1993.06  1993      6        134000   \n",
       "\n",
       "   unemployment_rate  name_en  \n",
       "0                4.5  Austria  \n",
       "1                4.6  Austria  \n",
       "2                4.4  Austria  \n",
       "3                3.9  Austria  \n",
       "4                3.5  Austria  "
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment = pd.merge(unemployment, country_names, on=['country'])\n",
    "unemployment.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we want to merge two files using multiple columns that exist in both files, we can pass a list of column names to the `on` parameter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Combining DataFrames with `join()` and `concat()`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another similar method to `merge` that is used for combining DataFrames in `pandas` is `join()`. The `join()` function is often used for merging DataFrame based on index alignment. \n",
    "\n",
    "The following example joins the DataFrame `grades_df` to the `student_df` by matching the index positions between the DataFrames. This is equivalent to a left join operation. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "      <th>Mean Score</th>\n",
       "      <th>Standard Deviation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95</td>\n",
       "      <td>95</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88</td>\n",
       "      <td>88</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102</td>\n",
       "      <td>100</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "      <td>78</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score  Final Score  Mean Score  Standard Deviation\n",
       "0    Alice     85             95           95        80.6                8.26\n",
       "1      Bob     80             90           90        80.6                8.26\n",
       "2  Charlie     78             88           88        80.6                8.26\n",
       "3    David     92            102          100        80.6                8.26\n",
       "4      Eve     68             78           78        80.6                8.26"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "student_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Grade\n",
       "1     A\n",
       "2     B\n",
       "3     A\n",
       "4     C"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grades_df = pd.DataFrame({'Grade': ['A', 'B', 'A', 'C']}, index=[1, 2, 3, 4])\n",
    "grades_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "      <th>Mean Score</th>\n",
       "      <th>Standard Deviation</th>\n",
       "      <th>Grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95</td>\n",
       "      <td>95</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88</td>\n",
       "      <td>88</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102</td>\n",
       "      <td>100</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "      <td>78</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score  Final Score  Mean Score  Standard Deviation  \\\n",
       "0    Alice     85             95           95        80.6                8.26   \n",
       "1      Bob     80             90           90        80.6                8.26   \n",
       "2  Charlie     78             88           88        80.6                8.26   \n",
       "3    David     92            102          100        80.6                8.26   \n",
       "4      Eve     68             78           78        80.6                8.26   \n",
       "\n",
       "  Grade  \n",
       "0   NaN  \n",
       "1     A  \n",
       "2     B  \n",
       "3     A  \n",
       "4     C  "
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joined_df = student_df.join(grades_df)\n",
    "joined_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, `join()` also allows to specify the type of join operation with the `how` argument, similar to `merge()`. In the example below, an inner join operation is illustrated, which returns the rows that have matching indices in both DataFrames.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "      <th>Mean Score</th>\n",
       "      <th>Standard Deviation</th>\n",
       "      <th>Grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88</td>\n",
       "      <td>88</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102</td>\n",
       "      <td>100</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "      <td>78</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score  Final Score  Mean Score  Standard Deviation  \\\n",
       "1      Bob     80             90           90        80.6                8.26   \n",
       "2  Charlie     78             88           88        80.6                8.26   \n",
       "3    David     92            102          100        80.6                8.26   \n",
       "4      Eve     68             78           78        80.6                8.26   \n",
       "\n",
       "  Grade  \n",
       "1     A  \n",
       "2     B  \n",
       "3     A  \n",
       "4     C  "
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joined_df = student_df.join(grades_df, how='inner')\n",
    "joined_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note again the `join()` is primarily designed for merging DataFrames based on their indices, and it does not support merging DatFrames on specific columns. For that purpose, we should use the `merge()` method instead.\n",
    "\n",
    "Similarly, `concat()` allows to combine two DataFrames in `pandas` along a particular axis (either rows or columns). By default, it includes all columns or rows, filling missing values with NaN. However, the `concat` function can also accept a `join` argument to specify the type of join operation. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "      <th>Mean Score</th>\n",
       "      <th>Standard Deviation</th>\n",
       "      <th>Grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95</td>\n",
       "      <td>95</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90</td>\n",
       "      <td>90</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88</td>\n",
       "      <td>88</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102</td>\n",
       "      <td>100</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78</td>\n",
       "      <td>78</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score  Final Score  Mean Score  Standard Deviation  \\\n",
       "0    Alice     85             95           95        80.6                8.26   \n",
       "1      Bob     80             90           90        80.6                8.26   \n",
       "2  Charlie     78             88           88        80.6                8.26   \n",
       "3    David     92            102          100        80.6                8.26   \n",
       "4      Eve     68             78           78        80.6                8.26   \n",
       "\n",
       "  Grade  \n",
       "0   NaN  \n",
       "1     A  \n",
       "2     B  \n",
       "3     A  \n",
       "4     C  "
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Concatenate along columns (axis=1)\n",
    "joined_df_2 = pd.concat([student_df, grades_df], axis=1)\n",
    "joined_df_2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "      <th>Mean Score</th>\n",
       "      <th>Standard Deviation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88.0</td>\n",
       "      <td>88.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78.0</td>\n",
       "      <td>78.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>George</td>\n",
       "      <td>69</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Ann</td>\n",
       "      <td>82</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Student  Score  Updated Score  Final Score  Mean Score  Standard Deviation\n",
       "0    Alice     85           95.0         95.0        80.6                8.26\n",
       "1      Bob     80           90.0         90.0        80.6                8.26\n",
       "2  Charlie     78           88.0         88.0        80.6                8.26\n",
       "3    David     92          102.0        100.0        80.6                8.26\n",
       "4      Eve     68           78.0         78.0        80.6                8.26\n",
       "0   George     69            NaN          NaN         NaN                 NaN\n",
       "1      Ann     82            NaN          NaN         NaN                 NaN"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Concatenate along rows (axis=0)\n",
    "student_df_2 = pd.DataFrame({'Student': ['George', 'Ann'], 'Score': [69, 82]})\n",
    "joined_df_3 = pd.concat([student_df, student_df_2], axis=0)\n",
    "joined_df_3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the combined DataFrame `joined_df_3` above, notice that the indices of the added rows begin at 0. To reset the indices in a `pandas` DataFrame, we can use, well, the `reset_index()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>Student</th>\n",
       "      <th>Score</th>\n",
       "      <th>Updated Score</th>\n",
       "      <th>Final Score</th>\n",
       "      <th>Mean Score</th>\n",
       "      <th>Standard Deviation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>95.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>Bob</td>\n",
       "      <td>80</td>\n",
       "      <td>90.0</td>\n",
       "      <td>90.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>88.0</td>\n",
       "      <td>88.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>David</td>\n",
       "      <td>92</td>\n",
       "      <td>102.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>Eve</td>\n",
       "      <td>68</td>\n",
       "      <td>78.0</td>\n",
       "      <td>78.0</td>\n",
       "      <td>80.6</td>\n",
       "      <td>8.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>George</td>\n",
       "      <td>69</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1</td>\n",
       "      <td>Ann</td>\n",
       "      <td>82</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   index  Student  Score  Updated Score  Final Score  Mean Score  \\\n",
       "0      0    Alice     85           95.0         95.0        80.6   \n",
       "1      1      Bob     80           90.0         90.0        80.6   \n",
       "2      2  Charlie     78           88.0         88.0        80.6   \n",
       "3      3    David     92          102.0        100.0        80.6   \n",
       "4      4      Eve     68           78.0         78.0        80.6   \n",
       "5      0   George     69            NaN          NaN         NaN   \n",
       "6      1      Ann     82            NaN          NaN         NaN   \n",
       "\n",
       "   Standard Deviation  \n",
       "0                8.26  \n",
       "1                8.26  \n",
       "2                8.26  \n",
       "3                8.26  \n",
       "4                8.26  \n",
       "5                 NaN  \n",
       "6                 NaN  "
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joined_df_3 = joined_df_3.reset_index()\n",
    "joined_df_3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more information on merging DataFrames, check the pandas [documentation](http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging). Also, we will explain more about merging operations in the lecture on Databases and SQL.\n",
    "\n",
    "Here is a figure that depicts inner, right, left, and outer join operations. \n",
    "\n",
    "<img src=\"images/join_operations.png\" heigth=\"400\">\n",
    "<em>Figure: Join Operations.</em>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.7 Calculating Unique and Missing Values <a id=\"7.7-calculating-unique-and-missing-values\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the *unemployment* DataFrame, we might want to know for which countries we have data available. To extract this information, we can use the `.unique()` method. Note that the countries are listed in the right-most column `name-en` so we will use it to find the unique elements in that column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>name_en</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "      <td>Austria</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  year  month  unemployment  \\\n",
       "0      at         nsa     1993.01  1993      1        171000   \n",
       "1      at         nsa     1993.02  1993      2        175000   \n",
       "2      at         nsa     1993.03  1993      3        166000   \n",
       "3      at         nsa     1993.05  1993      5        147000   \n",
       "4      at         nsa     1993.06  1993      6        134000   \n",
       "\n",
       "   unemployment_rate  name_en  \n",
       "0                4.5  Austria  \n",
       "1                4.6  Austria  \n",
       "2                4.4  Austria  \n",
       "3                3.9  Austria  \n",
       "4                3.5  Austria  "
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Austria', 'Belgium', 'Bulgaria', 'Cyprus', 'Czech Republic',\n",
       "       'Germany (including  former GDR from 1991)', 'Denmark', 'Estonia',\n",
       "       'Spain', 'Finland', 'France', 'Greece', 'Croatia', 'Hungary',\n",
       "       'Ireland', 'Italy', 'Lithuania', 'Luxembourg', 'Latvia', 'Malta',\n",
       "       'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania', 'Sweden',\n",
       "       'Slovenia', 'Slovakia', 'Turkey', 'United Kingdom'], dtype=object)"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['name_en'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a count of the **number of unique countries,** we can use the `.nunique()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['name_en'].nunique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, we can also use `len()` to get the number of items in the above array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(unemployment['name_en'].unique())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we are interested to know **how many rows** there are per country, `pandas` has the `.value_counts()` method that returns the counts for the unique values in a column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "United Kingdom                               828\n",
       "France                                       828\n",
       "Sweden                                       828\n",
       "Portugal                                     828\n",
       "Netherlands                                  828\n",
       "Luxembourg                                   828\n",
       "Denmark                                      828\n",
       "Belgium                                      828\n",
       "Spain                                        828\n",
       "Ireland                                      828\n",
       "Italy                                        804\n",
       "Finland                                      648\n",
       "Norway                                       612\n",
       "Austria                                      465\n",
       "Hungary                                      396\n",
       "Slovakia                                     396\n",
       "Slovenia                                     396\n",
       "Bulgaria                                     396\n",
       "Malta                                        396\n",
       "Poland                                       396\n",
       "Germany (including  former GDR from 1991)    336\n",
       "Czech Republic                               288\n",
       "Latvia                                       288\n",
       "Lithuania                                    288\n",
       "Greece                                       279\n",
       "Romania                                      252\n",
       "Cyprus                                       216\n",
       "Estonia                                      216\n",
       "Croatia                                      144\n",
       "Turkey                                        36\n",
       "Name: name_en, dtype: int64"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['name_en'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the output is sorted by values in descending order. If we would like it sorted by index (or, by country name in alphabetical order in this case), we can append the `.sort_index()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Austria                                      465\n",
       "Belgium                                      828\n",
       "Bulgaria                                     396\n",
       "Croatia                                      144\n",
       "Cyprus                                       216\n",
       "Czech Republic                               288\n",
       "Denmark                                      828\n",
       "Estonia                                      216\n",
       "Finland                                      648\n",
       "France                                       828\n",
       "Germany (including  former GDR from 1991)    336\n",
       "Greece                                       279\n",
       "Hungary                                      396\n",
       "Ireland                                      828\n",
       "Italy                                        804\n",
       "Latvia                                       288\n",
       "Lithuania                                    288\n",
       "Luxembourg                                   828\n",
       "Malta                                        396\n",
       "Netherlands                                  828\n",
       "Norway                                       612\n",
       "Poland                                       396\n",
       "Portugal                                     828\n",
       "Romania                                      252\n",
       "Slovakia                                     396\n",
       "Slovenia                                     396\n",
       "Spain                                        828\n",
       "Sweden                                       828\n",
       "Turkey                                        36\n",
       "United Kingdom                               828\n",
       "Name: name_en, dtype: int64"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['name_en'].value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we noticed earlier, there are missing values in the `unemployment_rate` column. To find out **how many unemployment rate values are missing** we will use the `.isnull()` method, which returns a corresponding boolean value for each missing entry in the `unemployment_rate` column. As we know, in Python `True` is equivalent to 1 and `False` is equivalent to 0. Thus, when we add `.sum()`, we obtain a count for the total number of missing values in the `unemployment_rate` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "825"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['unemployment_rate'].isnull().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GroupBy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `groupby()` method in `pandas` is used to group data based on one or more columns. It is similar to the GroupBy function in SQL, and allows to apply operations or filtering on grouped data. \n",
    "\n",
    "If we would like to know how many missing values for the `unemployment_rate` column there are for each *country*, we can first create a new column in the `DataFrame` that has boolean True or False for the `unemployment_rate` column. This is the last column to the right below, in which `False` means that the value is not missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>name_en</th>\n",
       "      <th>unemployment_rate_null</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  year  month  unemployment  \\\n",
       "0      at         nsa     1993.01  1993      1        171000   \n",
       "1      at         nsa     1993.02  1993      2        175000   \n",
       "2      at         nsa     1993.03  1993      3        166000   \n",
       "3      at         nsa     1993.05  1993      5        147000   \n",
       "4      at         nsa     1993.06  1993      6        134000   \n",
       "\n",
       "   unemployment_rate  name_en  unemployment_rate_null  \n",
       "0                4.5  Austria                   False  \n",
       "1                4.6  Austria                   False  \n",
       "2                4.4  Austria                   False  \n",
       "3                3.9  Austria                   False  \n",
       "4                3.5  Austria                   False  "
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['unemployment_rate_null'] = unemployment['unemployment_rate'].isnull()\n",
    "unemployment.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To count the **number of missing values for each country,** we can use the `.groupby()` method to group the data by the country `name_en` column included in the parentheses, and perform the `.sum()` operation over the `unemployment_rate_null` column. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name_en\n",
       "Austria                                        0\n",
       "Belgium                                        0\n",
       "Bulgaria                                     180\n",
       "Croatia                                       96\n",
       "Cyprus                                         0\n",
       "Czech Republic                                 0\n",
       "Denmark                                        0\n",
       "Estonia                                        0\n",
       "Finland                                        0\n",
       "France                                         0\n",
       "Germany (including  former GDR from 1991)      0\n",
       "Greece                                         0\n",
       "Hungary                                       36\n",
       "Ireland                                        0\n",
       "Italy                                          0\n",
       "Latvia                                         0\n",
       "Lithuania                                      0\n",
       "Luxembourg                                     0\n",
       "Malta                                        180\n",
       "Netherlands                                    0\n",
       "Norway                                         0\n",
       "Poland                                        72\n",
       "Portugal                                       0\n",
       "Romania                                        0\n",
       "Slovakia                                     108\n",
       "Slovenia                                      36\n",
       "Spain                                        117\n",
       "Sweden                                         0\n",
       "Turkey                                         0\n",
       "United Kingdom                                 0\n",
       "Name: unemployment_rate_null, dtype: int64"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.groupby('name_en')['unemployment_rate_null'].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also, we can use `.groupby` to group data by multiple columns. For example, we can check the missing values for unemployment both by country and year, and we can apply functions such as `.sum()` to the grouped objects, as shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [],
   "source": [
    "grouped = unemployment.groupby(['name_en', 'year'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [],
   "source": [
    "grouped_sum = grouped['unemployment_rate_null'].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>unemployment_rate_null</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name_en</th>\n",
       "      <th>year</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">Austria</th>\n",
       "      <th>1993</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1994</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1995</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1996</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1997</th>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              unemployment_rate_null\n",
       "name_en year                        \n",
       "Austria 1993                       0\n",
       "        1994                       0\n",
       "        1995                       0\n",
       "        1996                       0\n",
       "        1997                       0"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Convert to a DataFrame\n",
    "grouped_sum_df = grouped_sum.to_frame()\n",
    "grouped_sum_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One more simple example of using `groupby` in `pandas` is shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Department</th>\n",
       "      <th>Employee</th>\n",
       "      <th>Salary</th>\n",
       "      <th>JobTitle</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>HR</td>\n",
       "      <td>Alice</td>\n",
       "      <td>70000</td>\n",
       "      <td>Employee</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>IT</td>\n",
       "      <td>Bob</td>\n",
       "      <td>80000</td>\n",
       "      <td>Developer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>HR</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>75000</td>\n",
       "      <td>Manager</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>IT</td>\n",
       "      <td>David</td>\n",
       "      <td>85000</td>\n",
       "      <td>Developer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IT</td>\n",
       "      <td>Edward</td>\n",
       "      <td>90000</td>\n",
       "      <td>Manager</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>HR</td>\n",
       "      <td>Fiona</td>\n",
       "      <td>71000</td>\n",
       "      <td>Employee</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Department Employee  Salary   JobTitle\n",
       "0         HR    Alice   70000   Employee\n",
       "1         IT      Bob   80000  Developer\n",
       "2         HR  Charlie   75000    Manager\n",
       "3         IT    David   85000  Developer\n",
       "4         IT   Edward   90000    Manager\n",
       "5         HR    Fiona   71000   Employee"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "company_df = pd.DataFrame({\n",
    "    'Department': ['HR', 'IT', 'HR', 'IT', 'IT', 'HR'],\n",
    "    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Edward', 'Fiona'],\n",
    "    'Salary': [70000, 80000, 75000, 85000, 90000, 71000],\n",
    "    'JobTitle' : ['Employee', 'Developer', 'Manager', 'Developer', 'Manager', 'Employee']\n",
    "})\n",
    "\n",
    "company_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Group by the column `'Department` and calculate the mean salary for each department. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Department\n",
       "HR    72000.0\n",
       "IT    85000.0\n",
       "Name: Salary, dtype: float64"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "company_df.groupby('Department')['Salary'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can perform multiple operations on the grouped data. The following example calculates the mean, sum, max, and min for the grouped data. These operations for calculating data statistics in `pandas` are referred to as **aggregate functions**. They are listed inside the `agg()` method, allowing to apply multiple operations at once. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"4\" halign=\"left\">Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>sum</th>\n",
       "      <th>max</th>\n",
       "      <th>min</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Department</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>HR</th>\n",
       "      <td>72000.0</td>\n",
       "      <td>216000</td>\n",
       "      <td>75000</td>\n",
       "      <td>70000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IT</th>\n",
       "      <td>85000.0</td>\n",
       "      <td>255000</td>\n",
       "      <td>90000</td>\n",
       "      <td>80000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Salary                      \n",
       "               mean     sum    max    min\n",
       "Department                               \n",
       "HR          72000.0  216000  75000  70000\n",
       "IT          85000.0  255000  90000  80000"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "company_df.groupby('Department').agg({\n",
    "    'Salary': ['mean', 'sum', 'max', 'min']})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can group the data by `'Department'` and `'JobTitle'` and calculate the mean salary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Department  JobTitle \n",
       "HR          Employee     70500.0\n",
       "            Manager      75000.0\n",
       "IT          Developer    82500.0\n",
       "            Manager      90000.0\n",
       "Name: Salary, dtype: float64"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "company_df.groupby(['Department', 'JobTitle'])['Salary'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.8 Dealing With Missing Values: Boolean Indexing <a id=\"7.8-dealing-with-missing-values:-boolean-indexing\"/>\n",
    "\n",
    "Two main options for dealing with missing values in a DataFrame include:\n",
    "\n",
    "* Fill the missing values with some other values.\n",
    "* Remove the observations with missing values.\n",
    "    \n",
    "Here we will adopt the second approach and **exclude missing values** from our primary analyses. Additional examples on dealing with missing values will be presented in the lecture on Data Exploration and Preprocessing.\n",
    "\n",
    "To select only the rows with the missing data for `'unemployment_rate'`, we will use *boolean indexing* to filter the data. Recall from the previous section that `unemployment['unemployment_rate'].isnull()` produces an array of Boolean values, which we used when counting the number of missing values, shown in the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3    False\n",
       "4    False\n",
       "5    False\n",
       "6    False\n",
       "7    False\n",
       "8    False\n",
       "9    False\n",
       "Name: unemployment_rate, dtype: bool"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment['unemployment_rate'].isnull()[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will first save the missing data into a new DataFrame, in case we need that data later. To create a new DataFrame that we will call `unemployment_rate_missing`, we will index `unemployment` with the Boolean array above. This returns only the rows where the value in the array is `True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>name_en</th>\n",
       "      <th>unemployment_rate_null</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1293</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.01</td>\n",
       "      <td>1995</td>\n",
       "      <td>1</td>\n",
       "      <td>391000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1294</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.02</td>\n",
       "      <td>1995</td>\n",
       "      <td>2</td>\n",
       "      <td>387000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1295</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.03</td>\n",
       "      <td>1995</td>\n",
       "      <td>3</td>\n",
       "      <td>378000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1296</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.04</td>\n",
       "      <td>1995</td>\n",
       "      <td>4</td>\n",
       "      <td>365000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1297</th>\n",
       "      <td>bg</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1995.05</td>\n",
       "      <td>1995</td>\n",
       "      <td>5</td>\n",
       "      <td>346000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     country seasonality  year_month  year  month  unemployment  \\\n",
       "1293      bg         nsa     1995.01  1995      1        391000   \n",
       "1294      bg         nsa     1995.02  1995      2        387000   \n",
       "1295      bg         nsa     1995.03  1995      3        378000   \n",
       "1296      bg         nsa     1995.04  1995      4        365000   \n",
       "1297      bg         nsa     1995.05  1995      5        346000   \n",
       "\n",
       "      unemployment_rate   name_en  unemployment_rate_null  \n",
       "1293                NaN  Bulgaria                    True  \n",
       "1294                NaN  Bulgaria                    True  \n",
       "1295                NaN  Bulgaria                    True  \n",
       "1296                NaN  Bulgaria                    True  \n",
       "1297                NaN  Bulgaria                    True  "
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment_rate_missing = unemployment[unemployment['unemployment_rate'].isnull()]\n",
    "unemployment_rate_missing.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is also possible to specify multiple conditions using the `&` operator, but each condition needs to be inside of parentheses. \n",
    "\n",
    "Now, to remove the missing data in `unemployment`, we can use the `.dropna()` method. This method drops all observations for which `unemployment_rate == NaN`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>name_en</th>\n",
       "      <th>unemployment_rate_null</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.01</td>\n",
       "      <td>1993</td>\n",
       "      <td>1</td>\n",
       "      <td>171000</td>\n",
       "      <td>4.5</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.02</td>\n",
       "      <td>1993</td>\n",
       "      <td>2</td>\n",
       "      <td>175000</td>\n",
       "      <td>4.6</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.03</td>\n",
       "      <td>1993</td>\n",
       "      <td>3</td>\n",
       "      <td>166000</td>\n",
       "      <td>4.4</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.05</td>\n",
       "      <td>1993</td>\n",
       "      <td>5</td>\n",
       "      <td>147000</td>\n",
       "      <td>3.9</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>at</td>\n",
       "      <td>nsa</td>\n",
       "      <td>1993.06</td>\n",
       "      <td>1993</td>\n",
       "      <td>6</td>\n",
       "      <td>134000</td>\n",
       "      <td>3.5</td>\n",
       "      <td>Austria</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country seasonality  year_month  year  month  unemployment  \\\n",
       "0      at         nsa     1993.01  1993      1        171000   \n",
       "1      at         nsa     1993.02  1993      2        175000   \n",
       "2      at         nsa     1993.03  1993      3        166000   \n",
       "3      at         nsa     1993.05  1993      5        147000   \n",
       "4      at         nsa     1993.06  1993      6        134000   \n",
       "\n",
       "   unemployment_rate  name_en  unemployment_rate_null  \n",
       "0                4.5  Austria                   False  \n",
       "1                4.6  Austria                   False  \n",
       "2                4.4  Austria                   False  \n",
       "3                3.9  Austria                   False  \n",
       "4                3.5  Austria                   False  "
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.dropna(subset=['unemployment_rate'], inplace=True)\n",
    "unemployment.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(14703, 9)"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check the shape of the modified DataFrame\n",
    "unemployment.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sorting Values\n",
    "\n",
    "If we want to know what the highest unemployment rates were, we can use the `.sort_values()` method to *sort the data.*\n",
    "\n",
    "The code in the next cell sorted the data in *descending* order, and printed the first 10 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>seasonality</th>\n",
       "      <th>year_month</th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>unemployment</th>\n",
       "      <th>unemployment_rate</th>\n",
       "      <th>name_en</th>\n",
       "      <th>unemployment_rate_null</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>11677</th>\n",
       "      <td>pl</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2004.02</td>\n",
       "      <td>2004</td>\n",
       "      <td>2</td>\n",
       "      <td>3531000</td>\n",
       "      <td>20.9</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11665</th>\n",
       "      <td>pl</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2003.02</td>\n",
       "      <td>2003</td>\n",
       "      <td>2</td>\n",
       "      <td>3460000</td>\n",
       "      <td>20.7</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11676</th>\n",
       "      <td>pl</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2004.01</td>\n",
       "      <td>2004</td>\n",
       "      <td>1</td>\n",
       "      <td>3520000</td>\n",
       "      <td>20.7</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11664</th>\n",
       "      <td>pl</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2003.01</td>\n",
       "      <td>2003</td>\n",
       "      <td>1</td>\n",
       "      <td>3466000</td>\n",
       "      <td>20.6</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11678</th>\n",
       "      <td>pl</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2004.03</td>\n",
       "      <td>2004</td>\n",
       "      <td>3</td>\n",
       "      <td>3475000</td>\n",
       "      <td>20.6</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11654</th>\n",
       "      <td>pl</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2002.03</td>\n",
       "      <td>2002</td>\n",
       "      <td>3</td>\n",
       "      <td>3509000</td>\n",
       "      <td>20.5</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11653</th>\n",
       "      <td>pl</td>\n",
       "      <td>nsa</td>\n",
       "      <td>2002.02</td>\n",
       "      <td>2002</td>\n",
       "      <td>2</td>\n",
       "      <td>3492000</td>\n",
       "      <td>20.4</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11924</th>\n",
       "      <td>pl</td>\n",
       "      <td>trend</td>\n",
       "      <td>2002.09</td>\n",
       "      <td>2002</td>\n",
       "      <td>9</td>\n",
       "      <td>3500000</td>\n",
       "      <td>20.4</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11925</th>\n",
       "      <td>pl</td>\n",
       "      <td>trend</td>\n",
       "      <td>2002.10</td>\n",
       "      <td>2002</td>\n",
       "      <td>10</td>\n",
       "      <td>3483000</td>\n",
       "      <td>20.4</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11923</th>\n",
       "      <td>pl</td>\n",
       "      <td>trend</td>\n",
       "      <td>2002.08</td>\n",
       "      <td>2002</td>\n",
       "      <td>8</td>\n",
       "      <td>3503000</td>\n",
       "      <td>20.4</td>\n",
       "      <td>Poland</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      country seasonality  year_month  year  month  unemployment  \\\n",
       "11677      pl         nsa     2004.02  2004      2       3531000   \n",
       "11665      pl         nsa     2003.02  2003      2       3460000   \n",
       "11676      pl         nsa     2004.01  2004      1       3520000   \n",
       "11664      pl         nsa     2003.01  2003      1       3466000   \n",
       "11678      pl         nsa     2004.03  2004      3       3475000   \n",
       "11654      pl         nsa     2002.03  2002      3       3509000   \n",
       "11653      pl         nsa     2002.02  2002      2       3492000   \n",
       "11924      pl       trend     2002.09  2002      9       3500000   \n",
       "11925      pl       trend     2002.10  2002     10       3483000   \n",
       "11923      pl       trend     2002.08  2002      8       3503000   \n",
       "\n",
       "       unemployment_rate name_en  unemployment_rate_null  \n",
       "11677               20.9  Poland                   False  \n",
       "11665               20.7  Poland                   False  \n",
       "11676               20.7  Poland                   False  \n",
       "11664               20.6  Poland                   False  \n",
       "11678               20.6  Poland                   False  \n",
       "11654               20.5  Poland                   False  \n",
       "11653               20.4  Poland                   False  \n",
       "11924               20.4  Poland                   False  \n",
       "11925               20.4  Poland                   False  \n",
       "11923               20.4  Poland                   False  "
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unemployment.sort_values('unemployment_rate', ascending=False)[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is another example for sorting the `score_df` DataFrame by `'age'`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>12</td>\n",
       "      <td>3.552</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>19</td>\n",
       "      <td>10</td>\n",
       "      <td>1.624</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18</td>\n",
       "      <td>16</td>\n",
       "      <td>2.925</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>19</td>\n",
       "      <td>14</td>\n",
       "      <td>4.662</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>47</td>\n",
       "      <td>25</td>\n",
       "      <td>3.634</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>18</td>\n",
       "      <td>21</td>\n",
       "      <td>3.635</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>19</td>\n",
       "      <td>29</td>\n",
       "      <td>5.234</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt    group\n",
       "0   17     12  3.552     test\n",
       "1   19     10  1.624     test\n",
       "2   21     11  6.431     test\n",
       "3   37     15  7.132     test\n",
       "4   18     16  2.925     test\n",
       "5   19     14  4.662  control\n",
       "6   47     25  3.634  control\n",
       "7   18     21  3.635  control\n",
       "8   19     29  5.234  control"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>12</td>\n",
       "      <td>3.552</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18</td>\n",
       "      <td>16</td>\n",
       "      <td>2.925</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>18</td>\n",
       "      <td>21</td>\n",
       "      <td>3.635</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>19</td>\n",
       "      <td>10</td>\n",
       "      <td>1.624</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>19</td>\n",
       "      <td>14</td>\n",
       "      <td>4.662</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>19</td>\n",
       "      <td>29</td>\n",
       "      <td>5.234</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>47</td>\n",
       "      <td>25</td>\n",
       "      <td>3.634</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt    group\n",
       "0   17     12  3.552     test\n",
       "4   18     16  2.925     test\n",
       "7   18     21  3.635  control\n",
       "1   19     10  1.624     test\n",
       "5   19     14  4.662  control\n",
       "8   19     29  5.234  control\n",
       "2   21     11  6.431     test\n",
       "3   37     15  7.132     test\n",
       "6   47     25  3.634  control"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_df.sort_values('age')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The syntax shown in the next cell can also be used. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>19</td>\n",
       "      <td>10</td>\n",
       "      <td>1.624</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>12</td>\n",
       "      <td>3.552</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>19</td>\n",
       "      <td>14</td>\n",
       "      <td>4.662</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18</td>\n",
       "      <td>16</td>\n",
       "      <td>2.925</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>18</td>\n",
       "      <td>21</td>\n",
       "      <td>3.635</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>47</td>\n",
       "      <td>25</td>\n",
       "      <td>3.634</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>19</td>\n",
       "      <td>29</td>\n",
       "      <td>5.234</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt    group\n",
       "1   19     10  1.624     test\n",
       "2   21     11  6.431     test\n",
       "0   17     12  3.552     test\n",
       "5   19     14  4.662  control\n",
       "3   37     15  7.132     test\n",
       "4   18     16  2.925     test\n",
       "7   18     21  3.635  control\n",
       "6   47     25  3.634  control\n",
       "8   19     29  5.234  control"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_df.sort_values(by=['score'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following code, we sort the data by two columns. Note that the data is first sorted by `'age'`, and for the rows where the age is the same (e.g., 18 or 19), the data is sorted in ascending order based on `'score'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>score</th>\n",
       "      <th>rt</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>12</td>\n",
       "      <td>3.552</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18</td>\n",
       "      <td>16</td>\n",
       "      <td>2.925</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>18</td>\n",
       "      <td>21</td>\n",
       "      <td>3.635</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>19</td>\n",
       "      <td>10</td>\n",
       "      <td>1.624</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>19</td>\n",
       "      <td>14</td>\n",
       "      <td>4.662</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>19</td>\n",
       "      <td>29</td>\n",
       "      <td>5.234</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21</td>\n",
       "      <td>11</td>\n",
       "      <td>6.431</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>37</td>\n",
       "      <td>15</td>\n",
       "      <td>7.132</td>\n",
       "      <td>test</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>47</td>\n",
       "      <td>25</td>\n",
       "      <td>3.634</td>\n",
       "      <td>control</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  score     rt    group\n",
       "0   17     12  3.552     test\n",
       "4   18     16  2.925     test\n",
       "7   18     21  3.635  control\n",
       "1   19     10  1.624     test\n",
       "5   19     14  4.662  control\n",
       "8   19     29  5.234  control\n",
       "2   21     11  6.431     test\n",
       "3   37     15  7.132     test\n",
       "6   47     25  3.634  control"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_df.sort_values(['age', 'score'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Several additional functionalities of `pandas` will be described in the next lectures."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.9 Exporting A DataFrame to csv <a id=\"7.9-exporting-a-dataFrame-to-csv\"/>\n",
    "\n",
    "To save the last DataFrame as a .csv file, we can use the `.to_csv()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [],
   "source": [
    "unemployment.to_csv('data/unemployment.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The file will be saved in the `data` directory. \n",
    "\n",
    "By default, this method writes the indices in the column 0 (i.e., row labels). We probably don't want a column 0 with indices to be added, and we can set `index` to `False`. We can also specify the type of delimiter that we want to use, such as commas `(,)`, pipes (`|`), semicolons (`;`), tabs (`\\t`), etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [],
   "source": [
    "unemployment.to_csv('data/unemployment.csv', index=False, sep=',')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References <a id=\"references\"/>\n",
    "\n",
    "1. Introduction to Pandas, Python Data Wrangling by D-Lab at UC Berkley, available at: [https://github.com/dlab-berkeley/introduction-to-pandas](https://github.com/dlab-berkeley/introduction-to-pandas).\n",
    "2. Pandas documentation, available at: [https://pandas.pydata.org/pandas-docs/stable/](https://pandas.pydata.org/pandas-docs/stable/).\n",
    "3. Learning Statistics with Python - Data Wrangling, available at: [https://ethanweed.github.io/pythonbook/03.03-pragmatic_matters.html](https://ethanweed.github.io/pythonbook/03.03-pragmatic_matters.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[BACK TO TOP](#top)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}