{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# pymovements in 10 minutes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "## What you will learn in this tutorial:\n",
    "\n",
    "* how to download one of the publicly available datasets\n",
    "* how to load a subset of the data into your memory\n",
    "* how to transform pixel coordinates into degrees of visual angle\n",
    "* how to transform positional data into velocity data\n",
    "* how to detect fixations by using the I-VT algorithm\n",
    "* how to detect saccades by using the microsaccades algorithm\n",
    "* how to compute additional event properties for your analysis\n",
    "* how to save your preprocessed data\n",
    "* how to plot the main saccadic sequence from your data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Downloading one of the public datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "We import `pymovements` as the alias `pm` for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import polars as pl\n",
    "\n",
    "import pymovements as pm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "pymovements provides a library of publicly available datasets.\n",
    "\n",
    "You can browse through the available dataset definitions here:\n",
    "[Datasets](https://pymovements.readthedocs.io/en/latest/reference/pymovements.datasets.html#module-pymovements.datasets)\n",
    "\n",
    "For this tutorial we will limit ourselves to the `ToyDataset` due to its minimal space requirements.\n",
    "\n",
    "Other datasets can be downloaded by simply replacing `ToyDataset` with one of the other available datasets.\n",
    "\n",
    "We can initialize and download by passing the desired dataset name as a string argument.\n",
    "\n",
    "Additionally we need the root directory path of your data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')\n",
    "dataset.download()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "Our downloaded dataset will be placed in new a directory with the name of the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "Archive files are automatically extracted into the path specified by `Dataset.paths.raw`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.paths.raw"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "## Loading in your data into memory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "Next we load our dataset into memory to be able to work with it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "This way we fill two attributes with data.\n",
    "First we have the `fileinfo` attribute which holds all the basic information for files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.fileinfo.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "We notice that for each filepath a `text_id` and `page_id` is specified.\n",
    "\n",
    "We have also loaded our gaze data into the dataframes in the `gaze` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.gaze[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "Apart from some trial identifier columns we see the columns `time` and `pixel`.\n",
    "\n",
    "The last two columns refer to the pixel coordinates at the timestep specified by `time`.\n",
    "\n",
    "\n",
    "We are also able to just take a subset of the data by specifying values of the fileinfo columns.\n",
    "The key refers to the column in the `fileinfo` dataframe.\n",
    "The values in the dictionary can be of type `bool`, `int`,  `float` or `str`, but also lists and ranges \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "subset = {\n",
    "    'text_id': 0,\n",
    "    'page_id': [0, 1],\n",
    "}\n",
    "dataset.load(subset=subset)\n",
    "\n",
    "dataset.fileinfo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "Now we selected only a small subset of our data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22",
   "metadata": {},
   "source": [
    "## Preprocessing raw gaze data\n",
    "\n",
    "We now want to preprocess our gaze data by transforming pixel coordinates into degrees of visual angle and then computing velocity data from our positional data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.pix2deg()\n",
    "\n",
    "dataset.gaze[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "We notice that a new column has appeared: `position`.\n",
    "This column specifies the position coordinates in degrees of visual angle (dva)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25",
   "metadata": {},
   "source": [
    "For transforming our positional data into velocity data we will use the *Savitzky-Golay* differentiation filter.\n",
    "\n",
    "We can also specify some additional parameters for this method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)\n",
    "\n",
    "dataset.gaze[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27",
   "metadata": {},
   "source": [
    "There is also the more general apply() method, which can be used to apply both transformation and event detection methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.apply('pos2acc', degree=2, window_length=7)\n",
    "\n",
    "dataset.gaze[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "## Detecting events"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30",
   "metadata": {},
   "source": [
    "Now let's detect some events.\n",
    "\n",
    "First we will detect fixations using the I-VT algorithm using its default parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.detect_events('ivt')\n",
    "\n",
    "dataset.events[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32",
   "metadata": {},
   "source": [
    "Next we detect some saccades. This time we don't use the default parameters but specify our own:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.detect_events('microsaccades', minimum_duration=8)\n",
    "\n",
    "dataset.events[0].frame.filter(pl.col('name') == 'saccade').head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34",
   "metadata": {},
   "source": [
    "We can also use the more general interface of the apply() method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.apply('idt', dispersion_threshold=2.7, name='fixation.ivt')\n",
    "\n",
    "dataset.events[0].frame.filter(pl.col('name') == 'fixation.ivt').head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36",
   "metadata": {},
   "source": [
    "## Computing event properties"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37",
   "metadata": {},
   "source": [
    "The event dataframe currently only holds the `name`, `onset`, `offset` and `duration` of an event (additionally we have some more identifier columns at the beginning).\n",
    "\n",
    "We now want to compute some additional properties for each event.\n",
    "Event properties are things like peak velocity, amplitude and dispersion during an event.\n",
    "\n",
    "We start out with computing the dispersion:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.compute_event_properties(\"dispersion\")\n",
    "\n",
    "dataset.events[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39",
   "metadata": {},
   "source": [
    "We notice that a new column with the name `dispersion` has appeared in the event dataframe.\n",
    "\n",
    "We can also pass a list of properties to compute all of our desired properties in a single run.\n",
    "Let's add the amplitude and peak velocity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.compute_event_properties([\"amplitude\", \"peak_velocity\"])\n",
    "\n",
    "dataset.events[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41",
   "metadata": {},
   "source": [
    "## Plotting our data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42",
   "metadata": {},
   "source": [
    "*pymovements* provides a range of plotting functions.\n",
    "\n",
    "You can browse through the available plotting functions here:\n",
    "[Plotting](https://pymovements.readthedocs.io/en/latest/reference/pymovements.plotting.html#module-pymovements.plotting)\n",
    "\n",
    "In this this tutorial we will plot the saccadic main sequence of our data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43",
   "metadata": {},
   "outputs": [],
   "source": [
    "pm.plotting.main_sequence_plot(dataset.events[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44",
   "metadata": {},
   "source": [
    "## Saving and loading your dataframes\n",
    "\n",
    "If we want to save interim results we can simply use the `save()` method like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.save()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46",
   "metadata": {},
   "source": [
    "Let's test this out by initializing a new `PublicDataset` object in the same directory and loading in the preprocessed gaze and event data.\n",
    "\n",
    "This time we don't need to download anything."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47",
   "metadata": {},
   "outputs": [],
   "source": [
    "preprocessed_dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')\n",
    "\n",
    "dataset.load(events=True, preprocessed=True, subset=subset)\n",
    "\n",
    "display(dataset.gaze[0])\n",
    "display(dataset.events[0])"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}