{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "011263bd",
   "metadata": {},
   "source": [
    "# Saving and Loading Preprocessed Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d010630",
   "metadata": {},
   "source": [
    "## What you will learn in this tutorial:\n",
    "\n",
    "* how to save your preprocessed data\n",
    "* how to load your preprocessed data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bacf49e8",
   "metadata": {},
   "source": [
    "## Preparations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d973b685",
   "metadata": {},
   "source": [
    "We import `pymovements` as the alias `pm` for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33a914a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pymovements as pm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26acca82",
   "metadata": {},
   "source": [
    "Let's start by downloading and extracting our `ToyDataset`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "375b5f97",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = pm.datasets.ToyDataset(root='data/')\n",
    "dataset.download()\n",
    "dataset.extract()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1cfa35f",
   "metadata": {},
   "source": [
    "Now let's load the data and preprocess it by converting pixel coordinates to degrees of visual angle and computing velocities from the positions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5436312c",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.load()\n",
    "dataset.pix2deg()\n",
    "dataset.pos2vel()\n",
    "\n",
    "dataset.gaze[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c8b2dee",
   "metadata": {},
   "source": [
    "The gaze data frame now contains additional columns for position in degrees of visual angle and for velocity."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99ffa1b3",
   "metadata": {},
   "source": [
    "## Saving"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b757a5ec",
   "metadata": {},
   "source": [
    "Saving your preprocessed data is as simple as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1cbab15d",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.save_preprocessed()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "deb9f895",
   "metadata": {},
   "source": [
    "All of the preprocessed data is saved into this directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5aa54e2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.preprocessed_rootpath"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23ca845a",
   "metadata": {},
   "source": [
    "Let's confirm it by printing all the new files in this directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "743090d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(list(dataset.preprocessed_rootpath.glob('*/*/*')))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90645ecc",
   "metadata": {},
   "source": [
    "All of the files have been saved into the `preprocessed_rootpath` as `feather` files.\n",
    "\n",
    "To save the data into a different directory and in a different file format such as `csv`, specify `preprocessed_dirname` and `extension`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24c0e22e",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.save_preprocessed(preprocessed_dirname='preprocessed_csv', extension='csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5e59018",
   "metadata": {},
   "source": [
    "Let's confirm again by printing all the new files in this alternative directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5abbb196",
   "metadata": {},
   "outputs": [],
   "source": [
    "alternative_dirpath = dataset.path / 'preprocessed_csv'\n",
    "print(list(alternative_dirpath.glob('*/*/*')))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "991cea37",
   "metadata": {},
   "source": [
    "## Loading"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aba6cb0f",
   "metadata": {},
   "source": [
    "Now let's imagine that this preprocessing and saving was done in another file and we only want to load the preprocessed data.\n",
    "\n",
    "We simulate this by initializing a new dataset. We don't need to download any additional data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2238eab6",
   "metadata": {},
   "outputs": [],
   "source": [
    "preprocessed_dataset = pm.datasets.ToyDataset(root='data/')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0dfc1f7a",
   "metadata": {},
   "source": [
    "The preprocessed data can now simply be loaded by setting `preprocessed` to `True`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1a3ec33",
   "metadata": {},
   "outputs": [],
   "source": [
    "preprocessed_dataset.load(preprocessed=True)\n",
    "\n",
    "preprocessed_dataset.gaze[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32618a98",
   "metadata": {},
   "source": [
    "By default, the data is loaded from the `preprocessed` directory using the `feather` extension."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c4a5132",
   "metadata": {},
   "source": [
    "If the data was saved under a different directory name or in another file format, specify both when loading:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e017f202",
   "metadata": {},
   "outputs": [],
   "source": [
    "preprocessed_dataset.load(\n",
    "    preprocessed=True,\n",
    "    preprocessed_dirname='preprocessed_csv',\n",
    "    extension='csv',\n",
    ")\n",
    "preprocessed_dataset.gaze[0].frame.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eeea42fc",
   "metadata": {},
   "source": [
    "## What you have learned in this tutorial:\n",
    "\n",
    "* saving your preprocessed data using `Dataset.save_preprocessed()`\n",
    "* loading your preprocessed data using `Dataset.load(preprocessed=True)`\n",
    "* using custom directory names by specifying `preprocessed_dirname`\n",
    "* using other file formats than the default `feather` format by specifying `extension`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}