{ "cells": [ { "cell_type": "markdown", "id": "011263bd", "metadata": {}, "source": [ "# Saving and Loading Preprocessed Data" ] }, { "cell_type": "markdown", "id": "7d010630", "metadata": {}, "source": [ "## What you will learn in this tutorial:\n", "\n", "* how to save your preprocessed data\n", "* how to load your preprocessed data" ] }, { "cell_type": "markdown", "id": "bacf49e8", "metadata": {}, "source": [ "## Preparations" ] }, { "cell_type": "markdown", "id": "d973b685", "metadata": {}, "source": [ "We import `pymovements` as the alias `pm` for convenience." ] }, { "cell_type": "code", "execution_count": null, "id": "33a914a5", "metadata": {}, "outputs": [], "source": [ "import pymovements as pm" ] }, { "cell_type": "markdown", "id": "26acca82", "metadata": {}, "source": [ "Let's start by downloading and extracting our `ToyDataset`:" ] }, { "cell_type": "code", "execution_count": null, "id": "375b5f97", "metadata": {}, "outputs": [], "source": [ "dataset = pm.datasets.ToyDataset(root='data/')\n", "dataset.download()\n", "dataset.extract()" ] }, { "cell_type": "markdown", "id": "e1cfa35f", "metadata": {}, "source": [ "Now let's load in the data and do some preprocessing:" ] }, { "cell_type": "code", "execution_count": null, "id": "5436312c", "metadata": {}, "outputs": [], "source": [ "dataset.load()\n", "dataset.pix2deg()\n", "dataset.pos2vel()\n", "\n", "dataset.gaze[0].frame.head()" ] }, { "cell_type": "markdown", "id": "7c8b2dee", "metadata": {}, "source": [ "We have now added some additional columns for degrees in visual angle and velocity." ] }, { "cell_type": "markdown", "id": "99ffa1b3", "metadata": {}, "source": [ "## Saving" ] }, { "cell_type": "markdown", "id": "b757a5ec", "metadata": {}, "source": [ "Saving your preprocessed data is as simple as:" ] }, { "cell_type": "code", "execution_count": null, "id": "1cbab15d", "metadata": {}, "outputs": [], "source": [ "dataset.save_preprocessed()" ] }, { "cell_type": "markdown", "id": "deb9f895", "metadata": {}, "source": [ "All of the preprocessed data is saved into this directory:" ] }, { "cell_type": "code", "execution_count": null, "id": "5aa54e2d", "metadata": {}, "outputs": [], "source": [ "dataset.preprocessed_rootpath" ] }, { "cell_type": "markdown", "id": "23ca845a", "metadata": {}, "source": [ "Let's confirm it by printing all the new files in this directory:" ] }, { "cell_type": "code", "execution_count": null, "id": "743090d8", "metadata": {}, "outputs": [], "source": [ "print(list(dataset.preprocessed_rootpath.glob('*/*/*')))" ] }, { "cell_type": "markdown", "id": "90645ecc", "metadata": {}, "source": [ "All of the files have been saved into the `preprocessed_rootpath` as `feather` files.\n", "\n", "If we want to save the data into an alternative directory and also use a different file format like `csv` we can use the following:" ] }, { "cell_type": "code", "execution_count": null, "id": "24c0e22e", "metadata": {}, "outputs": [], "source": [ "dataset.save_preprocessed(preprocessed_dirname='preprocessed_csv', extension='csv')" ] }, { "cell_type": "markdown", "id": "f5e59018", "metadata": {}, "source": [ "Let's confirm again by printing all the new files in this alternative directory:" ] }, { "cell_type": "code", "execution_count": null, "id": "5abbb196", "metadata": {}, "outputs": [], "source": [ "alternative_dirpath = dataset.path / 'preprocessed_csv'\n", "print(list(alternative_dirpath.glob('*/*/*')))" ] }, { "cell_type": "markdown", "id": "991cea37", "metadata": {}, "source": [ "## Loading" ] }, { "cell_type": "markdown", "id": "aba6cb0f", "metadata": {}, "source": [ "Now let's imagine that this preprocessing and saving was done in another file and we only want to load the preprocessed data.\n", "\n", "We simulate this by initializing a new dataset. We don't need to download any additional data." ] }, { "cell_type": "code", "execution_count": null, "id": "2238eab6", "metadata": {}, "outputs": [], "source": [ "preprocessed_dataset = pm.datasets.ToyDataset(root='data/')" ] }, { "cell_type": "markdown", "id": "0dfc1f7a", "metadata": {}, "source": [ "The preprocessed data can now simply be loaded by setting `preprocessed` to `True`:" ] }, { "cell_type": "code", "execution_count": null, "id": "f1a3ec33", "metadata": {}, "outputs": [], "source": [ "preprocessed_dataset.load(preprocessed=True)\n", "\n", "dataset.gaze[0].frame.head()" ] }, { "cell_type": "markdown", "id": "32618a98", "metadata": {}, "source": [ "By default, the `preprocessed` directory and the `feather` extension will be chosen." ] }, { "cell_type": "markdown", "id": "3c4a5132", "metadata": {}, "source": [ "In case of alternative directory names or other file formats you can use the following:" ] }, { "cell_type": "code", "execution_count": null, "id": "e017f202", "metadata": {}, "outputs": [], "source": [ "preprocessed_dataset.load(\n", " preprocessed=True,\n", " preprocessed_dirname='preprocessed_csv',\n", " extension='csv',\n", ")\n", "dataset.gaze[0].frame.head()" ] }, { "cell_type": "markdown", "id": "62c67791", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "eeea42fc", "metadata": {}, "source": [ "## What you have learned in this tutorial:\n", "\n", "* saving your preprocesed data using `Dataset.save_preprocessed()`\n", "* load your preprocesed data using `Dataset.load(preprocessed=True)`\n", "* using custom directory names by specifying `preprocessed_dirname`\n", "* using other file formats than the default `feather` format by specifying `extension`" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }