Saving and Loading Preprocessed Data#
What you will learn in this tutorial:#
how to save your preprocessed data
how to load your preprocessed data
Preparations#
We import pymovements under the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading and extracting our ToyDataset:
[2]:
dataset = pm.datasets.ToyDataset(root='data/')
dataset.download()
dataset.extract()
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Now let’s load in the data and do some preprocessing:
[3]:
dataset.load()
dataset.pix2deg()
dataset.pos2vel()
dataset.gaze[0].frame.head()
100%|██████████| 20/20 [00:00<00:00, 202.53it/s]
100%|██████████| 20/20 [00:00<00:00, 795.99it/s]
100%|██████████| 20/20 [00:00<00:00, 662.99it/s]
[3]:
| text_id | page_id | time | x_right_pix | y_right_pix | y_right_pos | x_right_pos | y_right_vel | x_right_vel |
|---|---|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 0 | 1 | 1.988145e6 | 206.8 | 152.4 | -12.005591 | -7.528075 | -3.589697 | 1.221164 |
| 0 | 1 | 1.988146e6 | 206.9 | 152.1 | -12.01277 | -7.525633 | -7.179203 | 2.442343 |
| 0 | 1 | 1.988147e6 | 207.0 | 151.8 | -12.019949 | -7.52319 | -5.184827 | 1.628238 |
| 0 | 1 | 1.988148e6 | 207.1 | 151.7 | -12.022342 | -7.520748 | -4.386968 | 0.407059 |
| 0 | 1 | 1.988149e6 | 207.0 | 151.5 | -12.027128 | -7.52319 | -3.190445 | 0.407069 |
The frame now contains additional columns: gaze positions in degrees of visual angle (the *_pos columns, added by pix2deg) and velocities (the *_vel columns, added by pos2vel).
Saving#
Saving your preprocessed data is as simple as:
[4]:
dataset.save_preprocessed()
100%|██████████| 20/20 [00:00<00:00, 587.75it/s]
All of the preprocessed data is saved into this directory:
[5]:
dataset.preprocessed_rootpath
[5]:
PosixPath('data/ToyDataset/preprocessed')
Let’s confirm it by printing all the new files in this directory:
[6]:
print(list(dataset.preprocessed_rootpath.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.feather'), 
PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.feather')]
All of the files have been saved into the preprocessed_rootpath as feather files.
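The list printed above is in arbitrary order and repeats the full path for every file, which makes it hard to scan. As a minimal sketch using only the standard library, the same glob pattern can be turned into a sorted listing of paths relative to the root. The helper name `list_saved_files` and the archive folder name `example-archive` are hypothetical and not part of pymovements; a temporary directory stands in for `dataset.preprocessed_rootpath`.

```python
import tempfile
from pathlib import Path


def list_saved_files(root: Path, pattern: str = '*/*/*') -> list[str]:
    """Return files under root matching pattern as sorted, root-relative POSIX paths."""
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.glob(pattern)
        if path.is_file()
    )


# Build a small throwaway tree that mimics the layout printed above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'preprocessed'
    data_dir = root / 'example-archive' / 'data'  # hypothetical archive name
    data_dir.mkdir(parents=True)
    for name in ('trial_1_3.feather', 'trial_0_1.feather', 'trial_0_2.feather'):
        (data_dir / name).touch()

    saved_files = list_saved_files(root)

# One path per line, in sorted order.
for relative_path in saved_files:
    print(relative_path)
```

With a real dataset, you would pass `dataset.preprocessed_rootpath` as `root` instead of the temporary directory.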
To save the data into an alternative directory and use a different file format such as csv, pass preprocessed_dirname and extension:
[7]:
dataset.save_preprocessed(preprocessed_dirname='preprocessed_csv', extension='csv')
100%|██████████| 20/20 [00:00<00:00, 56.87it/s]
Let’s confirm again by printing all the new files in this alternative directory:
[8]:
alternative_dirpath = dataset.path / 'preprocessed_csv'
print(list(alternative_dirpath.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.csv'), 
PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.csv')]
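One practical difference between the two formats is worth keeping in mind: CSV is plain text and stores no column types, whereas Feather keeps the schema alongside the data. The sketch below illustrates the CSV side with Python's standard `csv` module only. It does not show how pymovements reads or writes files, and the sample values are copied from the first row of the table earlier in this tutorial.

```python
import csv
import io

# One sample with values taken from the first row of the table above.
row = {'time': 1988145.0, 'x_right_pix': 206.8, 'y_right_pix': 152.4}

# Write the row to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

# Read it back: the csv module returns every value as a string.
buffer.seek(0)
loaded = next(csv.DictReader(buffer))
print({name: type(value).__name__ for name, value in loaded.items()})

# The numeric types have to be restored explicitly (or inferred by the reader).
restored = {name: float(value) for name, value in loaded.items()}
print(restored == row)
```

Libraries that read CSV typically infer column types again on load, which is one reason to prefer the default feather format unless the files need to be opened by other tools.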
Loading#
Now let’s imagine that this preprocessing and saving was done in another file and we only want to load the preprocessed data.
We simulate this by initializing a new dataset. We don’t need to download any additional data.
[9]:
preprocessed_dataset = pm.datasets.ToyDataset(root='data/')
The preprocessed data can now simply be loaded by setting preprocessed to True:
[10]:
preprocessed_dataset.load(preprocessed=True)
preprocessed_dataset.gaze[0].frame.head()
100%|██████████| 20/20 [00:00<00:00, 1329.54it/s]
[10]:
| text_id | page_id | time | x_right_pix | y_right_pix | y_right_pos | x_right_pos | y_right_vel | x_right_vel |
|---|---|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 0 | 1 | 1.988145e6 | 206.8 | 152.4 | -12.005591 | -7.528075 | -3.589697 | 1.221164 |
| 0 | 1 | 1.988146e6 | 206.9 | 152.1 | -12.01277 | -7.525633 | -7.179203 | 2.442343 |
| 0 | 1 | 1.988147e6 | 207.0 | 151.8 | -12.019949 | -7.52319 | -5.184827 | 1.628238 |
| 0 | 1 | 1.988148e6 | 207.1 | 151.7 | -12.022342 | -7.520748 | -4.386968 | 0.407059 |
| 0 | 1 | 1.988149e6 | 207.0 | 151.5 | -12.027128 | -7.52319 | -3.190445 | 0.407069 |
By default, the data is loaded from the preprocessed directory using the feather extension.
If you saved under a different directory name or file format, pass the same values when loading:
[11]:
preprocessed_dataset.load(
    preprocessed=True,
    preprocessed_dirname='preprocessed_csv',
    extension='csv',
)
preprocessed_dataset.gaze[0].frame.head()
100%|██████████| 20/20 [00:00<00:00, 92.94it/s]
[11]:
| text_id | page_id | time | x_right_pix | y_right_pix | y_right_pos | x_right_pos | y_right_vel | x_right_vel |
|---|---|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 0 | 1 | 1.988145e6 | 206.8 | 152.4 | -12.005591 | -7.528075 | -3.589697 | 1.221164 |
| 0 | 1 | 1.988146e6 | 206.9 | 152.1 | -12.01277 | -7.525633 | -7.179203 | 2.442343 |
| 0 | 1 | 1.988147e6 | 207.0 | 151.8 | -12.019949 | -7.52319 | -5.184827 | 1.628238 |
| 0 | 1 | 1.988148e6 | 207.1 | 151.7 | -12.022342 | -7.520748 | -4.386968 | 0.407059 |
| 0 | 1 | 1.988149e6 | 207.0 | 151.5 | -12.027128 | -7.52319 | -3.190445 | 0.407069 |
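Because the directory name and extension used for loading must match the ones used for saving, it can help to check that matching files exist before calling load. The following is a minimal sketch using only the standard library. The helper `find_preprocessed_files` is hypothetical and not part of pymovements; its defaults simply mirror the `preprocessed` directory and `feather` extension described above, and a temporary directory stands in for the dataset path.

```python
import tempfile
from pathlib import Path


def find_preprocessed_files(
    dataset_path: Path,
    preprocessed_dirname: str = 'preprocessed',
    extension: str = 'feather',
) -> list[Path]:
    """Return sorted files with the given extension, or raise if none exist."""
    directory = dataset_path / preprocessed_dirname
    # Search recursively, since the files sit in nested subdirectories.
    files = sorted(directory.rglob(f'*.{extension}')) if directory.is_dir() else []
    if not files:
        raise FileNotFoundError(f'no .{extension} files found in {directory}')
    return files


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a dataset where only the csv variant has been saved.
    dataset_path = Path(tmp) / 'ToyDataset'
    csv_dir = dataset_path / 'preprocessed_csv'
    csv_dir.mkdir(parents=True)
    (csv_dir / 'trial_0_1.csv').touch()

    found = [
        path.name
        for path in find_preprocessed_files(dataset_path, 'preprocessed_csv', 'csv')
    ]

    # The defaults point at a directory that was never written in this example.
    try:
        find_preprocessed_files(dataset_path)
        defaults_present = True
    except FileNotFoundError:
        defaults_present = False

print(found, defaults_present)
```

A check like this fails early with a clear message when the save and load arguments do not line up.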
What you have learned in this tutorial:#
how to save your preprocessed data using Dataset.save_preprocessed()
how to load your preprocessed data using Dataset.load(preprocessed=True)
how to use custom directory names by specifying preprocessed_dirname
how to use file formats other than the default feather format by specifying extension