Saving and Loading Preprocessed Data

What you will learn in this tutorial:

  • how to save your preprocessed data

  • how to load your preprocessed data

Preparations

We import pymovements under the alias pm for convenience.

[1]:
import pymovements as pm

Let’s start by downloading our ToyDataset and loading in its data:

[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 321.68it/s]
100%|██████████| 20/20 [00:00<00:00, 47.15it/s]
[2]:
<pymovements.dataset.dataset.Dataset at 0x7cb1fdb73640>

Now let’s do some preprocessing: we convert the pixel coordinates to degrees of visual angle and then compute velocities from the positions:

[3]:
dataset.pix2deg()
dataset.pos2vel()

dataset.gaze[0]
100%|██████████| 20/20 [00:00<00:00, 22.54it/s]
100%|██████████| 20/20 [00:00<00:00, 46.36it/s]
[3]:
Experiment(screen=Screen(width_px=1280, height_px=1024, width_cm=38, height_cm=30.2, distance_cm=68, origin='upper left'), eyetracker=EyeTracker(sampling_rate=1000, left=None, right=None, model=None, version=None, vendor=None, mount=None))
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬───────────┬────────────────┬───────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel     ┆ position       ┆ velocity      │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---       ┆ ---            ┆ ---           │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64] ┆ list[f64]      ┆ list[f64]     │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪═══════════╪════════════════╪═══════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8,   ┆ [-10.697598,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.4]    ┆ -8.852399]     ┆               │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9,   ┆ [-10.695183,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.1]    ┆ -8.859678]     ┆               │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [1.610194,    │
│         ┆           ┆           ┆         ┆         ┆ 151.8]    ┆ -8.866956]     ┆ -5.256267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1,   ┆ [-10.690352,   ┆ [0.402548,    │
│         ┆           ┆           ┆         ┆         ┆ 151.7]    ┆ -8.869381]     ┆ -4.447465]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [0.402561,    │
│         ┆           ┆           ┆         ┆         ┆ 151.5]    ┆ -8.874233]     ┆ -3.234462]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …         ┆ …              ┆ …             │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0,   ┆ [-6.932438,    ┆ [-63.266374,  │
│         ┆           ┆           ┆         ┆         ┆ 415.4]    ┆ -2.386672]     ┆ -21.085616]   │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0,   ┆ [-7.006376,    ┆ [-63.249652,  │
│         ┆           ┆           ┆         ┆         ┆ 414.5]    ┆ -2.408998]     ┆ -19.431326]   │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8,   ┆ [-7.060582,    ┆ [-60.359624,  │
│         ┆           ┆           ┆         ┆         ┆ 413.8]    ┆ -2.426362]     ┆ -15.710061]   │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1,   ┆ [-7.12709,     ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 413.2]    ┆ -2.441245]     ┆               │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2,   ┆ [-7.173881,    ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 412.9]    ┆ -2.448686]     ┆               │
└─────────┴───────────┴───────────┴─────────┴─────────┴───────────┴────────────────┴───────────────┘

The dataframe now has two additional columns: position (in degrees of visual angle) and velocity (in degrees per second).
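To make the two new columns less opaque, the following minimal sketch recomputes the first position entry and the third horizontal velocity entry by hand, using only the screen geometry and sampling rate printed above. The centering convention and the five-sample difference scheme are assumptions chosen because they are consistent with the printed values; they are not taken from the pymovements source, and the helper names are hypothetical.

```python
import math

# Screen geometry and sampling rate copied from the Experiment printed above.
WIDTH_PX, HEIGHT_PX = 1280, 1024
WIDTH_CM, HEIGHT_CM = 38.0, 30.2
DISTANCE_CM = 68.0
SAMPLING_RATE_HZ = 1000.0


def pixel_to_degree(value_px, size_px, size_cm, distance_cm):
    """Convert one pixel coordinate to degrees of visual angle.

    Assumption: with origin='upper left', the screen centre lies at
    (size_px - 1) / 2 pixels.
    """
    centered_px = value_px - (size_px - 1) / 2
    centered_cm = centered_px * size_cm / size_px
    return math.degrees(math.atan2(centered_cm, distance_cm))


def five_sample_velocity(samples, index, sampling_rate):
    """Central difference using two samples on each side (assumed scheme)."""
    following = samples[index + 1] + samples[index + 2]
    preceding = samples[index - 1] + samples[index - 2]
    return (following - preceding) * sampling_rate / 6


# First sample of the table: pixel = [206.8, 152.4]
x_deg = pixel_to_degree(206.8, WIDTH_PX, WIDTH_CM, DISTANCE_CM)
y_deg = pixel_to_degree(152.4, HEIGHT_PX, HEIGHT_CM, DISTANCE_CM)

# Horizontal positions of the first five samples, as printed in the table.
x_positions = [-10.697598, -10.695183, -10.692768, -10.690352, -10.692768]
x_velocity = five_sample_velocity(x_positions, 2, SAMPLING_RATE_HZ)

print(f"position of first sample: [{x_deg:.4f}, {y_deg:.4f}]")
print(f"x velocity of third sample: {x_velocity:.4f}")
```

The results should agree with the corresponding table entries up to the rounding of the printed positions. A scheme that needs two neighbours on each side would also explain why the first two and last two velocity rows are null.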

Saving

Saving your preprocessed data is as simple as:

[4]:
dataset.save_preprocessed()
100%|██████████| 20/20 [00:00<00:00, 254.97it/s]
[4]:
<pymovements.dataset.dataset.Dataset at 0x7cb1fdb73640>

All of the preprocessed data is saved into this directory:

[5]:
dataset.paths.preprocessed
[5]:
PosixPath('data/ToyDataset/preprocessed')

Let’s confirm it by printing all the new files in this directory:

[6]:
print(list(dataset.paths.preprocessed.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.feather'), 
PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.feather')]

All files have been saved below Dataset.paths.preprocessed as feather files, one file per trial.
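The file names appear to encode the text_id and page_id columns of each trial (the first gaze dataframe above has text_id 0 and page_id 1). As a small sketch under that assumption, the ids can be parsed back from the paths and sorted, which is easier to scan than the arbitrary order returned by glob. The pattern and the helper name are hypothetical and not part of pymovements.

```python
import re
from pathlib import PurePosixPath

# Assumed naming pattern: trial_<text_id>_<page_id>.<extension>
TRIAL_PATTERN = re.compile(r"trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.\w+")


def parse_trial_ids(path):
    """Return (text_id, page_id) parsed from a trial file name."""
    match = TRIAL_PATTERN.fullmatch(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match["text_id"]), int(match["page_id"])


# Three of the paths printed above, in the order glob returned them.
prefix = "data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/"
paths = [
    prefix + "trial_1_1.feather",
    prefix + "trial_2_2.feather",
    prefix + "trial_0_3.feather",
]

trial_ids = sorted(parse_trial_ids(path) for path in paths)
print(trial_ids)
```

With the real dataset, the same helper could be applied to every path yielded by dataset.paths.preprocessed.glob('*/*/*').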

To save the data into a different directory and in a different file format, such as csv, we can pass preprocessed_dirname and extension:

[7]:
dataset.save_preprocessed(preprocessed_dirname='preprocessed_csv', extension='csv')
100%|██████████| 20/20 [00:00<00:00, 60.92it/s]
[7]:
<pymovements.dataset.dataset.Dataset at 0x7cb1fdb73640>

Let’s confirm again by printing all the new files in this alternative directory:

[8]:
alternative_dirpath = dataset.path / 'preprocessed_csv'
print(list(alternative_dirpath.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.csv'), 
PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.csv')]

Loading

Now let’s imagine that the preprocessing and saving were done in another script and we only want to load the preprocessed data.

We simulate this by initializing a new dataset object. We don’t need to download any additional data.

[9]:
events_dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')

The preprocessed data can now simply be loaded by setting preprocessed to True:

[10]:
events_dataset.load(preprocessed=True)

events_dataset.gaze[0]
100%|██████████| 20/20 [00:00<00:00, 506.60it/s]
[10]:
Experiment(screen=Screen(width_px=1280, height_px=1024, width_cm=38, height_cm=30.2, distance_cm=68, origin='upper left'), eyetracker=EyeTracker(sampling_rate=1000, left=None, right=None, model=None, version=None, vendor=None, mount=None))
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬───────────┬────────────────┬───────────────┬─────────┬─────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel     ┆ position       ┆ velocity      ┆ text_id ┆ page_id │
│ ---     ┆ ---       ┆ ---       ┆ ---       ┆ ---            ┆ ---           ┆ ---     ┆ ---     │
│ i64     ┆ f64       ┆ f64       ┆ list[f64] ┆ list[f64]      ┆ list[f64]     ┆ i64     ┆ i64     │
╞═════════╪═══════════╪═══════════╪═══════════╪════════════════╪═══════════════╪═════════╪═════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ [206.8,   ┆ [-10.697598,   ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 152.4]    ┆ -8.852399]     ┆               ┆         ┆         │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ [206.9,   ┆ [-10.695183,   ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 152.1]    ┆ -8.859678]     ┆               ┆         ┆         │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ [207.0,   ┆ [-10.692768,   ┆ [1.610194,    ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 151.8]    ┆ -8.866956]     ┆ -5.256267]    ┆         ┆         │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ [207.1,   ┆ [-10.690352,   ┆ [0.402548,    ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 151.7]    ┆ -8.869381]     ┆ -4.447465]    ┆         ┆         │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ [207.0,   ┆ [-10.692768,   ┆ [0.402561,    ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 151.5]    ┆ -8.874233]     ┆ -3.234462]    ┆         ┆         │
│ …       ┆ …         ┆ …         ┆ …         ┆ …              ┆ …             ┆ …       ┆ …       │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ [361.0,   ┆ [-6.932438,    ┆ [-63.266374,  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 415.4]    ┆ -2.386672]     ┆ -21.085616]   ┆         ┆         │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ [358.0,   ┆ [-7.006376,    ┆ [-63.249652,  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 414.5]    ┆ -2.408998]     ┆ -19.431326]   ┆         ┆         │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ [355.8,   ┆ [-7.060582,    ┆ [-60.359624,  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 413.8]    ┆ -2.426362]     ┆ -15.710061]   ┆         ┆         │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ [353.1,   ┆ [-7.12709,     ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 413.2]    ┆ -2.441245]     ┆               ┆         ┆         │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ [351.2,   ┆ [-7.173881,    ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 412.9]    ┆ -2.448686]     ┆               ┆         ┆         │
└─────────┴───────────┴───────────┴───────────┴────────────────┴───────────────┴─────────┴─────────┘

By default, the data is loaded from the preprocessed directory and the feather extension is expected.
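Because loading depends on files written in an earlier step, it can help to check that they exist before loading. The following is a minimal sketch using only pathlib; the helper name is hypothetical, and its defaults simply mirror the directory name and extension mentioned above. A temporary directory stands in for the dataset path so the sketch runs on its own.

```python
import tempfile
from pathlib import Path


def count_preprocessed_files(dataset_path, dirname="preprocessed", extension="feather"):
    """Count saved files and fail early if the directory or the files are missing."""
    dirpath = Path(dataset_path) / dirname
    if not dirpath.is_dir():
        raise FileNotFoundError(f"no directory {dirpath}")
    n_files = sum(1 for _ in dirpath.rglob(f"*.{extension}"))
    if n_files == 0:
        raise FileNotFoundError(f"no *.{extension} files below {dirpath}")
    return n_files


# Stand-in directory tree that contains only csv files.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "preprocessed_csv" / "archive" / "data"
    data_dir.mkdir(parents=True)
    for name in ("trial_0_1.csv", "trial_0_2.csv"):
        (data_dir / name).touch()

    n_csv = count_preprocessed_files(tmp, dirname="preprocessed_csv", extension="csv")

    # The default directory does not exist here, so the check raises.
    try:
        count_preprocessed_files(tmp)
        default_found = True
    except FileNotFoundError:
        default_found = False

print(n_csv, default_found)
```

With the real dataset, the first argument would be the dataset path, for example 'data/ToyDataset'.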

For other directory names or file formats, pass the same preprocessed_dirname and extension that were used for saving:

[11]:
events_dataset.load(
    preprocessed=True,
    preprocessed_dirname='preprocessed_csv',
    extension='csv',
)
events_dataset.gaze[0]
100%|██████████| 20/20 [00:00<00:00, 20.32it/s]
[11]:
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬───────────┬────────────────┬───────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel     ┆ position       ┆ velocity      │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---       ┆ ---            ┆ ---           │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64] ┆ list[f64]      ┆ list[f64]     │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪═══════════╪════════════════╪═══════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8,   ┆ [-10.697598,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.4]    ┆ -8.852399]     ┆               │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9,   ┆ [-10.695183,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.1]    ┆ -8.859678]     ┆               │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [1.610194,    │
│         ┆           ┆           ┆         ┆         ┆ 151.8]    ┆ -8.866956]     ┆ -5.256267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1,   ┆ [-10.690352,   ┆ [0.402548,    │
│         ┆           ┆           ┆         ┆         ┆ 151.7]    ┆ -8.869381]     ┆ -4.447465]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [0.402561,    │
│         ┆           ┆           ┆         ┆         ┆ 151.5]    ┆ -8.874233]     ┆ -3.234462]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …         ┆ …              ┆ …             │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0,   ┆ [-6.932438,    ┆ [-63.266374,  │
│         ┆           ┆           ┆         ┆         ┆ 415.4]    ┆ -2.386672]     ┆ -21.085616]   │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0,   ┆ [-7.006376,    ┆ [-63.249652,  │
│         ┆           ┆           ┆         ┆         ┆ 414.5]    ┆ -2.408998]     ┆ -19.431326]   │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8,   ┆ [-7.060582,    ┆ [-60.359624,  │
│         ┆           ┆           ┆         ┆         ┆ 413.8]    ┆ -2.426362]     ┆ -15.710061]   │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1,   ┆ [-7.12709,     ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 413.2]    ┆ -2.441245]     ┆               │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2,   ┆ [-7.173881,    ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 412.9]    ┆ -2.448686]     ┆               │
└─────────┴───────────┴───────────┴─────────┴─────────┴───────────┴────────────────┴───────────────┘

What you have learned in this tutorial:

  • saving your preprocessed data using Dataset.save_preprocessed()

  • loading your preprocessed data using Dataset.load(preprocessed=True)

  • using custom directory names by specifying preprocessed_dirname

  • using file formats other than the default feather format by specifying extension