Saving and Loading Preprocessed Data#

What you will learn in this tutorial:#

  • how to save your preprocessed data

  • how to load your preprocessed data

Preparations#

We import pymovements as the alias pm for convenience.

[1]:
import pymovements as pm

Let’s start by downloading our ToyDataset and loading in its data:

[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
INFO:pymovements.dataset.dataset_download:You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.

Please cite the referenced publication if you intend to use the dataset in your research.

Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 302.86it/s]
100%|██████████| 20/20 [00:00<00:00, 57.88it/s]
[2]:
<pymovements.dataset.dataset.Dataset at 0x746f34f72730>

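The data is already loaded, so now let’s do some preprocessing: we convert the pixel coordinates to degrees of visual angle and then compute velocities from these positions: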

[3]:
dataset.pix2deg()
dataset.pos2vel()

dataset.gaze[0]
100%|██████████| 20/20 [00:00<00:00, 28.68it/s]
100%|██████████| 20/20 [00:00<00:00, 31.41it/s]
[3]:
Experiment(screen=Screen(width_px=1280, height_px=1024, width_cm=38, height_cm=30.2, distance_cm=68, origin='upper left'), eyetracker=EyeTracker(sampling_rate=1000, left=None, right=None, model=None, version=None, vendor=None, mount=None))
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬───────────┬────────────────┬───────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel     ┆ position       ┆ velocity      │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---       ┆ ---            ┆ ---           │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64] ┆ list[f64]      ┆ list[f64]     │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪═══════════╪════════════════╪═══════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8,   ┆ [-10.697598,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.4]    ┆ -8.852399]     ┆               │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9,   ┆ [-10.695183,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.1]    ┆ -8.859678]     ┆               │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [1.610194,    │
│         ┆           ┆           ┆         ┆         ┆ 151.8]    ┆ -8.866956]     ┆ -5.256267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1,   ┆ [-10.690352,   ┆ [0.402548,    │
│         ┆           ┆           ┆         ┆         ┆ 151.7]    ┆ -8.869381]     ┆ -4.447465]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [0.402561,    │
│         ┆           ┆           ┆         ┆         ┆ 151.5]    ┆ -8.874233]     ┆ -3.234462]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …         ┆ …              ┆ …             │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0,   ┆ [-6.932438,    ┆ [-63.266374,  │
│         ┆           ┆           ┆         ┆         ┆ 415.4]    ┆ -2.386672]     ┆ -21.085616]   │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0,   ┆ [-7.006376,    ┆ [-63.249652,  │
│         ┆           ┆           ┆         ┆         ┆ 414.5]    ┆ -2.408998]     ┆ -19.431326]   │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8,   ┆ [-7.060582,    ┆ [-60.359624,  │
│         ┆           ┆           ┆         ┆         ┆ 413.8]    ┆ -2.426362]     ┆ -15.710061]   │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1,   ┆ [-7.12709,     ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 413.2]    ┆ -2.441245]     ┆               │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2,   ┆ [-7.173881,    ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 412.9]    ┆ -2.448686]     ┆               │
└─────────┴───────────┴───────────┴─────────┴─────────┴───────────┴────────────────┴───────────────┘

We have now added two columns: position, which holds the gaze coordinates in degrees of visual angle, and velocity, which is computed from those positions.
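To make the two new columns less of a black box, here is a minimal sketch of the arithmetic behind them. It is not the pymovements implementation. The helper names pix2deg_1d and fivepoint_velocity are hypothetical, and both the centring on (screen_px - 1) / 2 and the five-point difference rule are assumptions, chosen because they reproduce the numbers in the table above.

```python
import math

# Screen geometry and sampling rate copied from the Experiment printed above.
WIDTH_PX, HEIGHT_PX = 1280, 1024
WIDTH_CM, HEIGHT_CM = 38.0, 30.2
DISTANCE_CM = 68.0
SAMPLING_RATE_HZ = 1000


def pix2deg_1d(pixel, screen_px, screen_cm, distance_cm):
    """Convert one pixel coordinate to degrees of visual angle.

    Assumption: the angle is measured from the screen centre,
    taken as (screen_px - 1) / 2.
    """
    distance_px = distance_cm * screen_px / screen_cm
    centered_px = pixel - (screen_px - 1) / 2
    return math.degrees(math.atan2(centered_px, distance_px))


def fivepoint_velocity(values, sampling_rate):
    """Five-point difference; the two samples at each edge stay None."""
    velocities = [None] * len(values)
    for i in range(2, len(values) - 2):
        diff = values[i + 2] + values[i + 1] - values[i - 1] - values[i - 2]
        velocities[i] = diff * sampling_rate / 6
    return velocities


# First row of the table: pixel = [206.8, 152.4].
x_deg = pix2deg_1d(206.8, WIDTH_PX, WIDTH_CM, DISTANCE_CM)
y_deg = pix2deg_1d(152.4, HEIGHT_PX, HEIGHT_CM, DISTANCE_CM)

# Horizontal positions of the first five rows of the table.
x_positions = [-10.697598, -10.695183, -10.692768, -10.690352, -10.692768]
x_velocity = fivepoint_velocity(x_positions, SAMPLING_RATE_HZ)

print(x_deg, y_deg, x_velocity)
```

Under these assumptions the first sample comes out at roughly -10.6976 and -8.8524 degrees, and the third sample gets a horizontal velocity of roughly 1.61 degrees per second, in line with the table. The None entries at the edges correspond to the null velocities in the first and last two rows.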

Saving#

Saving your preprocessed data is as simple as:

[4]:
dataset.save_preprocessed()
100%|██████████| 20/20 [00:00<00:00, 121.95it/s]
[4]:
<pymovements.dataset.dataset.Dataset at 0x746f34f72730>

All of the preprocessed data is saved into this directory:

[5]:
dataset.paths.preprocessed
[5]:
PosixPath('data/ToyDataset/preprocessed')

Let’s confirm it by printing all the new files in this directory:

[6]:
print(list(dataset.paths.preprocessed.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.feather'), 
PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.feather')]

All of the files have been saved as feather files in the directory given by Dataset.paths.preprocessed.
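The listing above comes back in arbitrary order, which makes it hard to scan. As a small sketch using only the standard library, the files can be collected with rglob and sorted. The directory tree below is a stand-in built in a temporary folder, with empty files named after a few of the entries above, so the snippet runs without the dataset.

```python
import tempfile
from pathlib import Path

# Stand-in tree: names mirror the listing above, the files themselves are empty.
with tempfile.TemporaryDirectory() as tmp:
    preprocessed = Path(tmp) / 'preprocessed'
    data_dir = preprocessed / 'aeye-lab-pymovements-toy-dataset-6cb5d66' / 'data'
    data_dir.mkdir(parents=True)
    for name in ('trial_1_1', 'trial_0_2', 'trial_0_1'):
        (data_dir / f'{name}.feather').touch()

    # rglob searches every level below the directory; sorted() gives a stable order.
    feather_files = sorted(preprocessed.rglob('*.feather'))
    names = [path.name for path in feather_files]

print(names)
```

With a real dataset, the same two lines should work on dataset.paths.preprocessed in place of the temporary directory.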

If we want to save the data into an alternative directory and use a different file format such as csv, we can pass both as arguments:

[7]:
dataset.save_preprocessed(preprocessed_dirname='preprocessed_csv', extension='csv')
100%|██████████| 20/20 [00:00<00:00, 32.77it/s]
[7]:
<pymovements.dataset.dataset.Dataset at 0x746f34f72730>

Let’s confirm again by printing all the new files in this alternative directory:

[8]:
alternative_dirpath = dataset.path / 'preprocessed_csv'
print(list(alternative_dirpath.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.csv'), 
PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.csv')]
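Comparing two long listings by eye is error-prone. The following sketch checks programmatically that every file in one directory has a counterpart with another extension in a second directory. The helper missing_counterparts is hypothetical and not part of pymovements. The example runs on a small stand-in tree in which one csv file is deliberately left out.

```python
import tempfile
from pathlib import Path


def missing_counterparts(source_dir, target_dir, source_ext, target_ext):
    """Return relative paths in source_dir that lack a counterpart in target_dir."""
    missing = []
    for source_file in sorted(Path(source_dir).rglob(f'*.{source_ext}')):
        relative = source_file.relative_to(source_dir)
        counterpart = (Path(target_dir) / relative).with_suffix(f'.{target_ext}')
        if not counterpart.exists():
            missing.append(relative.as_posix())
    return missing


with tempfile.TemporaryDirectory() as tmp:
    feather_dir = Path(tmp) / 'preprocessed' / 'data'
    csv_dir = Path(tmp) / 'preprocessed_csv' / 'data'
    feather_dir.mkdir(parents=True)
    csv_dir.mkdir(parents=True)
    for name in ('trial_0_1', 'trial_0_2'):
        (feather_dir / f'{name}.feather').touch()
    (csv_dir / 'trial_0_1.csv').touch()  # trial_0_2.csv is deliberately left out

    missing = missing_counterparts(
        Path(tmp) / 'preprocessed', Path(tmp) / 'preprocessed_csv', 'feather', 'csv',
    )

print(missing)
```

An empty list means both directories hold the same set of trials.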

Loading#

Now let’s imagine that the preprocessing and saving was done in a separate script or session and we only want to load the preprocessed data.

We simulate this by initializing a new dataset object. We don’t need to download any additional data.

[9]:
events_dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')

The preprocessed data can now simply be loaded by setting preprocessed to True:

[10]:
events_dataset.load(preprocessed=True)

events_dataset.gaze[0]
100%|██████████| 20/20 [00:00<00:00, 254.44it/s]
[10]:
Experiment(screen=Screen(width_px=1280, height_px=1024, width_cm=38, height_cm=30.2, distance_cm=68, origin='upper left'), eyetracker=EyeTracker(sampling_rate=1000, left=None, right=None, model=None, version=None, vendor=None, mount=None))
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬───────────┬────────────────┬───────────────┬─────────┬─────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel     ┆ position       ┆ velocity      ┆ text_id ┆ page_id │
│ ---     ┆ ---       ┆ ---       ┆ ---       ┆ ---            ┆ ---           ┆ ---     ┆ ---     │
│ i64     ┆ f64       ┆ f64       ┆ list[f64] ┆ list[f64]      ┆ list[f64]     ┆ i64     ┆ i64     │
╞═════════╪═══════════╪═══════════╪═══════════╪════════════════╪═══════════════╪═════════╪═════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ [206.8,   ┆ [-10.697598,   ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 152.4]    ┆ -8.852399]     ┆               ┆         ┆         │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ [206.9,   ┆ [-10.695183,   ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 152.1]    ┆ -8.859678]     ┆               ┆         ┆         │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ [207.0,   ┆ [-10.692768,   ┆ [1.610194,    ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 151.8]    ┆ -8.866956]     ┆ -5.256267]    ┆         ┆         │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ [207.1,   ┆ [-10.690352,   ┆ [0.402548,    ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 151.7]    ┆ -8.869381]     ┆ -4.447465]    ┆         ┆         │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ [207.0,   ┆ [-10.692768,   ┆ [0.402561,    ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 151.5]    ┆ -8.874233]     ┆ -3.234462]    ┆         ┆         │
│ …       ┆ …         ┆ …         ┆ …         ┆ …              ┆ …             ┆ …       ┆ …       │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ [361.0,   ┆ [-6.932438,    ┆ [-63.266374,  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 415.4]    ┆ -2.386672]     ┆ -21.085616]   ┆         ┆         │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ [358.0,   ┆ [-7.006376,    ┆ [-63.249652,  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 414.5]    ┆ -2.408998]     ┆ -19.431326]   ┆         ┆         │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ [355.8,   ┆ [-7.060582,    ┆ [-60.359624,  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 413.8]    ┆ -2.426362]     ┆ -15.710061]   ┆         ┆         │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ [353.1,   ┆ [-7.12709,     ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 413.2]    ┆ -2.441245]     ┆               ┆         ┆         │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ [351.2,   ┆ [-7.173881,    ┆ [null, null]  ┆ 0       ┆ 1       │
│         ┆           ┆           ┆ 412.9]    ┆ -2.448686]     ┆               ┆         ┆         │
└─────────┴───────────┴───────────┴───────────┴────────────────┴───────────────┴─────────┴─────────┘

By default, the data is loaded from the preprocessed directory and the files are expected to have the feather extension.

If you saved to a different directory name or file format, pass the same values when loading:

[11]:
events_dataset.load(
    preprocessed=True,
    preprocessed_dirname='preprocessed_csv',
    extension='csv',
)
events_dataset.gaze[0]
100%|██████████| 20/20 [00:01<00:00, 11.54it/s]
[11]:
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬───────────┬────────────────┬───────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel     ┆ position       ┆ velocity      │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---       ┆ ---            ┆ ---           │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64] ┆ list[f64]      ┆ list[f64]     │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪═══════════╪════════════════╪═══════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8,   ┆ [-10.697598,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.4]    ┆ -8.852399]     ┆               │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9,   ┆ [-10.695183,   ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 152.1]    ┆ -8.859678]     ┆               │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [1.610194,    │
│         ┆           ┆           ┆         ┆         ┆ 151.8]    ┆ -8.866956]     ┆ -5.256267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1,   ┆ [-10.690352,   ┆ [0.402548,    │
│         ┆           ┆           ┆         ┆         ┆ 151.7]    ┆ -8.869381]     ┆ -4.447465]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0,   ┆ [-10.692768,   ┆ [0.402561,    │
│         ┆           ┆           ┆         ┆         ┆ 151.5]    ┆ -8.874233]     ┆ -3.234462]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …         ┆ …              ┆ …             │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0,   ┆ [-6.932438,    ┆ [-63.266374,  │
│         ┆           ┆           ┆         ┆         ┆ 415.4]    ┆ -2.386672]     ┆ -21.085616]   │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0,   ┆ [-7.006376,    ┆ [-63.249652,  │
│         ┆           ┆           ┆         ┆         ┆ 414.5]    ┆ -2.408998]     ┆ -19.431326]   │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8,   ┆ [-7.060582,    ┆ [-60.359624,  │
│         ┆           ┆           ┆         ┆         ┆ 413.8]    ┆ -2.426362]     ┆ -15.710061]   │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1,   ┆ [-7.12709,     ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 413.2]    ┆ -2.441245]     ┆               │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2,   ┆ [-7.173881,    ┆ [null, null]  │
│         ┆           ┆           ┆         ┆         ┆ 412.9]    ┆ -2.448686]     ┆               │
└─────────┴───────────┴───────────┴─────────┴─────────┴───────────┴────────────────┴───────────────┘
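The two load calls above differ only in where they look for files. As a rough illustration of how preprocessed_dirname and extension map to a location on disk, consider the sketch below. The helper preprocessed_filepath is hypothetical, and the path layout is inferred from the file listings printed earlier in this tutorial rather than taken from the library's source.

```python
from pathlib import Path


def preprocessed_filepath(dataset_path, relative_stem,
                          preprocessed_dirname='preprocessed', extension='feather'):
    """Hypothetical helper: where the preprocessed copy of a trial would live."""
    target = Path(dataset_path) / preprocessed_dirname / relative_stem
    return target.with_suffix(f'.{extension}')


# Trial location relative to the dataset's data directories, without extension.
relative_stem = Path('aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1')

default_path = preprocessed_filepath('data/ToyDataset', relative_stem)
csv_path = preprocessed_filepath(
    'data/ToyDataset', relative_stem,
    preprocessed_dirname='preprocessed_csv', extension='csv',
)

print(default_path)
print(csv_path)
```

The keyword defaults mirror the defaults described above, which is why the first call needs no extra arguments.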

What you have learned in this tutorial:#

  • saving your preprocessed data using Dataset.save_preprocessed()

  • loading your preprocessed data using Dataset.load(preprocessed=True)

  • using custom directory names by specifying preprocessed_dirname

  • using other file formats than the default feather format by specifying extension