Working with a Local Dataset

In this tutorial, we show how to use your own local dataset with the Dataset class. The Dataset class helps you manage and process your eye-tracking data.

Preparations

We import pymovements under the alias pm for convenience.

[1]:

import pymovements as pm

For demonstration purposes, we will use the raw data provided by the Toy dataset, a sample dataset that comes with pymovements.

We will download the resources of this dataset into a directory to simulate a local dataset for you. All downloaded archive files are automatically extracted and then removed. The directory of the dataset will be data/my_dataset.

After that, we no longer need the Python object and delete it (the files on your system stay in place). Don’t worry if you’re confused by these lines, as they are not relevant to your use case.

Just keep in mind that we now have some files with gaze data in the directory data/my_dataset.

[2]:

toy_dataset = pm.Dataset('ToyDataset', path='data/my_dataset')
toy_dataset.download(remove_finished=True)

del toy_dataset

Downloading http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/ to data/my_dataset/downloads/pymovements-toy-dataset.zip

pymovements-toy-dataset.zip: 100%|██████████| 3.06M/3.06M [00:00<00:00, 23.3MB/s]

Checking integrity of pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/my_dataset/raw

Define your Experiment

To use the Dataset class, we first need to create an Experiment instance. This class represents the properties of the experiment, such as the screen dimensions and sampling rate.

[3]:

experiment = pm.gaze.Experiment(
    screen_width_px=1280,
    screen_height_px=1024,
    screen_width_cm=38,
    screen_height_cm=30.2,
    distance_cm=68,
    origin='lower left',
    sampling_rate=1000,
)

Parameters for File Parsing

We also define a filename_format, a pattern used to match the filenames of the data files in the dataset and to extract values from them. For example, r'trial_{text_id:d}_{page_id:d}.csv' will match filenames that follow the pattern trial_{text_id}_{page_id}.csv and extract the values of text_id and page_id for each file.

[4]:

filename_format = r'trial_{text_id:d}_{page_id:d}.csv'
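
To make the matching step more concrete, here is a minimal sketch that converts the curly-brace format into a regular expression with named groups and applies it to a filename. The helper curly_to_pattern, the reading of :d as "digits only", and the filename trial_1_2.csv are assumptions made for illustration; pymovements may implement the matching differently.

```python
import re

filename_format = r'trial_{text_id:d}_{page_id:d}.csv'


def curly_to_pattern(fmt: str) -> re.Pattern:
    """Hypothetical helper: turn '{name:d}' fields into named regex groups."""
    parts = []
    pos = 0
    for field in re.finditer(r'\{(\w+)(?::(\w))?\}', fmt):
        # Escape the literal text between fields (e.g. the '.' in '.csv').
        parts.append(re.escape(fmt[pos:field.start()]))
        name, kind = field.group(1), field.group(2)
        # Assumption: ':d' restricts the field to digits; other fields match lazily.
        parts.append(f'(?P<{name}>[0-9]+)' if kind == 'd' else f'(?P<{name}>.+?)')
        pos = field.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile(''.join(parts))


pattern = curly_to_pattern(filename_format)

# A filename that follows the pattern yields one value per field (still strings).
extracted = pattern.fullmatch('trial_1_2.csv').groupdict()

# A filename that does not follow the pattern is not matched at all.
no_match = pattern.fullmatch('notes.txt')
```

In this sketch, files that do not match the pattern are simply skipped, and the extracted values remain strings until they are cast.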

The values of both text_id and page_id are numeric. We can use a dictionary to define how these values are cast.

[5]:

filename_format_dtypes = {
    'text_id': int,
    'page_id': int,
}
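
As a rough illustration of what this casting amounts to, the sketch below applies the dictionary to values extracted from a filename. The extracted dictionary is a hypothetical intermediate result, not something pymovements exposes under that name.

```python
filename_format_dtypes = {
    'text_id': int,
    'page_id': int,
}

# Hypothetical values as they would come out of matching 'trial_1_2.csv'.
extracted = {'text_id': '1', 'page_id': '2'}

# Look up the target type for each key and apply it to the string value.
casted = {
    key: filename_format_dtypes[key](value)
    for key, value in extracted.items()
}
```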

We can also adjust how the CSV files are read.

The column_map dictionary maps the original column names in the CSV files to the desired column names. Here the original column names are ‘timestamp’, ‘x’, and ‘y’, and the desired column names are ‘time’, ‘x_pix’, and ‘y_pix’, respectively.

[6]:

column_map = {
    'timestamp': 'time',
    'x': 'x_pix',
    'y': 'y_pix',
}

Here, we specify that the separator in the CSV files is a tab ('\t').

[7]:

custom_read_kwargs = {
    'separator': '\t',
}
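
The combined effect of the separator and the column mapping can be sketched with the standard library alone. This is only an illustration of the idea, using two illustrative samples in the layout described above; it does not go through pymovements or its CSV reader.

```python
import csv
import io

column_map = {'timestamp': 'time', 'x': 'x_pix', 'y': 'y_pix'}
custom_read_kwargs = {'separator': '\t'}

# Illustrative tab-separated content with the original column names.
raw_text = 'timestamp\tx\ty\n2415266\t176.8\t140.2\n2415267\t176.7\t139.8\n'

# Split each line on the configured separator.
reader = csv.DictReader(
    io.StringIO(raw_text),
    delimiter=custom_read_kwargs['separator'],
)

# Rename each column according to column_map; unmapped columns keep their name.
rows = [
    {column_map.get(name, name): value for name, value in row.items()}
    for row in reader
]
```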

Define and load the Dataset

Next, we use all these definitions to create a DatasetDefinition by passing in a name, the Experiment instance, and other optional parameters such as the filename format and the custom CSV reading parameters.

[8]:

dataset_definition = pm.DatasetDefinition(
    name='my_dataset',
    experiment=experiment,
    filename_format=filename_format,
    filename_format_dtypes=filename_format_dtypes,
    column_map=column_map,
    custom_read_kwargs=custom_read_kwargs,
)

Finally we create a Dataset instance by using the DatasetDefinition and specifying the directory path.

[9]:

dataset = pm.Dataset(
    definition=dataset_definition,
    path='data/my_dataset/',
)

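If you have a root data directory that holds all your local datasets, you can instead define the paths of the dataset explicitly with DatasetPaths.

The root parameter points to the root data directory. The dataset, raw, preprocessed, and events parameters define the names of the directories for the dataset, raw data, preprocessed data, and events data, respectively. The dataset parameter is not set here, so the dataset directory name is taken from the name of the DatasetDefinition (my_dataset).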

[10]:

dataset_paths = pm.DatasetPaths(
    root='data/',
    raw='raw',
    preprocessed='preprocessed',
    events='events',
)

dataset = pm.Dataset(
    definition=dataset_definition,
    path=dataset_paths,
)
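
The directory layout implied by these settings can be sketched with pathlib. This assumes that the dataset directory name falls back to the definition name my_dataset when it is not passed explicitly; the variable names are illustrative and not part of the pymovements API.

```python
from pathlib import PurePosixPath

root = PurePosixPath('data/')
dataset_name = 'my_dataset'  # assumed fallback: the name of the DatasetDefinition

# The dataset directory lives under the root directory ...
dataset_dir = root / dataset_name

# ... and the remaining names are subdirectories of the dataset directory.
raw_dir = dataset_dir / 'raw'
preprocessed_dir = dataset_dir / 'preprocessed'
events_dir = dataset_dir / 'events'
```

Under this assumption, raw_dir points to the same data/my_dataset/raw directory that the toy data was extracted to earlier.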

Now let’s load the dataset into memory. Here we select a subset that includes only the first page of the texts with IDs 1 and 2.

[11]:

subset = {
    'text_id': [1, 2],
    'page_id': 1,
}

dataset.load(subset=subset)

100%|██████████| 2/2 [00:00<00:00, 163.07it/s]

[11]:

<pymovements.dataset.dataset.Dataset at 0x7f6c544769a0>
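
The selection logic behind such a subset can be sketched as a filter over per-file records, where each subset value is either a single allowed value or a list of allowed values. The helper matches_subset and the list of file records are hypothetical and only illustrate the idea.

```python
def matches_subset(record: dict, subset: dict) -> bool:
    """Hypothetical helper: check a file record against every subset key."""
    for key, allowed in subset.items():
        # Treat a single value like a one-element list.
        if not isinstance(allowed, (list, tuple, set, range)):
            allowed = [allowed]
        if record[key] not in allowed:
            return False
    return True


subset = {
    'text_id': [1, 2],
    'page_id': 1,
}

# Made-up file records: texts 0 to 3 with pages 1 and 2 each.
files = [
    {'text_id': text_id, 'page_id': page_id}
    for text_id in range(4)
    for page_id in (1, 2)
]

selected = [record for record in files if matches_subset(record, subset)]
```

Only two records satisfy both conditions, which is consistent with the two files reported by the progress bar above.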

Use the Dataset

Once we have created the Dataset instance, we can use its methods to preprocess and analyze data in our local dataset.

[12]:

dataset.gaze[0].frame

[12]:

shape: (23_054, 5)

text_id	page_id	time	x_pix	y_pix
i64	i64	f64	f64	f64
1	1	2.415266e6	176.8	140.2
1	1	2.415267e6	176.7	139.8
1	1	2.415268e6	176.7	139.3
1	1	2.415269e6	176.6	139.3
1	1	2.41527e6	176.7	139.3
1	1	2.415271e6	176.8	139.5
1	1	2.415272e6	177.3	139.8
1	1	2.415273e6	177.8	140.0
1	1	2.415274e6	178.3	140.0
1	1	2.415275e6	178.3	139.9
1	1	2.415276e6	178.0	140.2
1	1	2.415277e6	177.7	140.4
…	…	…	…	…
1	1	2.438308e6	649.1	633.7
1	1	2.438309e6	648.8	633.9
1	1	2.43831e6	649.1	634.1
1	1	2.438311e6	649.6	634.2
1	1	2.438312e6	650.1	634.1
1	1	2.438313e6	650.0	634.0
1	1	2.438314e6	649.9	633.9
1	1	2.438315e6	649.9	633.9
1	1	2.438316e6	650.1	633.7
1	1	2.438317e6	650.2	633.5
1	1	2.438318e6	650.0	633.2
1	1	2.438319e6	649.7	633.1

Here we use the pix2deg method to convert the pixel coordinates to degrees of visual angle.

[13]:

dataset.pix2deg()

dataset.gaze[0].frame

100%|██████████| 2/2 [00:00<00:00, 486.27it/s]

[13]:

shape: (23_054, 7)

text_id	page_id	time	x_pix	y_pix	x_pos	y_pos
i64	i64	f64	f64	f64	f64	f64
1	1	2.415266e6	176.8	140.2	-11.420403	-9.148145
1	1	2.415267e6	176.7	139.8	-11.422806	-9.157834
1	1	2.415268e6	176.7	139.3	-11.422806	-9.169943
1	1	2.415269e6	176.6	139.3	-11.42521	-9.169943
1	1	2.41527e6	176.7	139.3	-11.422806	-9.169943
1	1	2.415271e6	176.8	139.5	-11.420403	-9.1651
1	1	2.415272e6	177.3	139.8	-11.408386	-9.157834
1	1	2.415273e6	177.8	140.0	-11.396367	-9.15299
1	1	2.415274e6	178.3	140.0	-11.384348	-9.15299
1	1	2.415275e6	178.3	139.9	-11.384348	-9.155412
1	1	2.415276e6	178.0	140.2	-11.39156	-9.148145
1	1	2.415277e6	177.7	140.4	-11.398771	-9.143301
…	…	…	…	…	…	…
1	1	2.438308e6	649.1	633.7	0.240135	3.033792
1	1	2.438309e6	648.8	633.9	0.232631	3.038748
1	1	2.43831e6	649.1	634.1	0.240135	3.043704
1	1	2.438311e6	649.6	634.2	0.252642	3.046182
1	1	2.438312e6	650.1	634.1	0.265149	3.043704
1	1	2.438313e6	650.0	634.0	0.262648	3.041226
1	1	2.438314e6	649.9	633.9	0.260146	3.038748
1	1	2.438315e6	649.9	633.9	0.260146	3.038748
1	1	2.438316e6	650.1	633.7	0.265149	3.033792
1	1	2.438317e6	650.2	633.5	0.26765	3.028836
1	1	2.438318e6	650.0	633.2	0.262648	3.021402
1	1	2.438319e6	649.7	633.1	0.255144	3.018924
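
The geometry behind this conversion can be sketched for a single sample using the Experiment values from above. The function pixel_to_degree is a hypothetical stand-in, not the library's implementation, and it assumes that the pixel coordinate is centered on the screen, the viewing distance is expressed in pixels, and the angle is the arctangent of their ratio.

```python
import math


def pixel_to_degree(pixel, screen_px, screen_cm, distance_cm):
    """Hypothetical helper: convert one pixel coordinate to degrees of visual angle."""
    # Express the viewing distance in pixels along this screen axis.
    distance_px = distance_cm * screen_px / screen_cm
    # Shift the coordinate so that 0 lies at the center of the screen.
    centered = pixel - (screen_px - 1) / 2
    # The visual angle is the arctangent of offset over distance.
    return math.degrees(math.atan2(centered, distance_px))


# First sample of the table above: x_pix = 176.8, y_pix = 140.2.
x_pos = pixel_to_degree(176.8, screen_px=1280, screen_cm=38, distance_cm=68)
y_pos = pixel_to_degree(140.2, screen_px=1024, screen_cm=30.2, distance_cm=68)
```

Under these assumptions, both values should land close to the x_pos and y_pos entries in the first row of the table above.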

We can use the pos2vel method to calculate the velocity of the gaze position.

[14]:

dataset.pos2vel(method='savitzky_golay', window_length=7, polyorder=2)

dataset.gaze[0].frame

100%|██████████| 2/2 [00:00<00:00, 303.03it/s]

[14]:

shape: (23_054, 9)

text_id	page_id	time	x_pix	y_pix	x_pos	y_pos	y_vel	x_vel
i64	i64	f64	f64	f64	f64	f64	f64	f64
1	1	2.415266e6	176.8	140.2	-11.420403	-9.148145	-13.666945	-5.235971
1	1	2.415267e6	176.7	139.8	-11.422806	-9.157834	-9.630308	-3.004237
1	1	2.415268e6	176.7	139.3	-11.422806	-9.169943	-5.59367	-0.772503
1	1	2.415269e6	176.6	139.3	-11.42521	-9.169943	-1.557032	1.459231
1	1	2.41527e6	176.7	139.3	-11.422806	-9.169943	1.556983	4.034446
1	1	2.415271e6	176.8	139.5	-11.420403	-9.1651	3.459956	6.695697
1	1	2.415272e6	177.3	139.8	-11.408386	-9.157834	3.20046	7.983442
1	1	2.415273e6	177.8	140.0	-11.396367	-9.15299	3.200507	6.78167
1	1	2.415274e6	178.3	140.0	-11.384348	-9.15299	2.941092	3.948804
1	1	2.415275e6	178.3	139.9	-11.384348	-9.155412	3.460254	0.343335
1	1	2.415276e6	178.0	140.2	-11.39156	-9.148145	4.152379	-1.717019
1	1	2.415277e6	177.7	140.4	-11.398771	-9.143301	5.36358	-1.974598
…	…	…	…	…	…	…	…	…
1	1	2.438308e6	649.1	633.7	0.240135	3.033792	0.708004	-0.268006
1	1	2.438309e6	648.8	633.9	0.232631	3.038748	2.566488	2.23337
1	1	2.43831e6	649.1	634.1	0.240135	3.043704	2.566496	4.109403
1	1	2.438311e6	649.6	634.2	0.252642	3.046182	0.707998	5.181423
1	1	2.438312e6	650.1	634.1	0.265149	3.043704	-0.530993	4.73475
1	1	2.438313e6	650.0	634.0	0.262648	3.041226	-1.769984	3.037385
1	1	2.438314e6	649.9	633.9	0.260146	3.038748	-2.654987	1.518691
1	1	2.438315e6	649.9	633.9	0.260146	3.038748	-3.451512	0.268004
1	1	2.438316e6	650.1	633.7	0.265149	3.033792	-3.982536	-0.357339
1	1	2.438317e6	650.2	633.5	0.26765	3.028836	-4.867566	-1.667582
1	1	2.438318e6	650.0	633.2	0.262648	3.021402	-5.752595	-2.977824
1	1	2.438319e6	649.7	633.1	0.255144	3.018924	-6.637625	-4.288067
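
For an interior sample, the Savitzky-Golay velocity estimate can be reproduced by hand. With a symmetric window and a quadratic fit (polyorder=2), the first-derivative filter weights are proportional to the offsets k = -3, ..., 3, normalized by the sum of k squared. The sketch below applies this to the first seven x_pos values from the table; it ignores how the filter treats samples near the edges of the recording, and the variable names are illustrative.

```python
sampling_rate = 1000  # Hz, as defined in the Experiment

# First seven x_pos values from the table above (degrees of visual angle).
x_pos = [
    -11.420403, -11.422806, -11.422806, -11.42521,
    -11.422806, -11.420403, -11.408386,
]

half_window = 3  # window_length=7 covers offsets -3 to 3
offsets = range(-half_window, half_window + 1)

# Normalization for the least-squares slope: sum of squared offsets (28 here).
norm = sum(k * k for k in offsets)

# Slope of the fitted polynomial at the window center, in degrees per sample.
slope_per_sample = sum(k * x for k, x in zip(offsets, x_pos)) / norm

# Multiply by the sampling rate to obtain degrees per second.
x_vel = slope_per_sample * sampling_rate
```

The result corresponds to the center of the window, that is, the fourth row of the table, and should agree with its x_vel entry up to the rounding of the displayed positions.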