Working with Local Dataset#

In this tutorial, we show how to use your own local dataset with the Dataset class, which helps you manage and process your eye-tracking data.

For demonstration purposes, we will use the raw data provided by the Toy dataset, a sample dataset that comes with pymovements.

[1]:
import pymovements as pm

toy_dataset = pm.datasets.ToyDataset(
    root='data/',
    download=True,
    extract=True,
    remove_finished=True,
)
Downloading http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/ to data/ToyDataset/downloads/pymovements-toy-dataset.zip
pymovements-toy-dataset.zip: 100%|██████████| 3.06M/3.06M [00:00<00:00, 24.7MB/s]

Define your Experiment#

To use the Dataset class, we first need to create an Experiment instance. This class represents the properties of the experiment, such as the screen dimensions and sampling rate.

[2]:
experiment = pm.gaze.Experiment(
    screen_width_px=1280,
    screen_height_px=1024,
    screen_width_cm=38,
    screen_height_cm=30.2,
    distance_cm=68,
    origin='lower left',
    sampling_rate=1000,
)

Parameters for File Parsing#

We also define filename_regex, a regular expression that is matched against the filenames of the data files in the dataset and extracts values from them. For example, r'trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.csv' matches filenames that follow the pattern trial_{text_id}_{page_id}.csv and extracts the values of text_id and page_id for each file. The dot before csv is escaped so that it matches only a literal period.

[3]:
filename_regex = r'trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.csv'

The values of text_id and page_id are numeric, but a regular expression captures them as strings. We use a dictionary to define the type each value is cast to.

[4]:
filename_regex_dtypes = {
    'text_id': int,
    'page_id': int,
}
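
To make the two steps above concrete, the following standalone sketch uses Python's re module to mimic what the pattern and the dtype map do to a single filename. The filenames and the helper parse_filename are made up for illustration; this is not pymovements' internal code, only an approximation of the idea.

```python
import re

# Same pattern as in the tutorial; the dot is escaped here so that it only
# matches a literal '.'.
filename_regex = r'trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.csv'
filename_regex_dtypes = {'text_id': int, 'page_id': int}


def parse_filename(filename):
    """Hypothetical helper: return the cast named groups, or None if no match."""
    match = re.fullmatch(filename_regex, filename)
    if match is None:
        return None
    # Named groups are always captured as strings; cast them via the dtype map.
    return {
        key: filename_regex_dtypes.get(key, str)(value)
        for key, value in match.groupdict().items()
    }


print(parse_filename('trial_1_2.csv'))  # text_id and page_id as integers
print(parse_filename('notes.txt'))      # no match
```

Files whose names do not match the pattern yield no values in this sketch, which is why the pattern should describe the naming scheme of your own data files precisely.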

We can also adjust how the CSV files are read.

The column_map dictionary maps the original column names in the CSV files to the desired column names. Here the original column names are ‘timestamp’, ‘x’, and ‘y’, and the desired column names are ‘time’, ‘x_right_pix’, and ‘y_right_pix’, respectively.

[5]:
column_map = {
    'timestamp': 'time',
    'x': 'x_right_pix',
    'y': 'y_right_pix',
}

Here, we specify that the separator in the CSV files is a tab (‘\t’), and we provide the lists of original and desired column names as the ‘columns’ and ‘new_columns’ parameters, respectively.

[6]:
read_csv_kwargs = {
    'sep': '\t',
    'columns': list(column_map.keys()),
    'new_columns': list(column_map.values()),
}
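
The effect of these settings can be illustrated without pymovements. The sketch below uses only the standard library csv module and assumes that ‘columns’ selects the listed columns and ‘new_columns’ renames them in the same order, as described above. The sample text, including its extra stimulus column, is made up, and how pymovements actually reads the files is not shown here.

```python
import csv
import io

column_map = {
    'timestamp': 'time',
    'x': 'x_right_pix',
    'y': 'y_right_pix',
}

# Made-up, tab-separated sample with one extra column that is not in column_map.
raw_text = (
    'timestamp\tx\ty\tstimulus\n'
    '1000\t176.8\t140.2\tA\n'
    '1001\t176.7\t139.8\tA\n'
)

reader = csv.DictReader(io.StringIO(raw_text), delimiter='\t')

# Keep only the mapped columns and store them under their new names.
rows = [
    {new_name: row[old_name] for old_name, new_name in column_map.items()}
    for row in reader
]

print(rows[0])
```

Note that the csv module returns every value as a string; a data frame reader would additionally infer numeric types.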

Define and load the Dataset#

Finally, we create a Dataset instance by passing in the root directory, the Experiment instance, and optional parameters such as the filename regular expression and the custom CSV reading parameters. The dataset_dirname, raw_dirname, preprocessed_dirname, and events_dirname parameters define the names of the directories for the dataset, raw data, preprocessed data, and events data, respectively.

[7]:
# Define the path to the dataset directory
dataset_dir = './data/ToyDataset/'

# Set up the Dataset object
dataset = pm.datasets.Dataset(
    root=dataset_dir,
    experiment=experiment,
    filename_regex=filename_regex,
    filename_regex_dtypes=filename_regex_dtypes,
    custom_read_kwargs=read_csv_kwargs,
    dataset_dirname='.',
    raw_dirname='raw',
    preprocessed_dirname='preprocessed',
    events_dirname='events',
)

Now we can load the dataset. Here we select a subset consisting of the first page of the texts with IDs 1 and 2.

[8]:
subset = {
    'text_id': [1, 2],
    'page_id': 1,
}

dataset.load(subset=subset)
100%|██████████| 2/2 [00:00<00:00, 184.06it/s]
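
The selection logic of such a subset can be sketched in a few lines. The helper matches_subset and the list of file records below are hypothetical and only illustrate the idea that a file is kept when every key in the subset matches, where a key may be given either a single value or a list of values. It is not the library's implementation.

```python
def matches_subset(record, subset):
    """Hypothetical helper: True if the record satisfies every subset entry."""
    for key, allowed in subset.items():
        # A single value is treated like a one-element list.
        if not isinstance(allowed, (list, tuple, set, range)):
            allowed = [allowed]
        if record[key] not in allowed:
            return False
    return True


# Made-up file records, as they might result from parsing the filenames.
records = [
    {'text_id': text_id, 'page_id': page_id}
    for text_id in range(4)
    for page_id in range(1, 4)
]

subset = {
    'text_id': [1, 2],
    'page_id': 1,
}

selected = [record for record in records if matches_subset(record, subset)]
print(selected)
```

With these made-up records, two entries remain, one for each selected text.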

Use the Dataset#

Once we have created the Dataset instance, we can use its methods to preprocess and analyze data in our local dataset.

[9]:
dataset.gaze[0].frame
[9]:
shape: (23054, 5)
text_id  page_id  time        x_right_pix  y_right_pix
i64      i64      f64         f64          f64
1        1        2.415266e6  176.8        140.2
1        1        2.415267e6  176.7        139.8
1        1        2.415268e6  176.7        139.3
1        1        2.415269e6  176.6        139.3
1        1        2.41527e6   176.7        139.3
1        1        2.415271e6  176.8        139.5
1        1        2.415272e6  177.3        139.8
1        1        2.415273e6  177.8        140.0
1        1        2.415274e6  178.3        140.0
1        1        2.415275e6  178.3        139.9
1        1        2.415276e6  178.0        140.2
1        1        2.415277e6  177.7        140.4
...      ...      ...         ...          ...
1        1        2.438308e6  649.1        633.7
1        1        2.438309e6  648.8        633.9
1        1        2.43831e6   649.1        634.1
1        1        2.438311e6  649.6        634.2
1        1        2.438312e6  650.1        634.1
1        1        2.438313e6  650.0        634.0
1        1        2.438314e6  649.9        633.9
1        1        2.438315e6  649.9        633.9
1        1        2.438316e6  650.1        633.7
1        1        2.438317e6  650.2        633.5
1        1        2.438318e6  650.0        633.2
1        1        2.438319e6  649.7        633.1

Here we use the pix2deg method to convert the pixel coordinates to degrees of visual angle.

[10]:
dataset.pix2deg()

dataset.gaze[0].frame
100%|██████████| 2/2 [00:00<00:00, 453.88it/s]
[10]:
shape: (23054, 7)
text_id  page_id  time        x_right_pix  y_right_pix  y_right_pos  x_right_pos
i64      i64      f64         f64          f64          f64          f64
1        1        2.415266e6  176.8        140.2        -12.297242   -8.259494
1        1        2.415267e6  176.7        139.8        -12.306793   -8.261927
1        1        2.415268e6  176.7        139.3        -12.318732   -8.261927
1        1        2.415269e6  176.6        139.3        -12.318732   -8.264361
1        1        2.41527e6   176.7        139.3        -12.318732   -8.261927
1        1        2.415271e6  176.8        139.5        -12.313957   -8.259494
1        1        2.415272e6  177.3        139.8        -12.306793   -8.247325
1        1        2.415273e6  177.8        140.0        -12.302018   -8.235155
1        1        2.415274e6  178.3        140.0        -12.302018   -8.222985
1        1        2.415275e6  178.3        139.9        -12.304406   -8.222985
1        1        2.415276e6  178.0        140.2        -12.297242   -8.230287
1        1        2.415277e6  177.7        140.4        -12.292466   -8.237589
...      ...      ...         ...          ...          ...          ...
1        1        2.438308e6  649.1        633.7        -0.145082    3.415265
1        1        2.438309e6  648.8        633.9        -0.140079    3.407836
1        1        2.43831e6   649.1        634.1        -0.135077    3.415265
1        1        2.438311e6  649.6        634.2        -0.132575    3.427645
1        1        2.438312e6  650.1        634.1        -0.135077    3.440025
1        1        2.438313e6  650.0        634.0        -0.137578    3.437549
1        1        2.438314e6  649.9        633.9        -0.140079    3.435073
1        1        2.438315e6  649.9        633.9        -0.140079    3.435073
1        1        2.438316e6  650.1        633.7        -0.145082    3.440025
1        1        2.438317e6  650.2        633.5        -0.150085    3.442501
1        1        2.438318e6  650.0        633.2        -0.157589    3.437549
1        1        2.438319e6  649.7        633.1        -0.160091    3.430121
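
The geometry behind a pixel-to-degree conversion can be sketched with the standard library alone. The helper pix2deg_1d below is hypothetical and uses one common convention: the eye is centred in front of the screen, and 0 degrees corresponds to the screen centre. It is not taken from pymovements, whose conventions may differ between versions, so its numbers need not reproduce the output shown above.

```python
import math


def pix2deg_1d(pix, screen_px, screen_cm, distance_cm):
    """Hypothetical helper: convert one pixel coordinate to degrees of visual angle."""
    # Shift so that the screen centre is at 0 (pixel indices run 0 .. screen_px - 1).
    centred_px = pix - (screen_px - 1) / 2
    # Convert pixels to centimetres on the screen surface.
    centred_cm = centred_px * screen_cm / screen_px
    # Angle between the line of sight to the centre and the line of sight to the sample.
    return math.degrees(math.atan2(centred_cm, distance_cm))


# Horizontal screen properties of the Experiment defined in this tutorial.
centre = pix2deg_1d(639.5, screen_px=1280, screen_cm=38, distance_cm=68)
left_edge = pix2deg_1d(0, screen_px=1280, screen_cm=38, distance_cm=68)
right_edge = pix2deg_1d(1279, screen_px=1280, screen_cm=38, distance_cm=68)

print(centre, left_edge, right_edge)
```

Under these assumptions the screen centre maps to 0 degrees and the left and right edges map to angles of equal magnitude and opposite sign.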

We can use the pos2vel method to compute gaze velocity from the position data.

[11]:
dataset.pos2vel(method='savitzky_golay', window_length=7, polyorder=2)

dataset.gaze[0].frame
100%|██████████| 2/2 [00:00<00:00, 293.35it/s]
[11]:
shape: (23054, 9)
text_id  page_id  time        x_right_pix  y_right_pix  y_right_pos  x_right_pos  x_right_vel  y_right_vel
i64      i64      f64         f64          f64          f64          f64          f64          f64
1        1        2.415266e6  176.8        140.2        -12.297242   -8.259494    -5.302027    -13.473668
1        1        2.415267e6  176.7        139.8        -12.306793   -8.261927    -3.04214     -9.49412
1        1        2.415268e6  176.7        139.3        -12.318732   -8.261927    -0.782253    -5.514573
1        1        2.415269e6  176.6        139.3        -12.318732   -8.264361    1.477633     -1.535025
1        1        2.41527e6   176.7        139.3        -12.318732   -8.261927    4.085296     1.53496
1        1        2.415271e6  176.8        139.5        -12.313957   -8.259494    6.780025     3.411015
1        1        2.415272e6  177.3        139.8        -12.306793   -8.247325    8.083958     3.15519
1        1        2.415273e6  177.8        140.0        -12.302018   -8.235155    6.867045     3.155252
1        1        2.415274e6  178.3        140.0        -12.302018   -8.222985    3.998521     2.899534
1        1        2.415275e6  178.3        139.9        -12.304406   -8.222985    0.347668     3.411407
1        1        2.415276e6  178.0        140.2        -12.297242   -8.230287    -1.738596    4.093787
1        1        2.415277e6  177.7        140.4        -12.292466   -8.237589    -1.999405    5.287927
...      ...      ...         ...          ...          ...          ...          ...          ...
1        1        2.438308e6  649.1        633.7        -0.145082    3.415265     -0.265312    0.714688
1        1        2.438309e6  648.8        633.9        -0.140079    3.407836     2.210777     2.590745
1        1        2.43831e6   649.1        634.1        -0.135077    3.415265     4.06786      2.590745
1        1        2.438311e6  649.6        634.2        -0.132575    3.427645     5.129065     0.714688
1        1        2.438312e6  650.1        634.1        -0.135077    3.440025     4.686916     -0.536016
1        1        2.438313e6  650.0        634.0        -0.137578    3.437549     3.006674     -1.786721
1        1        2.438314e6  649.9        633.9        -0.140079    3.435073     1.503314     -2.680081
1        1        2.438315e6  649.9        633.9        -0.140079    3.435073     0.265288     -3.484104
1        1        2.438316e6  650.1        633.7        -0.145082    3.440025     -0.353725    -4.020119
1        1        2.438317e6  650.2        633.5        -0.150085    3.442501     -1.650699    -4.913478
1        1        2.438318e6  650.0        633.2        -0.157589    3.437549     -2.947673    -5.806836
1        1        2.438319e6  649.7        633.1        -0.160091    3.430121     -4.244647    -6.700195
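
For intuition about the savitzky_golay method with window_length=7 and polyorder=2, the sketch below computes the first derivative of a local quadratic least-squares fit in plain Python. The helper savgol_velocity and the test signal are made up for illustration. The sketch only covers interior samples and makes no claim about how pymovements handles the edges of a recording or which routine it calls internally.

```python
def savgol_velocity(positions, sampling_rate, half_window=3):
    """Hypothetical helper: velocity from a local quadratic least-squares fit.

    For a polynomial order of 2, the fitted slope at the window centre is
    sum(i * p[i]) / sum(i ** 2) per sample, for offsets i = -m .. m.
    """
    offsets = range(-half_window, half_window + 1)
    norm = sum(i * i for i in offsets)  # 28 for a window length of 7
    velocities = []
    # Only interior samples are computed; edge handling is left out.
    for centre in range(half_window, len(positions) - half_window):
        slope = sum(i * positions[centre + i] for i in offsets) / norm
        # Convert from position units per sample to position units per second.
        velocities.append(slope * sampling_rate)
    return velocities


# Made-up signal: gaze moving at a constant 10 degrees per second, sampled at 1000 Hz.
positions = [0.01 * i for i in range(20)]
velocities = savgol_velocity(positions, sampling_rate=1000)

print(velocities[0])
```

Because the window fits a polynomial rather than differencing neighbouring samples, the estimate is less sensitive to sample-to-sample noise than a simple two-point difference.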