Preprocessing Raw Gaze Data
What you will learn in this tutorial:
how to transform pixel coordinates into degrees of visual angle
how to transform positional data into velocity data
Preparations
We import pymovements under the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading our ToyDataset and loading in its data:
[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
INFO:pymovements.dataset.dataset:
You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.
Please cite the referenced publication if you intend to use the dataset in your research.
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 306.27it/s]
[2]:
Dataset (repr abbreviated)
definition: DatasetDefinition 'ToyDataset' ('pymovements Toy Dataset')
resources: one 'gaze' resource, 'pymovements-toy-dataset.zip' (md5 '256901852c1c07581d375eef705855d6'), from 'https://github.com/pymovements/pymovements-toy-dataset/archive/refs/heads/main.zip', filename format 'trial_{text_id:d}_{page_id:d}.csv'
experiment: sampling rate 1000; screen 1280 x 1024 px, 38 x 30.2 cm, distance 68 cm, origin 'upper left'
fileinfo: DataFrame (5 columns, 20 rows)
gaze: list (20 items); the first two items have 17223 and 29799 rows, each with an empty events DataFrame
path: PosixPath('data/ToyDataset')
We can verify that all files have been loaded in by checking the fileinfo attribute:
[3]:
dataset.fileinfo
[3]:
{'gaze': shape: (20, 5)
┌─────────┬─────────┬─────────────────────────────────┬───────────────┬─────────────┐
│ text_id ┆ page_id ┆ filepath                        ┆ load_function ┆ load_kwargs │
│ ---     ┆ ---     ┆ ---                             ┆ ---           ┆ ---         │
│ i64     ┆ i64     ┆ str                             ┆ null          ┆ null        │
╞═════════╪═════════╪═════════════════════════════════╪═══════════════╪═════════════╡
│ 0       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 0       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 0       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 0       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 0       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ …       ┆ …       ┆ …                               ┆ …             ┆ …           │
│ 3       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 3       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 3       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 3       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
│ 3       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
└─────────┴─────────┴─────────────────────────────────┴───────────────┴─────────────┘}
Now let’s inspect the first gaze dataframe:
[4]:
dataset.gaze[0]
[4]:
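shape: (17_223, 6)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┘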
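Apart from the trial identifier columns text_id and page_id, we see the columns time and pixel. Each entry of the pixel column holds one [x, y] coordinate pair. The stimuli_x and stimuli_y columns are not used in this tutorial.
Preprocessing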
We now want to transform these pixel position coordinates into coordinates in degrees of visual angle. This is simply done by:
[5]:
dataset.pix2deg()
dataset.gaze[0]
[5]:
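shape: (17_223, 7)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┬─────────────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          ┆ position                │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            ┆ ---                     │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      ┆ list[f64]               │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╪═════════════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              ┆ …                       │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┴─────────────────────────┘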
The processed result has been added as a new column named position to our gaze dataframe.
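To make the conversion less of a black box, here is a minimal sketch of the underlying geometry. It assumes the screen values from the experiment definition of the dataset (1280 x 1024 px, 38 x 30.2 cm, viewing distance 68 cm, origin in the upper left). The helper name pix2deg_sketch is hypothetical, and this is not necessarily how pymovements implements the transform internally, but it gives values that agree with the position column above to the displayed precision.

```python
import numpy as np


def pix2deg_sketch(pixel, size_px, size_cm, distance_cm):
    """Convert pixel coordinates along one screen axis to degrees of visual angle."""
    pixel = np.asarray(pixel, dtype=float)
    # Move the origin from the upper-left corner to the screen centre.
    # Pixel indices run from 0 to size_px - 1, so the centre is (size_px - 1) / 2.
    centered_px = pixel - (size_px - 1) / 2
    # Scale pixels to centimetres on the physical screen.
    centered_cm = centered_px * size_cm / size_px
    # The visual angle is the arctangent of the on-screen offset over the viewing distance.
    return np.degrees(np.arctan2(centered_cm, distance_cm))


# First and last sample of the gaze dataframe shown above.
x_deg = pix2deg_sketch([206.8, 351.2], size_px=1280, size_cm=38, distance_cm=68)
y_deg = pix2deg_sketch([152.4, 412.9], size_px=1024, size_cm=30.2, distance_cm=68)
print(x_deg, y_deg)
```

Negative values mean that the gaze is left of (x) or above (y) the screen centre, which is why all positions in the sample above are negative.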
We would also like to have velocity data available. We have four different methods available:
preceding: uses only the single preceding sample to compute the velocity. This is the noisiest variant.
neighbors: uses the neighboring samples, one on each side. A bit less noisy.
fivepoint: widens the neighborhood to two samples on each side, which gives a smoother result.
savitzky_golay: uses the Savitzky-Golay differentiation filter. You can specify additional parameters like window_length and degree. With well-chosen parameters, this usually gives the best results.
Let’s use the fivepoint method first:
[6]:
dataset.pos2vel(method='fivepoint')
dataset.gaze[0]
[6]:
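shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┬─────────────────────────┬──────────────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          ┆ position                ┆ velocity                 │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            ┆ ---                     ┆ ---                      │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      ┆ list[f64]               ┆ list[f64]                │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╪═════════════════════════╪══════════════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] ┆ [null, null]             │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] ┆ [null, null]             │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] ┆ [1.610194, -5.256267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] ┆ [0.402548, -4.447465]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] ┆ [0.402561, -3.234462]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              ┆ …                       ┆ …                        │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  ┆ [-63.266374, -21.085616] │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  ┆ [-63.249652, -19.431326] │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  ┆ [-60.359624, -15.710061] │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   ┆ [null, null]             │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  ┆ [null, null]             │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┴─────────────────────────┴──────────────────────────┘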
The processed result has been added as a new column named velocity to our gaze dataframe.
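The three finite-difference variants described earlier can be sketched for a single axis as follows. This is an illustration under assumptions, not the pymovements implementation: the helper velocity_sketch is hypothetical, the formulas are the common textbook definitions of these differences, and samples without enough neighbours are simply left as NaN. Fed with the rounded horizontal positions of the first five samples and a sampling rate of 1000 Hz, the five-point variant lands within rounding error of the first non-null velocity shown above.

```python
import numpy as np


def velocity_sketch(position, sampling_rate, method="fivepoint"):
    """Finite-difference velocity for one axis; edge samples without neighbours stay NaN."""
    x = np.asarray(position, dtype=float)
    v = np.full(x.shape, np.nan)
    if method == "preceding":
        # Difference to the single preceding sample.
        v[1:] = (x[1:] - x[:-1]) * sampling_rate
    elif method == "neighbors":
        # Central difference over one sample on each side.
        v[1:-1] = (x[2:] - x[:-2]) * sampling_rate / 2
    elif method == "fivepoint":
        # Two samples on each side of the current sample.
        v[2:-2] = (x[4:] + x[3:-1] - x[1:-3] - x[:-4]) * sampling_rate / 6
    else:
        raise ValueError(f"unknown method: {method}")
    return v


# Horizontal positions (degrees) of the first five samples shown above.
x_pos = [-10.697598, -10.695183, -10.692768, -10.690352, -10.692768]
v_x = velocity_sketch(x_pos, sampling_rate=1000, method="fivepoint")
print(v_x)  # NaN at both edges; the middle value is close to 1.61 deg/s
```

The wider the window, the more samples at the start and end of a recording have no valid estimate, which matches the two null rows at each end of the velocity column above.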
We can also use the Savitzky-Golay differentiation filter with some additional parameters like this:
[7]:
dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)
dataset.gaze[0]
[7]:
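shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┬─────────────────────────┬──────────────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          ┆ position                ┆ velocity                 │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            ┆ ---                     ┆ ---                      │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      ┆ list[f64]               ┆ list[f64]                │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╪═════════════════════════╪══════════════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] ┆ [1.207641, -3.119165]    │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] ┆ [1.20764, -4.072198]     │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] ┆ [1.035119, -4.765267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] ┆ [1.207654, -4.245382]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] ┆ [1.552735, -2.339263]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              ┆ …                       ┆ …                        │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  ┆ [-62.062479, -20.465552] │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  ┆ [-61.343786, -18.073031] │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  ┆ [-53.501231, -14.617634] │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   ┆ [-41.879965, -10.276475] │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  ┆ [-27.710881, -6.112645]  │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┴─────────────────────────┴──────────────────────────┘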
This has overwritten our velocity column. As we can see, the values differ slightly from the fivepoint result, and the first and last samples are no longer null.
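For readers who want to see a Savitzky-Golay differentiation filter in isolation, the following sketch uses scipy.signal.savgol_filter on synthetic data. It assumes that the degree and window_length arguments used above correspond to SciPy's polyorder and window_length. That mapping, the edge handling (SciPy's default mode) and the synthetic signal are assumptions for illustration, so the numbers are not expected to match the dataset output.

```python
import numpy as np
from scipy.signal import savgol_filter

sampling_rate = 1000  # Hz, as in the toy dataset

# Synthetic position signal: the eye drifts at a constant 5 degrees per second.
t = np.arange(50) / sampling_rate
position = 5.0 * t

# deriv=1 returns the first derivative of the fitted polynomial;
# delta is the sample spacing in seconds, so the result is in degrees per second.
velocity = savgol_filter(
    position,
    window_length=7,
    polyorder=2,
    deriv=1,
    delta=1 / sampling_rate,
)
print(velocity[:3])
```

Because the filter fits a low-order polynomial to each window, a signal with constant slope is recovered as a constant velocity, while sample-to-sample noise in real recordings is smoothed out.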
What you have learned in this tutorial:
transforming pixel coordinates into degrees of visual angle by using Dataset.pix2deg()
transforming positional data into velocity data by using Dataset.pos2vel()
passing additional keyword arguments when using the Savitzky-Golay differentiation filter