Preprocessing Raw Gaze Data#
What you will learn in this tutorial:#
how to transform pixel coordinates into degrees of visual angle
how to transform positional data into velocity data
Preparations#
We import pymovements as the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading our ToyDataset and loading in its data:
[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 20/20 [00:00<00:00, 20.53it/s]
[2]:
<pymovements.dataset.dataset.Dataset at 0x7f4e8cabee80>
We can verify that all files have been loaded in by checking the fileinfo attribute:
[3]:
dataset.fileinfo
[3]:
text_id | page_id | filepath |
---|---|---|
i64 | i64 | str |
0 | 1 | "aeye-lab-pymov… |
0 | 2 | "aeye-lab-pymov… |
0 | 3 | "aeye-lab-pymov… |
0 | 4 | "aeye-lab-pymov… |
0 | 5 | "aeye-lab-pymov… |
1 | 1 | "aeye-lab-pymov… |
1 | 2 | "aeye-lab-pymov… |
1 | 3 | "aeye-lab-pymov… |
1 | 4 | "aeye-lab-pymov… |
1 | 5 | "aeye-lab-pymov… |
2 | 1 | "aeye-lab-pymov… |
2 | 2 | "aeye-lab-pymov… |
2 | 3 | "aeye-lab-pymov… |
2 | 4 | "aeye-lab-pymov… |
2 | 5 | "aeye-lab-pymov… |
3 | 1 | "aeye-lab-pymov… |
3 | 2 | "aeye-lab-pymov… |
3 | 3 | "aeye-lab-pymov… |
3 | 4 | "aeye-lab-pymov… |
3 | 5 | "aeye-lab-pymov… |
Now let’s inspect our gaze dataframe:
[4]:
dataset.gaze[0].frame.head()
[4]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel |
---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] |
Apart from some trial identifier columns, we see the columns time and pixel.
Preprocessing#
We now want to transform these pixel position coordinates into coordinates in degrees of visual angle. This is simply done by:
[5]:
dataset.pix2deg()
dataset.gaze[0].frame
100%|██████████| 20/20 [00:01<00:00, 10.48it/s]
[5]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel | position |
---|---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] | [-10.697598, -8.8524] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] | [-10.695184, -8.859678] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] | [-10.692768, -8.866957] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] | [-10.690351, -8.869382] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] | [-10.692768, -8.874233] |
1.98815e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.300003] | [-10.692768, -8.879086] |
1.988151e6 | -1.0 | -1.0 | 0 | 1 | [207.199997, 151.399994] | [-10.687937, -8.87666] |
1.988152e6 | -1.0 | -1.0 | 0 | 1 | [207.399994, 151.600006] | [-10.683106, -8.871807] |
1.988153e6 | -1.0 | -1.0 | 0 | 1 | [207.600006, 151.899994] | [-10.678275, -8.864531] |
1.988154e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] |
1.988155e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] |
1.988156e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.199997] | [-10.675859, -8.857252] |
… | … | … | … | … | … | … |
2.005356e6 | -1.0 | -1.0 | 0 | 1 | [370.399994, 419.0] | [-6.700617, -2.297363] |
2.005357e6 | -1.0 | -1.0 | 0 | 1 | [371.200012, 419.0] | [-6.680877, -2.297363] |
2.005358e6 | -1.0 | -1.0 | 0 | 1 | [371.100006, 418.899994] | [-6.683344, -2.299844] |
2.005359e6 | -1.0 | -1.0 | 0 | 1 | [369.899994, 418.700012] | [-6.712954, -2.304806] |
2.00536e6 | -1.0 | -1.0 | 0 | 1 | [368.100006, 418.100006] | [-6.75736, -2.319691] |
2.005361e6 | -1.0 | -1.0 | 0 | 1 | [365.899994, 417.100006] | [-6.811623, -2.3445] |
2.005362e6 | -1.0 | -1.0 | 0 | 1 | [363.299988, 416.299988] | [-6.875737, -2.364347] |
2.005363e6 | -1.0 | -1.0 | 0 | 1 | [361.0, 415.399994] | [-6.932438, -2.386672] |
2.005364e6 | -1.0 | -1.0 | 0 | 1 | [358.0, 414.5] | [-7.006376, -2.408998] |
2.005365e6 | -1.0 | -1.0 | 0 | 1 | [355.799988, 413.799988] | [-7.060582, -2.426362] |
2.005366e6 | -1.0 | -1.0 | 0 | 1 | [353.100006, 413.200012] | [-7.12709, -2.441245] |
2.005367e6 | -1.0 | -1.0 | 0 | 1 | [351.200012, 412.899994] | [-7.173881, -2.448686] |
The processed result has been added as a new column named position to our gaze dataframe.
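Under the hood, a pixel-to-degree conversion applies the standard visual-angle formula using the experiment’s screen geometry. Here is a minimal standalone sketch of that math, using hypothetical screen dimensions and viewing distance (not the ToyDataset’s actual experiment setup, and not the exact implementation inside pymovements):

```python
import numpy as np

# Hypothetical screen geometry (NOT the ToyDataset's actual values):
screen_px = np.array([1280, 1024])   # resolution in pixels (width, height)
screen_cm = np.array([38.0, 30.0])   # physical screen size in cm
distance_cm = 68.0                   # viewing distance in cm

def pix2deg(pixel_xy):
    """Convert a pixel coordinate to degrees of visual angle,
    measured from the screen center."""
    # shift the origin from the top-left corner to the screen center
    centered_px = np.asarray(pixel_xy) - (screen_px - 1) / 2
    # convert pixels to centimeters on the screen surface
    centered_cm = centered_px * screen_cm / screen_px
    # angle subtended at the eye, in degrees
    return np.degrees(np.arctan2(centered_cm, distance_cm))

print(pix2deg([206.8, 152.4]))  # both coordinates left of / above center: negative
```

With pymovements’ Dataset API, this geometry comes from the dataset definition, so a plain dataset.pix2deg() call suffices.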
Additionally, we would like to have velocity data available too. There are four different methods to choose from:

preceding: takes only the single preceding sample into account for the velocity calculation. The noisiest variant.
neighbors: takes the two neighboring samples into account. A bit less noisy.
smooth: extends the neighborhood to two samples on each side. You get a smoother conversion this way.
savitzky_golay: uses the Savitzky-Golay differentiation filter. You can specify additional parameters like window_length and degree. With suitable parameters, this can give the best results.
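The timestamps in the dataframe above advance in 1 ms steps, i.e. the data is sampled at 1000 Hz. The preceding and neighbors variants boil down to simple finite differences scaled by the sampling rate; here is a NumPy sketch of those two formulas (the exact edge handling and the smooth filter weights in pymovements may differ):

```python
import numpy as np

fs = 1000.0  # sampling rate in Hz: timestamps advance in 1 ms steps

# a few x-positions in degrees, taken from the dataframe above
pos = np.array([-10.697598, -10.695184, -10.692768, -10.690351, -10.692768])

# 'preceding': backward difference over a single sample (noisiest)
v_preceding = np.full_like(pos, np.nan)
v_preceding[1:] = (pos[1:] - pos[:-1]) * fs

# 'neighbors': central difference over the two adjacent samples
v_neighbors = np.full_like(pos, np.nan)
v_neighbors[1:-1] = (pos[2:] - pos[:-2]) / 2 * fs

print(v_preceding)
print(v_neighbors)
```

The nan entries at the edges mirror the null values you will see in the velocity column below: a difference filter has no defined output where its window extends past the signal.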
Let’s use the fivepoint method first:
[6]:
dataset.pos2vel(method='fivepoint')
dataset.gaze[0].frame
100%|██████████| 20/20 [00:01<00:00, 19.40it/s]
[6]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel | position | velocity |
---|---|---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] | list[f32] | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] | [-10.697598, -8.8524] | [null, null] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] | [-10.695184, -8.859678] | [null, null] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] | [-10.692768, -8.866957] | [1.610438, -5.256017] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] | [-10.690351, -8.869382] | [0.40261, -4.447301] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] | [-10.692768, -8.874233] | [0.402451, -3.234386] |
1.98815e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.300003] | [-10.692768, -8.879086] | [2.012571, -0.808557] |
1.988151e6 | -1.0 | -1.0 | 0 | 1 | [207.199997, 151.399994] | [-10.687937, -8.87666] | [4.025777, 2.830188] |
1.988152e6 | -1.0 | -1.0 | 0 | 1 | [207.399994, 151.600006] | [-10.683106, -8.871807] | [4.428546, 5.256176] |
1.988153e6 | -1.0 | -1.0 | 0 | 1 | [207.600006, 151.899994] | [-10.678275, -8.864531] | [3.220717, 4.851818] |
1.988154e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [1.610438, 3.234545] |
1.988155e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [0.000159, 1.617432] |
1.988156e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.199997] | [-10.675859, -8.857252] | [-0.805219, 1.213074] |
… | … | … | … | … | … | … | … |
2.005356e6 | -1.0 | -1.0 | 0 | 1 | [370.399994, 419.0] | [-6.700617, -2.297363] | [30.837774, 1.65391] |
2.005357e6 | -1.0 | -1.0 | 0 | 1 | [371.200012, 419.0] | [-6.680877, -2.297363] | [7.401864, -1.240412] |
2.005358e6 | -1.0 | -1.0 | 0 | 1 | [371.100006, 418.899994] | [-6.683344, -2.299844] | [-14.803171, -4.961729] |
2.005359e6 | -1.0 | -1.0 | 0 | 1 | [369.899994, 418.700012] | [-6.712954, -2.304806] | [-34.126919, -11.16403] |
2.00536e6 | -1.0 | -1.0 | 0 | 1 | [368.100006, 418.100006] | [-6.75736, -2.319691] | [-48.510315, -17.366093] |
2.005361e6 | -1.0 | -1.0 | 0 | 1 | [365.899994, 417.100006] | [-6.811623, -2.3445] | [-56.310177, -21.087051] |
2.005362e6 | -1.0 | -1.0 | 0 | 1 | [363.299988, 416.299988] | [-6.875737, -2.364347] | [-61.638596, -21.913252] |
2.005363e6 | -1.0 | -1.0 | 0 | 1 | [361.0, 415.399994] | [-6.932438, -2.386672] | [-63.266281, -21.085701] |
2.005364e6 | -1.0 | -1.0 | 0 | 1 | [358.0, 414.5] | [-7.006376, -2.408998] | [-63.249668, -19.431353] |
2.005365e6 | -1.0 | -1.0 | 0 | 1 | [355.799988, 413.799988] | [-7.060582, -2.426362] | [-60.359718, -15.709997] |
2.005366e6 | -1.0 | -1.0 | 0 | 1 | [353.100006, 413.200012] | [-7.12709, -2.441245] | [null, null] |
2.005367e6 | -1.0 | -1.0 | 0 | 1 | [351.200012, 412.899994] | [-7.173881, -2.448686] | [null, null] |
The processed result has been added as a new column named velocity to our gaze dataframe.
We can also use the Savitzky-Golay differentiation filter with some additional parameters like this:
[7]:
dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)
dataset.gaze[0].frame
100%|██████████| 20/20 [00:01<00:00, 18.55it/s]
[7]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel | position | velocity |
---|---|---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] | list[f32] | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] | [-10.697598, -8.8524] | [1.207726, -3.11923] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] | [-10.695184, -8.859678] | [1.207692, -4.072189] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] | [-10.692768, -8.866957] | [1.035145, -4.765272] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] | [-10.690351, -8.869382] | [1.207726, -4.245451] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] | [-10.692768, -8.874233] | [1.552786, -2.339193] |
1.98815e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.300003] | [-10.692768, -8.879086] | [2.242872, 0.000034] |
1.988151e6 | -1.0 | -1.0 | 0 | 1 | [207.199997, 151.399994] | [-10.687937, -8.87666] | [2.932991, 1.992668] |
1.988152e6 | -1.0 | -1.0 | 0 | 1 | [207.399994, 151.600006] | [-10.683106, -8.871807] | [3.364461, 3.378902] |
1.988153e6 | -1.0 | -1.0 | 0 | 1 | [207.600006, 151.899994] | [-10.678275, -8.864531] | [2.933128, 3.985473] |
1.988154e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [1.639094, 3.29239] |
1.988155e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [0.517641, 2.425943] |
1.988156e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.199997] | [-10.675859, -8.857252] | [-0.25882, 0.953129] |
… | … | … | … | … | … | … | … |
2.005356e6 | -1.0 | -1.0 | 0 | 1 | [370.399994, 419.0] | [-6.700617, -2.297363] | [30.127287, 2.215104] |
2.005357e6 | -1.0 | -1.0 | 0 | 1 | [371.200012, 419.0] | [-6.680877, -2.297363] | [8.104988, -1.772072] |
2.005358e6 | -1.0 | -1.0 | 0 | 1 | [371.100006, 418.899994] | [-6.683344, -2.299844] | [-12.8627, -6.64522] |
2.005359e6 | -1.0 | -1.0 | 0 | 1 | [369.899994, 418.700012] | [-6.712954, -2.304806] | [-30.745234, -11.25254] |
2.00536e6 | -1.0 | -1.0 | 0 | 1 | [368.100006, 418.100006] | [-6.75736, -2.319691] | [-44.219154, -15.593843] |
2.005361e6 | -1.0 | -1.0 | 0 | 1 | [365.899994, 417.100006] | [-6.811623, -2.3445] | [-54.51572, -19.137569] |
2.005362e6 | -1.0 | -1.0 | 0 | 1 | [363.299988, 416.299988] | [-6.875737, -2.364347] | [-59.347614, -20.909182] |
2.005363e6 | -1.0 | -1.0 | 0 | 1 | [361.0, 415.399994] | [-6.932438, -2.386672] | [-62.0625, -20.465605] |
2.005364e6 | -1.0 | -1.0 | 0 | 1 | [358.0, 414.5] | [-7.006376, -2.408998] | [-61.343773, -18.07303] |
2.005365e6 | -1.0 | -1.0 | 0 | 1 | [355.799988, 413.799988] | [-7.060582, -2.426362] | [-53.501213, -14.617588] |
2.005366e6 | -1.0 | -1.0 | 0 | 1 | [353.100006, 413.200012] | [-7.12709, -2.441245] | [-41.879959, -10.276445] |
2.005367e6 | -1.0 | -1.0 | 0 | 1 | [351.200012, 412.899994] | [-7.173881, -2.448686] | [-27.710863, -6.112601] |
This has overwritten our velocity column. As we can see, the resulting values differ slightly between the two methods.
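For reference, Savitzky-Golay differentiation can also be reproduced standalone with SciPy: the filter fits a low-degree polynomial over a sliding window and evaluates its first derivative. A sketch on a synthetic 1-D trace (the exact edge handling in pymovements may differ):

```python
import numpy as np
from scipy.signal import savgol_filter

fs = 1000.0  # sampling rate in Hz

# synthetic 1-D position trace in degrees: smooth oscillation plus noise
rng = np.random.default_rng(42)
t = np.arange(0, 1, 1 / fs)
position = np.sin(2 * np.pi * t) + rng.normal(0, 0.001, t.size)

# first derivative of a degree-2 polynomial fitted over 7 samples;
# delta scales the result from per-sample to per-second units
velocity = savgol_filter(position, window_length=7, polyorder=2,
                         deriv=1, delta=1.0 / fs)

print(velocity.shape)  # same length as the input trace
```

Increasing window_length smooths the velocity estimate further at the cost of temporal resolution, which is the trade-off the degree and window_length keyword arguments above let you tune.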
What you have learned in this tutorial:#
transforming pixel coordinates into degrees of visual angle using Dataset.pix2deg()
transforming positional data into velocity data using Dataset.pos2vel()
passing additional keyword arguments when using the Savitzky-Golay differentiation filter