Preprocessing Raw Gaze Data
What you will learn in this tutorial:
how to transform pixel coordinates into degrees of visual angle
how to transform positional data into velocity data
Preparations
We import pymovements under the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading our ToyDataset and loading in its data:
[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 20/20 [00:00<00:00, 45.51it/s]
[2]:
<pymovements.dataset.dataset.Dataset at 0x7f2e9ac37c10>
We can verify that all files have been loaded in by checking the fileinfo attribute:
[3]:
dataset.fileinfo
[3]:
| text_id | page_id | filepath |
|---|---|---|
| i64 | i64 | str |
| 0 | 1 | "aeye-lab-pymov… |
| 0 | 2 | "aeye-lab-pymov… |
| 0 | 3 | "aeye-lab-pymov… |
| 0 | 4 | "aeye-lab-pymov… |
| 0 | 5 | "aeye-lab-pymov… |
| 1 | 1 | "aeye-lab-pymov… |
| 1 | 2 | "aeye-lab-pymov… |
| 1 | 3 | "aeye-lab-pymov… |
| 1 | 4 | "aeye-lab-pymov… |
| 1 | 5 | "aeye-lab-pymov… |
| 2 | 1 | "aeye-lab-pymov… |
| 2 | 2 | "aeye-lab-pymov… |
| 2 | 3 | "aeye-lab-pymov… |
| 2 | 4 | "aeye-lab-pymov… |
| 2 | 5 | "aeye-lab-pymov… |
| 3 | 1 | "aeye-lab-pymov… |
| 3 | 2 | "aeye-lab-pymov… |
| 3 | 3 | "aeye-lab-pymov… |
| 3 | 4 | "aeye-lab-pymov… |
| 3 | 5 | "aeye-lab-pymov… |
Now let’s inspect our gaze dataframe:
[4]:
dataset.gaze[0].frame.head()
[4]:
| text_id | page_id | time | stimuli_x | stimuli_y | pixel |
|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | list[f64] |
| 0 | 1 | 1.988145e6 | -1.0 | -1.0 | [206.8, 152.4] |
| 0 | 1 | 1.988146e6 | -1.0 | -1.0 | [206.9, 152.1] |
| 0 | 1 | 1.988147e6 | -1.0 | -1.0 | [207.0, 151.8] |
| 0 | 1 | 1.988148e6 | -1.0 | -1.0 | [207.1, 151.7] |
| 0 | 1 | 1.988149e6 | -1.0 | -1.0 | [207.0, 151.5] |
Apart from some trial identifier columns we see the columns time and pixel.
Preprocessing
We now want to transform these pixel position coordinates into coordinates in degrees of visual angle. This is simply done by:
[5]:
dataset.pix2deg()
dataset.gaze[0].frame
100%|██████████| 20/20 [00:00<00:00, 20.37it/s]
[5]:
| text_id | page_id | time | stimuli_x | stimuli_y | pixel | position |
|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | list[f64] | list[f64] |
| 0 | 1 | 1.988145e6 | -1.0 | -1.0 | [206.8, 152.4] | [-10.697598, -8.852399] |
| 0 | 1 | 1.988146e6 | -1.0 | -1.0 | [206.9, 152.1] | [-10.695183, -8.859678] |
| 0 | 1 | 1.988147e6 | -1.0 | -1.0 | [207.0, 151.8] | [-10.692768, -8.866956] |
| 0 | 1 | 1.988148e6 | -1.0 | -1.0 | [207.1, 151.7] | [-10.690352, -8.869381] |
| 0 | 1 | 1.988149e6 | -1.0 | -1.0 | [207.0, 151.5] | [-10.692768, -8.874233] |
| 0 | 1 | 1.98815e6 | -1.0 | -1.0 | [207.0, 151.3] | [-10.692768, -8.879085] |
| 0 | 1 | 1.988151e6 | -1.0 | -1.0 | [207.2, 151.4] | [-10.687937, -8.876659] |
| 0 | 1 | 1.988152e6 | -1.0 | -1.0 | [207.4, 151.6] | [-10.683106, -8.871807] |
| 0 | 1 | 1.988153e6 | -1.0 | -1.0 | [207.6, 151.9] | [-10.678275, -8.86453] |
| 0 | 1 | 1.988154e6 | -1.0 | -1.0 | [207.7, 152.1] | [-10.67586, -8.859678] |
| 0 | 1 | 1.988155e6 | -1.0 | -1.0 | [207.7, 152.1] | [-10.67586, -8.859678] |
| 0 | 1 | 1.988156e6 | -1.0 | -1.0 | [207.7, 152.2] | [-10.67586, -8.857252] |
| … | … | … | … | … | … | … |
| 0 | 1 | 2.005356e6 | -1.0 | -1.0 | [370.4, 419.0] | [-6.700617, -2.297363] |
| 0 | 1 | 2.005357e6 | -1.0 | -1.0 | [371.2, 419.0] | [-6.680877, -2.297363] |
| 0 | 1 | 2.005358e6 | -1.0 | -1.0 | [371.1, 418.9] | [-6.683345, -2.299844] |
| 0 | 1 | 2.005359e6 | -1.0 | -1.0 | [369.9, 418.7] | [-6.712953, -2.304806] |
| 0 | 1 | 2.00536e6 | -1.0 | -1.0 | [368.1, 418.1] | [-6.75736, -2.319691] |
| 0 | 1 | 2.005361e6 | -1.0 | -1.0 | [365.9, 417.1] | [-6.811623, -2.3445] |
| 0 | 1 | 2.005362e6 | -1.0 | -1.0 | [363.3, 416.3] | [-6.875737, -2.364346] |
| 0 | 1 | 2.005363e6 | -1.0 | -1.0 | [361.0, 415.4] | [-6.932438, -2.386672] |
| 0 | 1 | 2.005364e6 | -1.0 | -1.0 | [358.0, 414.5] | [-7.006376, -2.408998] |
| 0 | 1 | 2.005365e6 | -1.0 | -1.0 | [355.8, 413.8] | [-7.060582, -2.426362] |
| 0 | 1 | 2.005366e6 | -1.0 | -1.0 | [353.1, 413.2] | [-7.12709, -2.441245] |
| 0 | 1 | 2.005367e6 | -1.0 | -1.0 | [351.2, 412.9] | [-7.173881, -2.448686] |
The processed result has been added as a new column named position to our gaze dataframe.
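To make the conversion less of a black box, here is a minimal sketch of the usual pixel-to-visual-angle geometry: center the pixel coordinate on the screen, convert it to centimetres, and take the arctangent relative to the viewing distance. The screen resolution, screen size, and viewing distance below are assumptions chosen for illustration, and pixel_to_degrees is a hypothetical helper, not part of pymovements. The library's own implementation may differ in details such as the origin convention.

```python
import math

# Assumed screen geometry (illustrative values, not stated in this tutorial).
SCREEN_PX = (1280, 1024)   # resolution in pixels (width, height)
SCREEN_CM = (38.0, 30.2)   # physical screen size in centimetres (width, height)
DISTANCE_CM = 68.0         # eye-to-screen distance in centimetres


def pixel_to_degrees(pixel, screen_px, screen_cm, distance_cm):
    """Convert one (x, y) pixel sample to degrees of visual angle.

    Hypothetical helper for illustration only.
    """
    degrees = []
    for value, n_px, size_cm in zip(pixel, screen_px, screen_cm):
        # Shift the origin to the screen center.
        centered_px = value - (n_px - 1) / 2
        # Convert pixels to centimetres on the screen surface.
        centered_cm = centered_px * size_cm / n_px
        # Angle between the line of sight to the center and to the sample.
        degrees.append(math.degrees(math.atan2(centered_cm, distance_cm)))
    return degrees


# First pixel sample from the table above.
first_sample_deg = pixel_to_degrees([206.8, 152.4], SCREEN_PX, SCREEN_CM, DISTANCE_CM)
print(first_sample_deg)
```

Under these assumed values, the result lands close to the first entry of the position column, roughly -10.70 and -8.85 degrees.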
We would also like to have velocity data available. Four methods are available:
preceding: uses only the single preceding sample for the velocity calculation. This is the noisiest variant.
neighbors: uses the two neighboring samples for the velocity calculation. This is a bit less noisy.
fivepoint: extends the neighborhood to two samples on each side, which gives a smoother conversion.
savitzky_golay: uses the Savitzky-Golay differentiation filter for the conversion. You can specify additional parameters like window_length and degree. Depending on your parameters, this can give the best results.
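The first three methods can be read as finite-difference schemes of increasing width. The sketch below shows one common formulation of each, applied to the first five horizontal position values from the table above. The function names are hypothetical, the sampling rate of 1000 Hz is an assumption (the time column advances by one unit per sample), and the library's exact formulas and edge handling may differ.

```python
import numpy as np

SAMPLING_RATE = 1000.0  # assumed sampling rate in Hz


def velocity_preceding(x, sampling_rate):
    """Backward difference: current sample minus the preceding one."""
    v = np.full(len(x), np.nan)
    v[1:] = (x[1:] - x[:-1]) * sampling_rate
    return v


def velocity_neighbors(x, sampling_rate):
    """Central difference over the two direct neighbors."""
    v = np.full(len(x), np.nan)
    v[1:-1] = (x[2:] - x[:-2]) * sampling_rate / 2
    return v


def velocity_fivepoint(x, sampling_rate):
    """Central difference over two samples on each side."""
    v = np.full(len(x), np.nan)
    v[2:-2] = (x[4:] + x[3:-1] - x[1:-3] - x[:-4]) * sampling_rate / 6
    return v


# First five horizontal positions (degrees of visual angle) from the table above.
x_deg = np.array([-10.697598, -10.695183, -10.692768, -10.690352, -10.692768])

v_preceding = velocity_preceding(x_deg, SAMPLING_RATE)
v_neighbors = velocity_neighbors(x_deg, SAMPLING_RATE)
v_fivepoint = velocity_fivepoint(x_deg, SAMPLING_RATE)
print(v_preceding, v_neighbors, v_fivepoint, sep="\n")
```

Samples without enough neighbors are left as NaN here. The wider the stencil, the more samples at each end of the recording have no velocity estimate.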
Let’s use the fivepoint method first:
[6]:
dataset.pos2vel(method='fivepoint')
dataset.gaze[0].frame
100%|██████████| 20/20 [00:00<00:00, 36.43it/s]
[6]:
| text_id | page_id | time | stimuli_x | stimuli_y | pixel | position | velocity |
|---|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | list[f64] | list[f64] | list[f64] |
| 0 | 1 | 1.988145e6 | -1.0 | -1.0 | [206.8, 152.4] | [-10.697598, -8.852399] | [null, null] |
| 0 | 1 | 1.988146e6 | -1.0 | -1.0 | [206.9, 152.1] | [-10.695183, -8.859678] | [null, null] |
| 0 | 1 | 1.988147e6 | -1.0 | -1.0 | [207.0, 151.8] | [-10.692768, -8.866956] | [1.610194, -5.256267] |
| 0 | 1 | 1.988148e6 | -1.0 | -1.0 | [207.1, 151.7] | [-10.690352, -8.869381] | [0.402548, -4.447465] |
| 0 | 1 | 1.988149e6 | -1.0 | -1.0 | [207.0, 151.5] | [-10.692768, -8.874233] | [0.402561, -3.234462] |
| 0 | 1 | 1.98815e6 | -1.0 | -1.0 | [207.0, 151.3] | [-10.692768, -8.879085] | [2.012819, -0.808615] |
| 0 | 1 | 1.988151e6 | -1.0 | -1.0 | [207.2, 151.4] | [-10.687937, -8.876659] | [4.025683, 2.83017] |
| 0 | 1 | 1.988152e6 | -1.0 | -1.0 | [207.4, 151.6] | [-10.683106, -8.871807] | [4.428328, 5.256091] |
| 0 | 1 | 1.988153e6 | -1.0 | -1.0 | [207.6, 151.9] | [-10.678275, -8.86453] | [3.220663, 4.851847] |
| 0 | 1 | 1.988154e6 | -1.0 | -1.0 | [207.7, 152.1] | [-10.67586, -8.859678] | [1.610354, 3.234622] |
| 0 | 1 | 1.988155e6 | -1.0 | -1.0 | [207.7, 152.1] | [-10.67586, -8.859678] | [-2.9606e-13, 1.617343] |
| 0 | 1 | 1.988156e6 | -1.0 | -1.0 | [207.7, 152.2] | [-10.67586, -8.857252] | [-0.805187, 1.213025] |
| … | … | … | … | … | … | … | … |
| 0 | 1 | 2.005356e6 | -1.0 | -1.0 | [370.4, 419.0] | [-6.700617, -2.297363] | [30.837758, 1.653971] |
| 0 | 1 | 2.005357e6 | -1.0 | -1.0 | [371.2, 419.0] | [-6.680877, -2.297363] | [7.401726, -1.240481] |
| 0 | 1 | 2.005358e6 | -1.0 | -1.0 | [371.1, 418.9] | [-6.683345, -2.299844] | [-14.803188, -4.961884] |
| 0 | 1 | 2.005359e6 | -1.0 | -1.0 | [369.9, 418.7] | [-6.712953, -2.304806] | [-34.126826, -11.164066] |
| 0 | 1 | 2.00536e6 | -1.0 | -1.0 | [368.1, 418.1] | [-6.75736, -2.319691] | [-48.510256, -17.366038] |
| 0 | 1 | 2.005361e6 | -1.0 | -1.0 | [365.9, 417.1] | [-6.811623, -2.3445] | [-56.310241, -21.086876] |
| 0 | 1 | 2.005362e6 | -1.0 | -1.0 | [363.3, 416.3] | [-6.875737, -2.364346] | [-61.63851, -21.913173] |
| 0 | 1 | 2.005363e6 | -1.0 | -1.0 | [361.0, 415.4] | [-6.932438, -2.386672] | [-63.266374, -21.085616] |
| 0 | 1 | 2.005364e6 | -1.0 | -1.0 | [358.0, 414.5] | [-7.006376, -2.408998] | [-63.249652, -19.431326] |
| 0 | 1 | 2.005365e6 | -1.0 | -1.0 | [355.8, 413.8] | [-7.060582, -2.426362] | [-60.359624, -15.710061] |
| 0 | 1 | 2.005366e6 | -1.0 | -1.0 | [353.1, 413.2] | [-7.12709, -2.441245] | [null, null] |
| 0 | 1 | 2.005367e6 | -1.0 | -1.0 | [351.2, 412.9] | [-7.173881, -2.448686] | [null, null] |
The processed result has been added as a new column named velocity to our gaze dataframe.
We can also use the Savitzky-Golay differentiation filter with some additional parameters like this:
[7]:
dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)
dataset.gaze[0].frame
100%|██████████| 20/20 [00:00<00:00, 35.06it/s]
[7]:
| text_id | page_id | time | stimuli_x | stimuli_y | pixel | position | velocity |
|---|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | list[f64] | list[f64] | list[f64] |
| 0 | 1 | 1.988145e6 | -1.0 | -1.0 | [206.8, 152.4] | [-10.697598, -8.852399] | [1.207641, -3.119165] |
| 0 | 1 | 1.988146e6 | -1.0 | -1.0 | [206.9, 152.1] | [-10.695183, -8.859678] | [1.20764, -4.072198] |
| 0 | 1 | 1.988147e6 | -1.0 | -1.0 | [207.0, 151.8] | [-10.692768, -8.866956] | [1.035119, -4.765267] |
| 0 | 1 | 1.988148e6 | -1.0 | -1.0 | [207.1, 151.7] | [-10.690352, -8.869381] | [1.207654, -4.245382] |
| 0 | 1 | 1.988149e6 | -1.0 | -1.0 | [207.0, 151.5] | [-10.692768, -8.874233] | [1.552735, -2.339263] |
| 0 | 1 | 1.98815e6 | -1.0 | -1.0 | [207.0, 151.3] | [-10.692768, -8.879085] | [2.242885, 0.000009] |
| 0 | 1 | 1.988151e6 | -1.0 | -1.0 | [207.2, 151.4] | [-10.687937, -8.876659] | [2.933036, 1.992718] |
| 0 | 1 | 1.988152e6 | -1.0 | -1.0 | [207.4, 151.6] | [-10.683106, -8.871807] | [3.364372, 3.378942] |
| 0 | 1 | 1.988153e6 | -1.0 | -1.0 | [207.6, 151.9] | [-10.678275, -8.86453] | [2.933062, 3.98543] |
| 0 | 1 | 1.988154e6 | -1.0 | -1.0 | [207.7, 152.1] | [-10.67586, -8.859678] | [1.63908, 3.292347] |
| 0 | 1 | 1.988155e6 | -1.0 | -1.0 | [207.7, 152.1] | [-10.67586, -8.859678] | [0.517608, 2.425984] |
| 0 | 1 | 1.988156e6 | -1.0 | -1.0 | [207.7, 152.2] | [-10.67586, -8.857252] | [-0.25881, 0.953079] |
| … | … | … | … | … | … | … | … |
| 0 | 1 | 2.005356e6 | -1.0 | -1.0 | [370.4, 419.0] | [-6.700617, -2.297363] | [30.127398, 2.215118] |
| 0 | 1 | 2.005357e6 | -1.0 | -1.0 | [371.2, 419.0] | [-6.680877, -2.297363] | [8.10499, -1.772092] |
| 0 | 1 | 2.005358e6 | -1.0 | -1.0 | [371.1, 418.9] | [-6.683345, -2.299844] | [-12.862729, -6.645273] |
| 0 | 1 | 2.005359e6 | -1.0 | -1.0 | [369.9, 418.7] | [-6.712953, -2.304806] | [-30.745214, -11.252527] |
| 0 | 1 | 2.00536e6 | -1.0 | -1.0 | [368.1, 418.1] | [-6.75736, -2.319691] | [-44.219144, -15.593809] |
| 0 | 1 | 2.005361e6 | -1.0 | -1.0 | [365.9, 417.1] | [-6.811623, -2.3445] | [-54.515696, -19.137494] |
| 0 | 1 | 2.005362e6 | -1.0 | -1.0 | [363.3, 416.3] | [-6.875737, -2.364346] | [-59.347609, -20.909048] |
| 0 | 1 | 2.005363e6 | -1.0 | -1.0 | [361.0, 415.4] | [-6.932438, -2.386672] | [-62.062479, -20.465552] |
| 0 | 1 | 2.005364e6 | -1.0 | -1.0 | [358.0, 414.5] | [-7.006376, -2.408998] | [-61.343786, -18.073031] |
| 0 | 1 | 2.005365e6 | -1.0 | -1.0 | [355.8, 413.8] | [-7.060582, -2.426362] | [-53.501231, -14.617634] |
| 0 | 1 | 2.005366e6 | -1.0 | -1.0 | [353.1, 413.2] | [-7.12709, -2.441245] | [-41.879965, -10.276475] |
| 0 | 1 | 2.005367e6 | -1.0 | -1.0 | [351.2, 412.9] | [-7.173881, -2.448686] | [-27.710881, -6.112645] |
This has overwritten our velocity column. As we can see, the values differ slightly from the fivepoint results, and the first and last samples are no longer null.
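For intuition about what the filter does, the following sketch implements the core idea of Savitzky-Golay differentiation directly: fit a low-degree polynomial to each window of samples and take its derivative at the window center. The function savgol_velocity is a hypothetical helper, the sampling rate of 1000 Hz is an assumption, and only interior samples are computed. The library's handling of the edges is not reproduced here.

```python
import numpy as np

SAMPLING_RATE = 1000.0  # assumed sampling rate in Hz


def savgol_velocity(x, sampling_rate, window_length=7, degree=2):
    """Savitzky-Golay first derivative for interior samples (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    half = window_length // 2
    # Sample offsets relative to the window center, in units of samples.
    offsets = np.arange(-half, half + 1)
    v = np.full(len(x), np.nan)
    for i in range(half, len(x) - half):
        window = x[i - half:i + half + 1]
        # Least-squares polynomial fit; coefficients are ordered highest power first.
        coeffs = np.polyfit(offsets, window, degree)
        # The derivative at offset 0 is the linear coefficient (per sample),
        # so multiply by the sampling rate to get units per second.
        v[i] = coeffs[-2] * sampling_rate
    return v


# First seven horizontal positions (degrees of visual angle) from the table above.
x_deg = np.array([
    -10.697598, -10.695183, -10.692768, -10.690352,
    -10.692768, -10.692768, -10.687937,
])
v_savgol = savgol_velocity(x_deg, SAMPLING_RATE, window_length=7, degree=2)
print(v_savgol)
```

With a window of seven samples, only the fourth sample of this short excerpt has a full window, so it is the only entry that is not NaN. A larger window_length smooths more strongly, while a higher degree follows fast changes more closely.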
What you have learned in this tutorial:
transforming pixel coordinates into degrees of visual angle by using Dataset.pix2deg()
transforming positional data into velocity data by using Dataset.pos2vel()
passing additional keyword arguments when using the Savitzky-Golay differentiation filter