Preprocessing Raw Gaze Data#
What you will learn in this tutorial:#
how to transform pixel coordinates into degrees of visual angle
how to transform positional data into velocity data
Preparations#
We import pymovements as the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading our ToyDataset and loading in its data:
[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 20/20 [00:00<00:00, 201.23it/s]
[2]:
<pymovements.dataset.dataset.Dataset at 0x7f443d3eb820>
We can verify that all files have been loaded in by checking the fileinfo attribute:
[3]:
dataset.fileinfo
[3]:
| text_id | page_id | filepath |
|---|---|---|
| i64 | i64 | str |
| 0 | 1 | "aeye-lab-pymov… |
| 0 | 2 | "aeye-lab-pymov… |
| 0 | 3 | "aeye-lab-pymov… |
| 0 | 4 | "aeye-lab-pymov… |
| 0 | 5 | "aeye-lab-pymov… |
| 1 | 1 | "aeye-lab-pymov… |
| 1 | 2 | "aeye-lab-pymov… |
| 1 | 3 | "aeye-lab-pymov… |
| 1 | 4 | "aeye-lab-pymov… |
| 1 | 5 | "aeye-lab-pymov… |
| 2 | 1 | "aeye-lab-pymov… |
| 2 | 2 | "aeye-lab-pymov… |
| 2 | 3 | "aeye-lab-pymov… |
| 2 | 4 | "aeye-lab-pymov… |
| 2 | 5 | "aeye-lab-pymov… |
| 3 | 1 | "aeye-lab-pymov… |
| 3 | 2 | "aeye-lab-pymov… |
| 3 | 3 | "aeye-lab-pymov… |
| 3 | 4 | "aeye-lab-pymov… |
| 3 | 5 | "aeye-lab-pymov… |
Now let’s inspect our gaze dataframe:
[4]:
dataset.gaze[0].frame.head()
[4]:
| text_id | page_id | time | x_right_pix | y_right_pix |
|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 |
| 0 | 1 | 1.988145e6 | 206.8 | 152.4 |
| 0 | 1 | 1.988146e6 | 206.9 | 152.1 |
| 0 | 1 | 1.988147e6 | 207.0 | 151.8 |
| 0 | 1 | 1.988148e6 | 207.1 | 151.7 |
| 0 | 1 | 1.988149e6 | 207.0 | 151.5 |
Apart from the additional label columns text_id and page_id, we see the following columns: time, x_right_pix and y_right_pix.
Preprocessing#
We now want to transform these pixel position coordinates into coordinates in degrees of visual angle. This is simply done by:
[5]:
dataset.pix2deg()
dataset.gaze[0].frame
100%|██████████| 20/20 [00:00<00:00, 770.38it/s]
[5]:
| text_id | page_id | time | x_right_pix | y_right_pix | y_right_pos | x_right_pos |
|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | f64 | f64 |
| 0 | 1 | 1.988145e6 | 206.8 | 152.4 | -12.005591 | -7.528075 |
| 0 | 1 | 1.988146e6 | 206.9 | 152.1 | -12.01277 | -7.525633 |
| 0 | 1 | 1.988147e6 | 207.0 | 151.8 | -12.019949 | -7.52319 |
| 0 | 1 | 1.988148e6 | 207.1 | 151.7 | -12.022342 | -7.520748 |
| 0 | 1 | 1.988149e6 | 207.0 | 151.5 | -12.027128 | -7.52319 |
| 0 | 1 | 1.98815e6 | 207.0 | 151.3 | -12.031913 | -7.52319 |
| 0 | 1 | 1.988151e6 | 207.2 | 151.4 | -12.02952 | -7.518305 |
| 0 | 1 | 1.988152e6 | 207.4 | 151.6 | -12.024735 | -7.513421 |
| 0 | 1 | 1.988153e6 | 207.6 | 151.9 | -12.017556 | -7.508536 |
| 0 | 1 | 1.988154e6 | 207.7 | 152.1 | -12.01277 | -7.506093 |
| 0 | 1 | 1.988155e6 | 207.7 | 152.1 | -12.01277 | -7.506093 |
| 0 | 1 | 1.988156e6 | 207.7 | 152.2 | -12.010377 | -7.506093 |
| … | … | … | … | … | … | … |
| 0 | 1 | 2.005356e6 | 370.4 | 419.0 | -5.498696 | -3.501922 |
| 0 | 1 | 2.005357e6 | 371.2 | 419.0 | -5.498696 | -3.482116 |
| 0 | 1 | 2.005358e6 | 371.1 | 418.9 | -5.501175 | -3.484592 |
| 0 | 1 | 2.005359e6 | 369.9 | 418.7 | -5.506132 | -3.5143 |
| 0 | 1 | 2.00536e6 | 368.1 | 418.1 | -5.521002 | -3.558859 |
| 0 | 1 | 2.005361e6 | 365.9 | 417.1 | -5.545783 | -3.613315 |
| 0 | 1 | 2.005362e6 | 363.3 | 416.3 | -5.565607 | -3.677663 |
| 0 | 1 | 2.005363e6 | 361.0 | 415.4 | -5.587907 | -3.734578 |
| 0 | 1 | 2.005364e6 | 358.0 | 414.5 | -5.610206 | -3.808805 |
| 0 | 1 | 2.005365e6 | 355.8 | 413.8 | -5.627548 | -3.863229 |
| 0 | 1 | 2.005366e6 | 353.1 | 413.2 | -5.642412 | -3.930014 |
| 0 | 1 | 2.005367e6 | 351.2 | 412.9 | -5.649843 | -3.977003 |
The processed result has been added as new columns to our gaze dataframe: x_right_pos, y_right_pos.
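To make the conversion less of a black box, here is a minimal sketch of the geometry behind a pixel-to-degree transformation. It is not the pymovements implementation: the function name pix2deg_sketch and the screen parameters (1280 px wide, 38 cm wide, viewed from 68 cm) are assumptions chosen for illustration, so the numbers will not necessarily match the table above.

```python
import math


def pix2deg_sketch(pix, screen_px, screen_cm, distance_cm):
    """Convert one pixel coordinate to degrees of visual angle (illustrative only)."""
    # Move the origin to the screen center; pixel indices run from 0 to screen_px - 1.
    centered_px = pix - (screen_px - 1) / 2
    # Scale from pixels to centimeters on the physical screen.
    centered_cm = centered_px * screen_cm / screen_px
    # The visual angle is the arctangent of the on-screen offset over the viewing distance.
    return math.degrees(math.atan2(centered_cm, distance_cm))


# Hypothetical setup: a screen 1280 px and 38 cm wide, viewed from 68 cm.
left = pix2deg_sketch(0.0, screen_px=1280, screen_cm=38.0, distance_cm=68.0)
center = pix2deg_sketch(639.5, screen_px=1280, screen_cm=38.0, distance_cm=68.0)
right = pix2deg_sketch(1279.0, screen_px=1280, screen_cm=38.0, distance_cm=68.0)
print(left, center, right)
```

In this sketch the screen center maps to 0 degrees, and positions on either side of it map to angles of equal magnitude and opposite sign.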
We would also like to have velocity data available. There are four methods to choose from:
preceding: takes only the single preceding sample into account for the velocity calculation. This is the noisiest variant.
neighbors: takes both neighboring samples into account for the velocity calculation. A bit less noisy.
smooth: extends the neighborhood to two samples on each side. This gives a smoother conversion.
savitzky_golay: uses the Savitzky-Golay differentiation filter for the conversion. You can specify additional parameters like window_length and polyorder. With well-chosen parameters, this can give the best results.
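The three finite-difference methods can be sketched in a few lines of NumPy. This is an illustration rather than the library code: the helper name velocity_sketch is hypothetical, the 1000 Hz sampling rate is an assumption read off the time column (consecutive samples differ by 1, presumably milliseconds), and samples at the edges are simply left as NaN, which need not match how pymovements treats them.

```python
import numpy as np


def velocity_sketch(pos, sampling_rate, method="smooth"):
    """Finite-difference velocity from a 1D position signal (illustrative only)."""
    pos = np.asarray(pos, dtype=float)
    vel = np.full_like(pos, np.nan)  # edge samples without enough neighbors stay NaN
    if method == "preceding":
        # Difference to the single preceding sample.
        vel[1:] = (pos[1:] - pos[:-1]) * sampling_rate
    elif method == "neighbors":
        # Central difference over one neighbor on each side.
        vel[1:-1] = (pos[2:] - pos[:-2]) * sampling_rate / 2
    elif method == "smooth":
        # Central difference over two neighbors on each side.
        vel[2:-2] = (pos[4:] + pos[3:-1] - pos[1:-3] - pos[:-4]) * sampling_rate / 6
    else:
        raise ValueError(f"unknown method: {method}")
    return vel


# First five x_right_pos samples from the table above; 1000 Hz is an assumption.
x_pos = [-7.528075, -7.525633, -7.52319, -7.520748, -7.52319]
x_vel = velocity_sketch(x_pos, sampling_rate=1000.0, method="smooth")
print(x_vel[2])  # roughly 1.63
```

With the smooth variant, the middle sample comes out at about 1.63 degrees per second, in line with the x_right_vel value this tutorial reports for that sample when calling pos2vel(method='smooth').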
Let’s use the smooth method first:
[6]:
dataset.pos2vel(method='smooth')
dataset.gaze[0].frame
100%|██████████| 20/20 [00:00<00:00, 695.25it/s]
[6]:
| text_id | page_id | time | x_right_pix | y_right_pix | y_right_pos | x_right_pos | x_right_vel | y_right_vel |
|---|---|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 0 | 1 | 1.988145e6 | 206.8 | 152.4 | -12.005591 | -7.528075 | 1.221164 | -3.589697 |
| 0 | 1 | 1.988146e6 | 206.9 | 152.1 | -12.01277 | -7.525633 | 2.442343 | -7.179203 |
| 0 | 1 | 1.988147e6 | 207.0 | 151.8 | -12.019949 | -7.52319 | 1.628238 | -5.184827 |
| 0 | 1 | 1.988148e6 | 207.1 | 151.7 | -12.022342 | -7.520748 | 0.407059 | -4.386968 |
| 0 | 1 | 1.988149e6 | 207.0 | 151.5 | -12.027128 | -7.52319 | 0.407069 | -3.190445 |
| 0 | 1 | 1.98815e6 | 207.0 | 151.3 | -12.031913 | -7.52319 | 2.035352 | -0.797611 |
| 0 | 1 | 1.988151e6 | 207.2 | 151.4 | -12.02952 | -7.518305 | 4.070736 | 2.79166 |
| 0 | 1 | 1.988152e6 | 207.4 | 151.6 | -12.024735 | -7.513421 | 4.477864 | 5.184593 |
| 0 | 1 | 1.988153e6 | 207.6 | 151.9 | -12.017556 | -7.508536 | 3.256672 | 4.785873 |
| 0 | 1 | 1.988154e6 | 207.7 | 152.1 | -12.01277 | -7.506093 | 1.628352 | 3.190657 |
| 0 | 1 | 1.988155e6 | 207.7 | 152.1 | -12.01277 | -7.506093 | 0.0 | 1.595371 |
| 0 | 1 | 1.988156e6 | 207.7 | 152.2 | -12.010377 | -7.506093 | -0.814183 | 1.196552 |
| … | … | … | … | … | … | … | … | … |
| 0 | 1 | 2.005356e6 | 370.4 | 419.0 | -5.498696 | -3.501922 | 30.943909 | 1.652276 |
| 0 | 1 | 2.005357e6 | 371.2 | 419.0 | -5.498696 | -3.482116 | 7.426888 | -1.239212 |
| 0 | 1 | 2.005358e6 | 371.1 | 418.9 | -5.501175 | -3.484592 | -14.853638 | -4.956757 |
| 0 | 1 | 2.005359e6 | 369.9 | 418.7 | -5.506132 | -3.5143 | -34.244438 | -11.152291 |
| 0 | 1 | 2.00536e6 | 368.1 | 418.1 | -5.521002 | -3.558859 | -48.680946 | -17.347327 |
| 0 | 1 | 2.005361e6 | 365.9 | 417.1 | -5.545783 | -3.613315 | -56.513558 | -21.063531 |
| 0 | 1 | 2.005362e6 | 363.3 | 416.3 | -5.565607 | -3.677663 | -61.868102 | -21.888043 |
| 0 | 1 | 2.005363e6 | 361.0 | 415.4 | -5.587907 | -3.734578 | -63.509382 | -21.060565 |
| 0 | 1 | 2.005364e6 | 358.0 | 414.5 | -5.610206 | -3.808805 | -63.500293 | -19.40755 |
| 0 | 1 | 2.005365e6 | 355.8 | 413.8 | -5.627548 | -3.863229 | -60.605681 | -15.690342 |
| 0 | 1 | 2.005366e6 | 353.1 | 413.2 | -5.642412 | -3.930014 | -56.887102 | -11.147739 |
| 0 | 1 | 2.005367e6 | 351.2 | 412.9 | -5.649843 | -3.977003 | -23.494968 | -3.715818 |
This added the following new velocity columns to our gaze dataframe: x_right_vel, y_right_vel.
We can also use the Savitzky-Golay differentiation filter with some additional parameters like this:
[7]:
dataset.pos2vel(method='savitzky_golay', window_length=7, polyorder=2)
dataset.gaze[0].frame
100%|██████████| 20/20 [00:00<00:00, 441.88it/s]
[7]:
| text_id | page_id | time | x_right_pix | y_right_pix | y_right_pos | x_right_pos | x_right_vel | y_right_vel |
|---|---|---|---|---|---|---|---|---|
| i64 | i64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 0 | 1 | 1.988145e6 | 206.8 | 152.4 | -12.005591 | -7.528075 | 1.918969 | -8.119266 |
| 0 | 1 | 1.988146e6 | 206.9 | 152.1 | -12.01277 | -7.525633 | 1.686374 | -6.80873 |
| 0 | 1 | 1.988147e6 | 207.0 | 151.8 | -12.019949 | -7.52319 | 1.453779 | -5.498195 |
| 0 | 1 | 1.988148e6 | 207.1 | 151.7 | -12.022342 | -7.520748 | 1.221184 | -4.187659 |
| 0 | 1 | 1.988149e6 | 207.0 | 151.5 | -12.027128 | -7.52319 | 1.570121 | -2.307447 |
| 0 | 1 | 1.98815e6 | 207.0 | 151.3 | -12.031913 | -7.52319 | 2.267985 | 0.000012 |
| 0 | 1 | 1.988151e6 | 207.2 | 151.4 | -12.02952 | -7.518305 | 2.965849 | 1.965619 |
| 0 | 1 | 1.988152e6 | 207.4 | 151.6 | -12.024735 | -7.513421 | 3.402009 | 3.332988 |
| 0 | 1 | 1.988153e6 | 207.6 | 151.9 | -12.017556 | -7.508536 | 2.965868 | 3.931231 |
| 0 | 1 | 1.988154e6 | 207.7 | 152.1 | -12.01277 | -7.506093 | 1.657408 | 3.247586 |
| 0 | 1 | 1.988155e6 | 207.7 | 152.1 | -12.01277 | -7.506093 | 0.523394 | 2.393016 |
| 0 | 1 | 1.988156e6 | 207.7 | 152.2 | -12.010377 | -7.506093 | -0.261702 | 0.940131 |
| … | … | … | … | … | … | … | … | … |
| 0 | 1 | 2.005356e6 | 370.4 | 419.0 | -5.498696 | -3.501922 | 30.233706 | 2.212816 |
| 0 | 1 | 2.005357e6 | 371.2 | 419.0 | -5.498696 | -3.482116 | 8.133334 | -1.770247 |
| 0 | 1 | 2.005358e6 | 371.1 | 418.9 | -5.501175 | -3.484592 | -12.907492 | -6.638259 |
| 0 | 1 | 2.005359e6 | 369.9 | 418.7 | -5.506132 | -3.5143 | -30.853149 | -11.240462 |
| 0 | 1 | 2.00536e6 | 368.1 | 418.1 | -5.521002 | -3.558859 | -44.376559 | -15.576756 |
| 0 | 1 | 2.005361e6 | 365.9 | 417.1 | -5.545783 | -3.613315 | -54.71422 | -19.116071 |
| 0 | 1 | 2.005362e6 | 363.3 | 416.3 | -5.565607 | -3.677663 | -59.569318 | -20.885051 |
| 0 | 1 | 2.005363e6 | 361.0 | 415.4 | -5.587907 | -3.734578 | -62.301177 | -20.441377 |
| 0 | 1 | 2.005364e6 | 358.0 | 414.5 | -5.610206 | -3.808805 | -61.58637 | -18.051086 |
| 0 | 1 | 2.005365e6 | 355.8 | 413.8 | -5.627548 | -3.863229 | -59.751855 | -15.454523 |
| 0 | 1 | 2.005366e6 | 353.1 | 413.2 | -5.642412 | -3.930014 | -57.917341 | -12.857961 |
| 0 | 1 | 2.005367e6 | 351.2 | 412.9 | -5.649843 | -3.977003 | -56.082826 | -10.261398 |
This has overwritten our velocity columns. As we can see, the values differ slightly from those computed with the smooth method.
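To see where these numbers come from, the idea behind a Savitzky-Golay differentiation filter can be sketched with NumPy alone: fit a low-order polynomial to each window of samples and take its slope at the window center. The helper name savgol_velocity_sketch is hypothetical, the 1000 Hz sampling rate is an assumption, and edge samples are left as NaN here, so this is not a description of how pymovements computes its output. SciPy's scipy.signal.savgol_filter offers a ready-made version of the filter.

```python
import numpy as np


def savgol_velocity_sketch(pos, sampling_rate, window_length=7, polyorder=2):
    """Savitzky-Golay style velocity via local polynomial fits (illustrative only)."""
    pos = np.asarray(pos, dtype=float)
    half = window_length // 2
    offsets = np.arange(-half, half + 1)  # sample offsets relative to the window center
    vel = np.full_like(pos, np.nan)  # edges without a full window stay NaN
    for i in range(half, len(pos) - half):
        # Least-squares polynomial fit to the window centered on sample i.
        coeffs = np.polyfit(offsets, pos[i - half:i + half + 1], polyorder)
        # np.polyfit returns the highest power first, so the linear term is second
        # to last; it is the slope at the window center in position units per sample.
        vel[i] = coeffs[-2] * sampling_rate
    return vel


# Seven consecutive y_right_pos samples from the table above
# (time 1.988147e6 to 1.988153e6); 1000 Hz is an assumption.
y_pos = [-12.019949, -12.022342, -12.027128, -12.031913,
         -12.02952, -12.024735, -12.017556]
y_vel = savgol_velocity_sketch(y_pos, sampling_rate=1000.0)
print(y_vel[3])  # close to zero
```

The center sample of this window is where the vertical gaze position reverses direction, so the fitted slope is close to zero, consistent with the y_right_vel value of 0.000012 shown for time 1.98815e6.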
What you have learned in this tutorial:#
transforming pixel coordinates into degrees of visual angle by using Dataset.pix2deg()
transforming positional data into velocity data by using Dataset.pos2vel()
passing additional keyword arguments when using the Savitzky-Golay differentiation filter