Preprocessing Raw Gaze Data#
What you will learn in this tutorial:#
how to transform pixel coordinates into degrees of visual angle
how to transform positional data into velocity data
Preparations#
We import pymovements as the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading our ToyDataset and loading in its data:
[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 20/20 [00:00<00:00, 20.53it/s]
[2]:
<pymovements.dataset.dataset.Dataset at 0x7f4e8cabee80>
We can verify that all files have been loaded in by checking the fileinfo attribute:
[3]:
dataset.fileinfo
[3]:
text_id | page_id | filepath |
---|---|---|
i64 | i64 | str |
0 | 1 | "aeye-lab-pymov… |
0 | 2 | "aeye-lab-pymov… |
0 | 3 | "aeye-lab-pymov… |
0 | 4 | "aeye-lab-pymov… |
0 | 5 | "aeye-lab-pymov… |
1 | 1 | "aeye-lab-pymov… |
1 | 2 | "aeye-lab-pymov… |
1 | 3 | "aeye-lab-pymov… |
1 | 4 | "aeye-lab-pymov… |
1 | 5 | "aeye-lab-pymov… |
2 | 1 | "aeye-lab-pymov… |
2 | 2 | "aeye-lab-pymov… |
2 | 3 | "aeye-lab-pymov… |
2 | 4 | "aeye-lab-pymov… |
2 | 5 | "aeye-lab-pymov… |
3 | 1 | "aeye-lab-pymov… |
3 | 2 | "aeye-lab-pymov… |
3 | 3 | "aeye-lab-pymov… |
3 | 4 | "aeye-lab-pymov… |
3 | 5 | "aeye-lab-pymov… |
Now let’s inspect our gaze dataframe:
[4]:
dataset.gaze[0].frame.head()
[4]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel |
---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] |
Apart from some trial identifier columns, we see the columns time and pixel.
Preprocessing#
We now want to transform these pixel position coordinates into coordinates in degrees of visual angle. This is simply done by:
[5]:
dataset.pix2deg()
dataset.gaze[0].frame
100%|██████████| 20/20 [00:01<00:00, 10.48it/s]
[5]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel | position |
---|---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] | [-10.697598, -8.8524] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] | [-10.695184, -8.859678] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] | [-10.692768, -8.866957] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] | [-10.690351, -8.869382] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] | [-10.692768, -8.874233] |
1.98815e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.300003] | [-10.692768, -8.879086] |
1.988151e6 | -1.0 | -1.0 | 0 | 1 | [207.199997, 151.399994] | [-10.687937, -8.87666] |
1.988152e6 | -1.0 | -1.0 | 0 | 1 | [207.399994, 151.600006] | [-10.683106, -8.871807] |
1.988153e6 | -1.0 | -1.0 | 0 | 1 | [207.600006, 151.899994] | [-10.678275, -8.864531] |
1.988154e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] |
1.988155e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] |
1.988156e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.199997] | [-10.675859, -8.857252] |
… | … | … | … | … | … | … |
2.005356e6 | -1.0 | -1.0 | 0 | 1 | [370.399994, 419.0] | [-6.700617, -2.297363] |
2.005357e6 | -1.0 | -1.0 | 0 | 1 | [371.200012, 419.0] | [-6.680877, -2.297363] |
2.005358e6 | -1.0 | -1.0 | 0 | 1 | [371.100006, 418.899994] | [-6.683344, -2.299844] |
2.005359e6 | -1.0 | -1.0 | 0 | 1 | [369.899994, 418.700012] | [-6.712954, -2.304806] |
2.00536e6 | -1.0 | -1.0 | 0 | 1 | [368.100006, 418.100006] | [-6.75736, -2.319691] |
2.005361e6 | -1.0 | -1.0 | 0 | 1 | [365.899994, 417.100006] | [-6.811623, -2.3445] |
2.005362e6 | -1.0 | -1.0 | 0 | 1 | [363.299988, 416.299988] | [-6.875737, -2.364347] |
2.005363e6 | -1.0 | -1.0 | 0 | 1 | [361.0, 415.399994] | [-6.932438, -2.386672] |
2.005364e6 | -1.0 | -1.0 | 0 | 1 | [358.0, 414.5] | [-7.006376, -2.408998] |
2.005365e6 | -1.0 | -1.0 | 0 | 1 | [355.799988, 413.799988] | [-7.060582, -2.426362] |
2.005366e6 | -1.0 | -1.0 | 0 | 1 | [353.100006, 413.200012] | [-7.12709, -2.441245] |
2.005367e6 | -1.0 | -1.0 | 0 | 1 | [351.200012, 412.899994] | [-7.173881, -2.448686] |
The processed result has been added as a new column named position to our gaze dataframe.
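Under the hood, a pixel-to-degree conversion applies the standard visual-angle formula using the experiment’s screen geometry. Here is a minimal standalone sketch of that math, using hypothetical screen dimensions and viewing distance (not the ToyDataset’s actual experiment setup, and not the exact implementation inside pymovements):

```python
import numpy as np

# Hypothetical screen geometry (NOT the ToyDataset's actual values):
screen_px = np.array([1280, 1024])   # resolution in pixels (width, height)
screen_cm = np.array([38.0, 30.0])   # physical screen size in cm
distance_cm = 68.0                   # viewing distance in cm

def pix2deg(pixel_xy):
    """Convert a pixel coordinate to degrees of visual angle,
    measured from the screen center."""
    # shift the origin from the top-left corner to the screen center
    centered_px = np.asarray(pixel_xy) - (screen_px - 1) / 2
    # convert pixels to centimeters on the screen surface
    centered_cm = centered_px * screen_cm / screen_px
    # angle subtended at the eye, in degrees
    return np.degrees(np.arctan2(centered_cm, distance_cm))

print(pix2deg([206.8, 152.4]))  # both coordinates left of / above center: negative
```

With pymovements’ Dataset API, this geometry comes from the dataset definition, so a plain dataset.pix2deg() call suffices.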
Additionally, we would like to have velocity data available too. There are four different methods to choose from:

preceding: takes only the single preceding sample into account for the velocity calculation. The noisiest variant.
neighbors: takes the two neighboring samples into account. A bit less noisy.
smooth: extends the neighborhood to two samples on each side. You get a smoother conversion this way.
savitzky_golay: uses the Savitzky-Golay differentiation filter. You can specify additional parameters like window_length and degree. With suitable parameters, this can give the best results.
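The timestamps in the dataframe above advance in 1 ms steps, i.e. the data is sampled at 1000 Hz. The preceding and neighbors variants boil down to simple finite differences scaled by the sampling rate; here is a NumPy sketch of those two formulas (the exact edge handling and the smooth filter weights in pymovements may differ):

```python
import numpy as np

fs = 1000.0  # sampling rate in Hz: timestamps advance in 1 ms steps

# a few x-positions in degrees, taken from the dataframe above
pos = np.array([-10.697598, -10.695184, -10.692768, -10.690351, -10.692768])

# 'preceding': backward difference over a single sample (noisiest)
v_preceding = np.full_like(pos, np.nan)
v_preceding[1:] = (pos[1:] - pos[:-1]) * fs

# 'neighbors': central difference over the two adjacent samples
v_neighbors = np.full_like(pos, np.nan)
v_neighbors[1:-1] = (pos[2:] - pos[:-2]) / 2 * fs

print(v_preceding)
print(v_neighbors)
```

The nan entries at the edges mirror the null values you will see in the velocity column below: a difference filter has no defined output where its window extends past the signal.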
Let’s use the fivepoint method first:
[6]:
dataset.pos2vel(method='fivepoint')
dataset.gaze[0].frame
100%|██████████| 20/20 [00:01<00:00, 19.40it/s]
[6]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel | position | velocity |
---|---|---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] | list[f32] | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] | [-10.697598, -8.8524] | [null, null] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] | [-10.695184, -8.859678] | [null, null] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] | [-10.692768, -8.866957] | [1.610438, -5.256017] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] | [-10.690351, -8.869382] | [0.40261, -4.447301] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] | [-10.692768, -8.874233] | [0.402451, -3.234386] |
1.98815e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.300003] | [-10.692768, -8.879086] | [2.012571, -0.808557] |
1.988151e6 | -1.0 | -1.0 | 0 | 1 | [207.199997, 151.399994] | [-10.687937, -8.87666] | [4.025777, 2.830188] |
1.988152e6 | -1.0 | -1.0 | 0 | 1 | [207.399994, 151.600006] | [-10.683106, -8.871807] | [4.428546, 5.256176] |
1.988153e6 | -1.0 | -1.0 | 0 | 1 | [207.600006, 151.899994] | [-10.678275, -8.864531] | [3.220717, 4.851818] |
1.988154e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [1.610438, 3.234545] |
1.988155e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [0.000159, 1.617432] |
1.988156e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.199997] | [-10.675859, -8.857252] | [-0.805219, 1.213074] |
… | … | … | … | … | … | … | … |
2.005356e6 | -1.0 | -1.0 | 0 | 1 | [370.399994, 419.0] | [-6.700617, -2.297363] | [30.837774, 1.65391] |
2.005357e6 | -1.0 | -1.0 | 0 | 1 | [371.200012, 419.0] | [-6.680877, -2.297363] | [7.401864, -1.240412] |
2.005358e6 | -1.0 | -1.0 | 0 | 1 | [371.100006, 418.899994] | [-6.683344, -2.299844] | [-14.803171, -4.961729] |
2.005359e6 | -1.0 | -1.0 | 0 | 1 | [369.899994, 418.700012] | [-6.712954, -2.304806] | [-34.126919, -11.16403] |
2.00536e6 | -1.0 | -1.0 | 0 | 1 | [368.100006, 418.100006] | [-6.75736, -2.319691] | [-48.510315, -17.366093] |
2.005361e6 | -1.0 | -1.0 | 0 | 1 | [365.899994, 417.100006] | [-6.811623, -2.3445] | [-56.310177, -21.087051] |
2.005362e6 | -1.0 | -1.0 | 0 | 1 | [363.299988, 416.299988] | [-6.875737, -2.364347] | [-61.638596, -21.913252] |
2.005363e6 | -1.0 | -1.0 | 0 | 1 | [361.0, 415.399994] | [-6.932438, -2.386672] | [-63.266281, -21.085701] |
2.005364e6 | -1.0 | -1.0 | 0 | 1 | [358.0, 414.5] | [-7.006376, -2.408998] | [-63.249668, -19.431353] |
2.005365e6 | -1.0 | -1.0 | 0 | 1 | [355.799988, 413.799988] | [-7.060582, -2.426362] | [-60.359718, -15.709997] |
2.005366e6 | -1.0 | -1.0 | 0 | 1 | [353.100006, 413.200012] | [-7.12709, -2.441245] | [null, null] |
2.005367e6 | -1.0 | -1.0 | 0 | 1 | [351.200012, 412.899994] | [-7.173881, -2.448686] | [null, null] |
The processed result has been added as a new column named velocity to our gaze dataframe.
We can also use the Savitzky-Golay differentiation filter with some additional parameters like this:
[7]:
dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)
dataset.gaze[0].frame
100%|██████████| 20/20 [00:01<00:00, 18.55it/s]
[7]:
time | stimuli_x | stimuli_y | text_id | page_id | pixel | position | velocity |
---|---|---|---|---|---|---|---|
f32 | f32 | f32 | i64 | i64 | list[f32] | list[f32] | list[f32] |
1.988145e6 | -1.0 | -1.0 | 0 | 1 | [206.800003, 152.399994] | [-10.697598, -8.8524] | [1.207726, -3.11923] |
1.988146e6 | -1.0 | -1.0 | 0 | 1 | [206.899994, 152.100006] | [-10.695184, -8.859678] | [1.207692, -4.072189] |
1.988147e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.800003] | [-10.692768, -8.866957] | [1.035145, -4.765272] |
1.988148e6 | -1.0 | -1.0 | 0 | 1 | [207.100006, 151.699997] | [-10.690351, -8.869382] | [1.207726, -4.245451] |
1.988149e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.5] | [-10.692768, -8.874233] | [1.552786, -2.339193] |
1.98815e6 | -1.0 | -1.0 | 0 | 1 | [207.0, 151.300003] | [-10.692768, -8.879086] | [2.242872, 0.000034] |
1.988151e6 | -1.0 | -1.0 | 0 | 1 | [207.199997, 151.399994] | [-10.687937, -8.87666] | [2.932991, 1.992668] |
1.988152e6 | -1.0 | -1.0 | 0 | 1 | [207.399994, 151.600006] | [-10.683106, -8.871807] | [3.364461, 3.378902] |
1.988153e6 | -1.0 | -1.0 | 0 | 1 | [207.600006, 151.899994] | [-10.678275, -8.864531] | [2.933128, 3.985473] |
1.988154e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [1.639094, 3.29239] |
1.988155e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.100006] | [-10.675859, -8.859678] | [0.517641, 2.425943] |
1.988156e6 | -1.0 | -1.0 | 0 | 1 | [207.699997, 152.199997] | [-10.675859, -8.857252] | [-0.25882, 0.953129] |
… | … | … | … | … | … | … | … |
2.005356e6 | -1.0 | -1.0 | 0 | 1 | [370.399994, 419.0] | [-6.700617, -2.297363] | [30.127287, 2.215104] |
2.005357e6 | -1.0 | -1.0 | 0 | 1 | [371.200012, 419.0] | [-6.680877, -2.297363] | [8.104988, -1.772072] |
2.005358e6 | -1.0 | -1.0 | 0 | 1 | [371.100006, 418.899994] | [-6.683344, -2.299844] | [-12.8627, -6.64522] |
2.005359e6 | -1.0 | -1.0 | 0 | 1 | [369.899994, 418.700012] | [-6.712954, -2.304806] | [-30.745234, -11.25254] |
2.00536e6 | -1.0 | -1.0 | 0 | 1 | [368.100006, 418.100006] | [-6.75736, -2.319691] | [-44.219154, -15.593843] |
2.005361e6 | -1.0 | -1.0 | 0 | 1 | [365.899994, 417.100006] | [-6.811623, -2.3445] | [-54.51572, -19.137569] |
2.005362e6 | -1.0 | -1.0 | 0 | 1 | [363.299988, 416.299988] | [-6.875737, -2.364347] | [-59.347614, -20.909182] |
2.005363e6 | -1.0 | -1.0 | 0 | 1 | [361.0, 415.399994] | [-6.932438, -2.386672] | [-62.0625, -20.465605] |
2.005364e6 | -1.0 | -1.0 | 0 | 1 | [358.0, 414.5] | [-7.006376, -2.408998] | [-61.343773, -18.07303] |
2.005365e6 | -1.0 | -1.0 | 0 | 1 | [355.799988, 413.799988] | [-7.060582, -2.426362] | [-53.501213, -14.617588] |
2.005366e6 | -1.0 | -1.0 | 0 | 1 | [353.100006, 413.200012] | [-7.12709, -2.441245] | [-41.879959, -10.276445] |
2.005367e6 | -1.0 | -1.0 | 0 | 1 | [351.200012, 412.899994] | [-7.173881, -2.448686] | [-27.710863, -6.112601] |
This has overwritten our velocity column. As we can see, the resulting values differ slightly between the two methods.
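For reference, Savitzky-Golay differentiation can also be reproduced standalone with SciPy: the filter fits a low-degree polynomial over a sliding window and evaluates its first derivative. A sketch on a synthetic 1-D trace (the exact edge handling in pymovements may differ):

```python
import numpy as np
from scipy.signal import savgol_filter

fs = 1000.0  # sampling rate in Hz

# synthetic 1-D position trace in degrees: smooth oscillation plus noise
rng = np.random.default_rng(42)
t = np.arange(0, 1, 1 / fs)
position = np.sin(2 * np.pi * t) + rng.normal(0, 0.001, t.size)

# first derivative of a degree-2 polynomial fitted over 7 samples;
# delta scales the result from per-sample to per-second units
velocity = savgol_filter(position, window_length=7, polyorder=2,
                         deriv=1, delta=1.0 / fs)

print(velocity.shape)  # same length as the input trace
```

Increasing window_length smooths the velocity estimate further at the cost of temporal resolution, which is the trade-off the degree and window_length keyword arguments above let you tune.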
What you have learned in this tutorial:#
transforming pixel coordinates into degrees of visual angle using Dataset.pix2deg()
transforming positional data into velocity data using Dataset.pos2vel()
passing additional keyword arguments when using the Savitzky-Golay differentiation filter