pymovements in 10 minutes#

What you will learn in this tutorial:#

how to download one of the publicly available datasets
how to load a subset of the data into your memory
how to transform pixel coordinates into degrees of visual angle
how to transform positional data into velocity data
how to detect fixations by using the I-VT algorithm
how to detect saccades by using the microsaccades algorithm
how to compute additional event properties for your analysis
how to save your preprocessed data
how to plot the saccadic main sequence from your data

Downloading one of the public datasets#

We import pymovements under the alias pm for convenience. We also import polars, which we will use later to filter the event dataframe.

[1]:

import polars as pl

import pymovements as pm

pymovements provides a library of publicly available datasets.

You can browse through the available dataset definitions here: Datasets

For this tutorial we will limit ourselves to the ToyDataset due to its minimal space requirements.

Other datasets can be downloaded by simply replacing ToyDataset with one of the other available datasets.

We can initialize and download by passing the desired dataset name as a string argument.

Additionally, we need to specify the root directory path for the data.

[2]:

dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()

Downloading http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/ to data/ToyDataset/downloads/pymovements-toy-dataset.zip

pymovements-toy-dataset.zip: 100%|██████████| 3.06M/3.06M [00:00<00:00, 25.3MB/s]

Checking integrity of pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw

[2]:

<pymovements.dataset.dataset.Dataset at 0x7fd18f6877f0>

Our downloaded dataset will be placed in a new directory with the name of the dataset:

[3]:

dataset.path

[3]:

PosixPath('data/ToyDataset')

Archive files are automatically extracted into the path specified by Dataset.paths.raw:

[4]:

dataset.paths.raw

[4]:

PosixPath('data/ToyDataset/raw')

Loading your data into memory#

Next we load our dataset into memory to be able to work with it:

[5]:

dataset.load()

100%|██████████| 20/20 [00:00<00:00, 195.27it/s]

[5]:

<pymovements.dataset.dataset.Dataset at 0x7fd18f6877f0>

This way we fill two attributes with data. First we have the fileinfo attribute which holds all the basic information for files:

[6]:

dataset.fileinfo.head()

[6]:

shape: (5, 3)

text_id	page_id	filepath
i64	i64	str
0	1	"aeye-lab-pymov…
0	2	"aeye-lab-pymov…
0	3	"aeye-lab-pymov…
0	4	"aeye-lab-pymov…
0	5	"aeye-lab-pymov…

We notice that for each filepath a text_id and a page_id are specified.

We have also loaded our gaze data into the dataframes in the gaze attribute:

[7]:

dataset.gaze[0].frame.head()

[7]:

shape: (5, 5)

text_id	page_id	time	x_right_pix	y_right_pix
i64	i64	f64	f64	f64
0	1	1.988145e6	206.8	152.4
0	1	1.988146e6	206.9	152.1
0	1	1.988147e6	207.0	151.8
0	1	1.988148e6	207.1	151.7
0	1	1.988149e6	207.0	151.5

Apart from the familiar columns from the fileinfo dataframe we see the columns time, x_right_pix and y_right_pix.

The last two columns refer to the pixel coordinates at the timestep specified by time.

We can also load just a subset of the data by specifying values for the fileinfo columns. Each key of the subset dictionary refers to a column in the fileinfo dataframe. The values can be of type bool, int, float or str, but also lists and ranges:

[8]:

subset = {
    'text_id': 0,
    'page_id': range(3),
}
dataset.load(subset=subset)

dataset.fileinfo

100%|██████████| 2/2 [00:00<00:00, 207.13it/s]

[8]:

shape: (2, 3)

text_id	page_id	filepath
i64	i64	str
0	1	"aeye-lab-pymov…
0	2	"aeye-lab-pymov…

Now we have selected only a small subset of our data.
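To make the matching rule concrete, here is a minimal pure-Python sketch of how such a subset dictionary could be applied to fileinfo rows. The helper matches_subset and the example rows are hypothetical and are not part of pymovements; the sketch only illustrates that scalar values must match exactly, while lists and ranges act as sets of allowed values.

```python
from collections.abc import Iterable


def matches_subset(row, subset):
    """Return True if a fileinfo row satisfies every entry of the subset dictionary."""
    for column, allowed in subset.items():
        # Strings are iterable too, so they are treated as single values.
        if isinstance(allowed, Iterable) and not isinstance(allowed, str):
            if row[column] not in allowed:
                return False
        elif row[column] != allowed:
            return False
    return True


# Hypothetical fileinfo rows, reduced to the two identifier columns.
fileinfo = [{'text_id': t, 'page_id': p} for t in range(2) for p in range(1, 4)]

subset = {'text_id': 0, 'page_id': range(3)}
selected = [row for row in fileinfo if matches_subset(row, subset)]
print(selected)
```

Because range(3) covers the values 0, 1 and 2, only the rows with page_id 1 and 2 of text 0 remain in this example.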

Preprocessing raw gaze data#

We now want to preprocess our gaze data by transforming pixel coordinates into degrees of visual angle and then computing velocity data from our positional data.

[9]:

dataset.pix2deg()

dataset.gaze[0].frame.head()

100%|██████████| 2/2 [00:00<00:00, 549.93it/s]

[9]:

shape: (5, 7)

text_id	page_id	time	x_right_pix	y_right_pix	y_right_pos	x_right_pos
i64	i64	f64	f64	f64	f64	f64
0	1	1.988145e6	206.8	152.4	-12.005591	-7.528075
0	1	1.988146e6	206.9	152.1	-12.01277	-7.525633
0	1	1.988147e6	207.0	151.8	-12.019949	-7.52319
0	1	1.988148e6	207.1	151.7	-12.022342	-7.520748
0	1	1.988149e6	207.0	151.5	-12.027128	-7.52319

We notice that two new columns have appeared: x_right_pos and y_right_pos. These are the positional columns specified in degrees of visual angle (dva).
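As a rough illustration of what such a conversion involves, the sketch below turns a single pixel coordinate into degrees of visual angle. The function and the screen geometry used here (1280 px and 38 cm wide, viewed from 68 cm) are assumptions made for this example, so the results are not expected to reproduce the values in the table above, and the actual pymovements implementation may differ in detail.

```python
import math


def pix2deg(pixel, screen_px, screen_cm, distance_cm):
    """Convert one pixel coordinate to degrees of visual angle (dva).

    Assumes that pixel coordinates start at one screen edge and that the
    eye is centred in front of the screen.
    """
    # Shift the origin to the screen centre.
    centered_px = pixel - (screen_px - 1) / 2
    # Convert pixels to centimetres on the screen surface.
    centered_cm = centered_px * screen_cm / screen_px
    # Angle between the line of sight to the centre and to the gaze target.
    return math.degrees(math.atan2(centered_cm, distance_cm))


# Hypothetical setup: 1280 px wide screen, 38 cm wide, viewed from 68 cm.
centre = pix2deg(639.5, screen_px=1280, screen_cm=38.0, distance_cm=68.0)
left_edge = pix2deg(0.0, screen_px=1280, screen_cm=38.0, distance_cm=68.0)
print(centre, left_edge)
```

The screen centre maps to 0 dva, and coordinates to the left of it map to negative angles.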

For transforming our positional data into velocity data we will use the Savitzky-Golay differentiation filter.

We can also specify some additional parameters for this method:

[10]:

dataset.pos2vel(method='savitzky_golay', window_length=7, polyorder=2)

dataset.gaze[0].frame.head()

100%|██████████| 2/2 [00:00<00:00, 368.15it/s]

[10]:

shape: (5, 9)

text_id	page_id	time	x_right_pix	y_right_pix	y_right_pos	x_right_pos	x_right_vel	y_right_vel
i64	i64	f64	f64	f64	f64	f64	f64	f64
0	1	1.988145e6	206.8	152.4	-12.005591	-7.528075	1.918969	-8.119266
0	1	1.988146e6	206.9	152.1	-12.01277	-7.525633	1.686374	-6.80873
0	1	1.988147e6	207.0	151.8	-12.019949	-7.52319	1.453779	-5.498195
0	1	1.988148e6	207.1	151.7	-12.022342	-7.520748	1.221184	-4.187659
0	1	1.988149e6	207.0	151.5	-12.027128	-7.52319	1.570121	-2.307447
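The idea behind the Savitzky-Golay differentiation filter can be sketched in a few lines. For a polynomial order of 1 or 2 and a symmetric window, the least-squares slope at the window centre is the sum of k * y[k] over the window offsets k, divided by the sum of k². The function below is a simplified stand-in written for this tutorial, not the routine pymovements calls; the sampling rate of 1000 Hz is an assumption, and samples near the edges are simply left undefined.

```python
def savgol_velocity(positions, sampling_rate, half_window=3):
    """First-derivative Savitzky-Golay filter for a polynomial order of 1 or 2.

    Samples closer than half_window to either end are returned as None.
    """
    offsets = range(-half_window, half_window + 1)
    norm = sum(k * k for k in offsets)  # 28 for a window of 7 samples
    velocities = [None] * len(positions)
    for i in range(half_window, len(positions) - half_window):
        # Least-squares slope of the polynomial fit at the window centre.
        slope = sum(k * positions[i + k] for k in offsets) / norm
        # Convert from degrees per sample to degrees per second.
        velocities[i] = slope * sampling_rate
    return velocities


# Gaze moving at a constant 0.002 degrees per sample, sampled at 1000 Hz.
positions = [0.002 * n for n in range(10)]
velocities = savgol_velocity(positions, sampling_rate=1000)
print(velocities[3])  # approximately 2.0 degrees per second
```

A larger window smooths the velocity signal more strongly, at the cost of blurring short, fast movements.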

Detecting events#

Now let’s detect some events.

First we will detect fixations using the I-VT algorithm with its default parameters:

[11]:

dataset.detect_events('ivt')

dataset.events[0].frame.head()

2it [00:00, 298.54it/s]

[11]:

shape: (5, 6)

name	onset	offset	duration	text_id	page_id
str	i64	i64	i64	i64	i64
"fixation"	1988145	1988322	177	0	1
"fixation"	1988351	1988546	195	0	1
"fixation"	1988592	1988736	144	0	1
"fixation"	1988788	1989012	224	0	1
"fixation"	1989044	1989170	126	0	1
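The principle of I-VT (identification by velocity threshold) is simple: every sample whose speed is below a threshold is treated as part of a fixation, and consecutive fixation samples are merged into one event. The following sketch illustrates this under stated assumptions; the function name, the threshold of 20 degrees per second and the minimum duration are illustrative choices and not necessarily the defaults used by pymovements.

```python
def ivt(times, speeds, velocity_threshold=20.0, minimum_duration=100):
    """Group consecutive below-threshold samples into (onset, offset) fixations."""
    fixations = []
    start = None
    for i, speed in enumerate(speeds):
        if speed < velocity_threshold:
            if start is None:
                start = i  # a new candidate fixation begins
        elif start is not None:
            # The previous sample was the last one below the threshold.
            fixations.append((times[start], times[i - 1]))
            start = None
    if start is not None:
        # The recording ended while a fixation was still ongoing.
        fixations.append((times[start], times[-1]))
    # Discard candidates that are too short to count as fixations.
    return [(on, off) for on, off in fixations if off - on >= minimum_duration]


# Hypothetical speeds in degrees per second, one sample per millisecond.
times = list(range(10))
speeds = [5, 6, 4, 5, 150, 200, 180, 5, 4, 6]
print(ivt(times, speeds, minimum_duration=2))  # [(0, 3), (7, 9)]
```

The speed of a sample is the norm of its two velocity components, for example math.hypot(x_vel, y_vel).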

Next we detect some saccades. This time we don’t use the default parameters but specify our own:

[12]:

dataset.detect_events('microsaccades', minimum_duration=8)

dataset.events[0].frame.filter(pl.col('name') == 'saccade').head()

2it [00:00, 91.65it/s]

[12]:

shape: (5, 6)

name	onset	offset	duration	text_id	page_id
str	i64	i64	i64	i64	i64
"saccade"	1988322	1988337	15	0	1
"saccade"	1988341	1988351	10	0	1
"saccade"	1988546	1988567	21	0	1
"saccade"	1988570	1988583	13	0	1
"saccade"	1988736	1988760	24	0	1
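The microsaccades algorithm follows Engbert and Kliegl (2003): the velocity threshold is not fixed but derived from the noise level of the data, using a median-based estimate of the standard deviation for each velocity component. A sample is a saccade candidate if its velocity lies outside the ellipse spanned by the two thresholds. The sketch below shows only this thresholding step; the function names and the threshold factor of 6 are assumptions made for the example, and runs of candidates shorter than the minimum duration would still have to be discarded afterwards.

```python
import math
from statistics import median


def median_based_std(values):
    """Robust spread estimate: sqrt(median(v^2) - median(v)^2)."""
    squared = [v * v for v in values]
    return math.sqrt(max(median(squared) - median(values) ** 2, 0.0))


def saccade_candidates(vx, vy, threshold_factor=6.0):
    """Mark samples whose velocity lies outside the elliptic threshold."""
    eta_x = threshold_factor * median_based_std(vx)
    eta_y = threshold_factor * median_based_std(vy)
    return [(x / eta_x) ** 2 + (y / eta_y) ** 2 > 1.0 for x, y in zip(vx, vy)]


# Hypothetical velocity components with one fast movement in the middle.
vx = [1, -2, 3, -1, 2, 150, 180, 160, -3, 1, -2, 2]
vy = [2, -1, 1, -3, 2, 40, 60, 50, 1, -2, 3, -1]
flags = saccade_candidates(vx, vy)
print([i for i, flag in enumerate(flags) if flag])  # [5, 6, 7]
```

Because the threshold adapts to the noise of each recording, the same threshold factor can be used across participants and trials.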

Computing event properties#

The event dataframe currently only holds the name, onset, offset and duration of each event, along with the identifier columns text_id and page_id.

We now want to compute some additional properties for each event. Event properties are things like peak velocity, amplitude and dispersion during an event.

We start out with computing the dispersion:

[13]:

dataset.compute_event_properties("dispersion")

dataset.events[0].frame.head()

2it [00:02,  1.19s/it]

[13]:

shape: (5, 7)

name	onset	offset	duration	text_id	page_id	dispersion
str	i64	i64	i64	i64	i64	f64
"fixation"	1988145	1988322	177	0	1	0.154585
"fixation"	1988351	1988546	195	0	1	0.291794
"fixation"	1988592	1988736	144	0	1	0.295701
"fixation"	1988788	1989012	224	0	1	0.27063
"fixation"	1989044	1989170	126	0	1	0.348295

We notice that a new column with the name dispersion has appeared in the event dataframe.

We can also pass a list of properties. Let’s add the amplitude and peak velocity:

[14]:

dataset.compute_event_properties(["amplitude", "peak_velocity"])

dataset.events[0].frame.head()

2it [00:02,  1.21s/it]

[14]:

shape: (5, 9)

name	onset	offset	duration	text_id	page_id	dispersion	amplitude	peak_velocity
str	i64	i64	i64	i64	i64	f64	f64	f64
"fixation"	1988145	1988322	177	0	1	0.154585	0.109689	16.423157
"fixation"	1988351	1988546	195	0	1	0.291794	0.206443	19.12955
"fixation"	1988592	1988736	144	0	1	0.295701	0.209179	17.794216
"fixation"	1988788	1989012	224	0	1	0.27063	0.191971	19.194043
"fixation"	1989044	1989170	126	0	1	0.348295	0.304209	18.583422

This way we can compute all of our desired properties in a single run.
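To give an intuition for these three properties, the following sketch computes them for the samples of a single event. It assumes the common definitions: dispersion as the sum of the horizontal and vertical position ranges, amplitude as the Euclidean norm of those ranges, and peak velocity as the largest speed during the event. The function and the sample values are hypothetical, and pymovements' own implementation may differ in detail.

```python
import math


def event_properties(x, y, vx, vy):
    """Compute dispersion, amplitude and peak velocity for one event."""
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    return {
        # Sum of the horizontal and vertical extent of the gaze positions.
        'dispersion': x_range + y_range,
        # Euclidean norm of the horizontal and vertical extent.
        'amplitude': math.hypot(x_range, y_range),
        # Largest speed (norm of both velocity components) during the event.
        'peak_velocity': max(math.hypot(a, b) for a, b in zip(vx, vy)),
    }


# Hypothetical samples of one event: positions in dva, velocities in dva/s.
props = event_properties(
    x=[0.0, 0.5, 1.5, 3.0],
    y=[0.0, 1.0, 2.5, 4.0],
    vx=[0.0, 30.0, 60.0, 0.0],
    vy=[0.0, 40.0, 80.0, 0.0],
)
print(props)
```

In this example the positions span 3 dva horizontally and 4 dva vertically, which gives a dispersion of 7 and an amplitude of 5.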

Plotting our data#

pymovements provides a range of plotting functions.

You can browse through the available plotting functions here: Plotting

In this tutorial we will plot the saccadic main sequence of our data.

[15]:

pm.plotting.main_sequence_plot(dataset.events[0])

[Figure: saccadic main sequence plot]
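The saccadic main sequence describes how the peak velocity of a saccade grows with its amplitude, so the plot needs exactly the two properties computed above. As a small sketch with made-up event rows (not taken from the dataset), the underlying data points could be collected like this:

```python
# Hypothetical event rows with the properties computed earlier.
events = [
    {'name': 'fixation', 'amplitude': 0.11, 'peak_velocity': 16.4},
    {'name': 'saccade', 'amplitude': 1.8, 'peak_velocity': 140.0},
    {'name': 'saccade', 'amplitude': 4.2, 'peak_velocity': 260.0},
]

# Only saccades contribute to the main sequence.
main_sequence = [
    (event['amplitude'], event['peak_velocity'])
    for event in events
    if event['name'] == 'saccade'
]
print(main_sequence)  # [(1.8, 140.0), (4.2, 260.0)]
```

Each pair corresponds to one point in a scatter plot of amplitude against peak velocity.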

Saving and loading your dataframes#

If we want to save interim results we can simply use the save() method like this:

[16]:

dataset.save()

100%|██████████| 2/2 [00:00<00:00, 1437.88it/s]
100%|██████████| 2/2 [00:00<00:00, 410.10it/s]

[16]:

<pymovements.dataset.dataset.Dataset at 0x7fd18f6877f0>

Let’s test this out by initializing a new Dataset object in the same directory and loading the preprocessed gaze and event data.

This time we don’t need to download anything.

[17]:

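preprocessed_dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')

# Load the saved preprocessed gaze and event data into the new object.
preprocessed_dataset.load(events=True, preprocessed=True, subset=subset)

display(preprocessed_dataset.gaze[0])
display(preprocessed_dataset.events[0])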

100%|██████████| 2/2 [00:00<00:00, 1150.07it/s]
100%|██████████| 2/2 [00:00<00:00, 774.93it/s]

<pymovements.gaze.gaze_dataframe.GazeDataFrame at 0x7fd18c0c3520>

<pymovements.events.events.EventDataFrame at 0x7fd18f5df4f0>