Preprocessing Raw Gaze Data

What you will learn in this tutorial:

  • how to transform pixel coordinates into degrees of visual angle

  • how to transform positional data into velocity data

Preparations

We import pymovements under the alias pm for convenience.

import pymovements as pm

Let’s start by downloading our ToyDataset and loading in its data:

dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
INFO:pymovements.dataset.dataset:
        You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
        host or distribute any dataset resources and only provides a convenient interface to
        download the public dataset resources that were published by their respective authors.

        Please cite the referenced publication if you intend to use the dataset in your research.
        
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
Extracting archive:   0%|          | 0/23 [00:00<?, ?file/s]
Extracting archive: 100%|██████████| 23/23 [00:00<00:00, 361.87file/s]

Dataset
  • DatasetDefinition
    DatasetDefinition
    • 'ToyDataset'
    • 'pymovements Toy Dataset'
    • 'Example toy dataset. This dataset includes monocu...'
      'Example toy dataset.\n\nThis dataset includes monocular eye tracking data from a single participant in a single\nsession. Eye movements are recorded at a sampling frequency of 1000 Hz using an EyeLink Portable\nDuo video-based eye tracker and are provided as pixel coordinates.\n\nThe participant is instructed to read 4 texts with 5 screens each.\n'
    • Experiment
      Experiment
      • EyeTracker
        EyeTracker
        • None
        • None
        • None
        • None
        • 1000
        • None
        • None
      • Screen
        Screen
        • 68
        • 30.2
        • 1024
        • 'upper left'
        • tuple (2 items)
          • 1280
          • 1024
        • tuple (2 items)
          • 38
          • 30.2
        • 38
        • 1280
        • 15.599386487782953
        • -15.599386487782953
        • 12.508044410882546
        • -12.508044410882546
    • list (1 items)
      • ResourceDefinition
        • 'gaze'
        • 'pymovements-toy-dataset.zip'
        • 'trial_{text_id:d}_{page_id:d}.csv'
        • dict (2 items)
          • <class 'int'>
          • <class 'int'>
        • None
        • dict (4 items)
          • 'timestamp'
          • 'ms'
          • (2 more)
        • '256901852c1c07581d375eef705855d6'
        • None
        • WebSource
          WebSource(url='https://github.com/pymovements/pymovements-toy-dataset/archive/refs/heads/main.zip', filename='pymovements-toy-dataset.zip', md5='256901852c1c07581d375eef705855d6', mirrors=None)
        • 'https://github.com/pymovements/pymovements-toy-dat...'
          'https://github.com/pymovements/pymovements-toy-dataset/archive/refs/heads/main.zip'
  • tuple (20 items)
    • Events
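      • DataFrame (4 columns, 0 rows)
        shape: (0, 4)
        ┌──────┬───────┬────────┬──────────┐
        │ name ┆ onset ┆ offset ┆ duration │
        │ ---  ┆ ---   ┆ ---    ┆ ---      │
        │ str  ┆ i64   ┆ i64    ┆ i64      │
        ╞══════╪═══════╪════════╪══════════╡
        └──────┴───────┴────────┴──────────┘
      • None
    • Events
      • DataFrame (4 columns, 0 rows)
        shape: (0, 4)
        ┌──────┬───────┬────────┬──────────┐
        │ name ┆ onset ┆ offset ┆ duration │
        │ ---  ┆ ---   ┆ ---    ┆ ---      │
        │ str  ┆ i64   ┆ i64    ┆ i64      │
        ╞══════╪═══════╪════════╪══════════╡
        └──────┴───────┴────────┴──────────┘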
      • None
    • (18 more)
  • dict (1 items)
    • DataFrame (3 columns, 20 rows)
      shape: (20, 3)
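      ┌─────────┬─────────┬─────────────────────────────────┐
      │ text_id ┆ page_id ┆ filepath                        │
      │ ---     ┆ ---     ┆ ---                             │
      │ i64     ┆ i64     ┆ str                             │
      ╞═════════╪═════════╪═════════════════════════════════╡
      │ 0       ┆ 1       ┆ pymovements-toy-dataset-main/d… │
      │ 0       ┆ 2       ┆ pymovements-toy-dataset-main/d… │
      │ 0       ┆ 3       ┆ pymovements-toy-dataset-main/d… │
      │ 0       ┆ 4       ┆ pymovements-toy-dataset-main/d… │
      │ 0       ┆ 5       ┆ pymovements-toy-dataset-main/d… │
      │ …       ┆ …       ┆ …                               │
      │ 3       ┆ 1       ┆ pymovements-toy-dataset-main/d… │
      │ 3       ┆ 2       ┆ pymovements-toy-dataset-main/d… │
      │ 3       ┆ 3       ┆ pymovements-toy-dataset-main/d… │
      │ 3       ┆ 4       ┆ pymovements-toy-dataset-main/d… │
      │ 3       ┆ 5       ┆ pymovements-toy-dataset-main/d… │
      └─────────┴─────────┴─────────────────────────────────┘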
  • list (20 items)
    • Gaze
      • DataFrame (4 columns, 17223 rows)
        shape: (17_223, 4)
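        ┌─────────┬───────────┬───────────┬────────────────┐
        │ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel          │
        │ ---     ┆ ---       ┆ ---       ┆ ---            │
        │ i64     ┆ f64       ┆ f64       ┆ list[f64]      │
        ╞═════════╪═══════════╪═══════════╪════════════════╡
        │ 1988145 ┆ -1.0      ┆ -1.0      ┆ [206.8, 152.4] │
        │ 1988146 ┆ -1.0      ┆ -1.0      ┆ [206.9, 152.1] │
        │ 1988147 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.8] │
        │ 1988148 ┆ -1.0      ┆ -1.0      ┆ [207.1, 151.7] │
        │ 1988149 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.5] │
        │ …       ┆ …         ┆ …         ┆ …              │
        │ 2005363 ┆ -1.0      ┆ -1.0      ┆ [361.0, 415.4] │
        │ 2005364 ┆ -1.0      ┆ -1.0      ┆ [358.0, 414.5] │
        │ 2005365 ┆ -1.0      ┆ -1.0      ┆ [355.8, 413.8] │
        │ 2005366 ┆ -1.0      ┆ -1.0      ┆ [353.1, 413.2] │
        │ 2005367 ┆ -1.0      ┆ -1.0      ┆ [351.2, 412.9] │
        └─────────┴───────────┴───────────┴────────────────┘
      • Events
        Events
        • DataFrame (4 columns, 0 rows)
          shape: (0, 4)
          ┌──────┬───────┬────────┬──────────┐
          │ name ┆ onset ┆ offset ┆ duration │
          │ ---  ┆ ---   ┆ ---    ┆ ---      │
          │ str  ┆ i64   ┆ i64    ┆ i64      │
          ╞══════╪═══════╪════════╪══════════╡
          └──────┴───────┴────────┴──────────┘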
        • None
      • dict (2 items)
        • 0
        • 1
      • None
      • None
      • Experiment
        Experiment
        • EyeTracker
          EyeTracker
          • None
          • None
          • None
          • None
          • 1000
          • None
          • None
        • Screen
          Screen
          • 68
          • 30.2
          • 1024
          • 'upper left'
          • tuple (2 items)
            • 1280
            • 1024
          • tuple (2 items)
            • 38
            • 30.2
          • 38
          • 1280
          • 15.599386487782953
          • -15.599386487782953
          • 12.508044410882546
          • -12.508044410882546
    • Gaze
      • DataFrame (4 columns, 29799 rows)
        shape: (29_799, 4)
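        ┌─────────┬───────────┬───────────┬────────────────┐
        │ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel          │
        │ ---     ┆ ---       ┆ ---       ┆ ---            │
        │ i64     ┆ f64       ┆ f64       ┆ list[f64]      │
        ╞═════════╪═══════════╪═══════════╪════════════════╡
        │ 2008305 ┆ -1.0      ┆ -1.0      ┆ [141.4, 153.6] │
        │ 2008306 ┆ -1.0      ┆ -1.0      ┆ [141.1, 153.2] │
        │ 2008307 ┆ -1.0      ┆ -1.0      ┆ [140.7, 152.8] │
        │ 2008308 ┆ -1.0      ┆ -1.0      ┆ [140.6, 152.7] │
        │ 2008309 ┆ -1.0      ┆ -1.0      ┆ [140.5, 152.6] │
        │ …       ┆ …         ┆ …         ┆ …              │
        │ 2038099 ┆ -1.0      ┆ -1.0      ┆ [273.8, 773.8] │
        │ 2038100 ┆ -1.0      ┆ -1.0      ┆ [273.8, 774.1] │
        │ 2038101 ┆ -1.0      ┆ -1.0      ┆ [273.9, 774.5] │
        │ 2038102 ┆ -1.0      ┆ -1.0      ┆ [274.0, 774.4] │
        │ 2038103 ┆ -1.0      ┆ -1.0      ┆ [274.0, 773.9] │
        └─────────┴───────────┴───────────┴────────────────┘
      • Events
        Events
        • DataFrame (4 columns, 0 rows)
          shape: (0, 4)
          ┌──────┬───────┬────────┬──────────┐
          │ name ┆ onset ┆ offset ┆ duration │
          │ ---  ┆ ---   ┆ ---    ┆ ---      │
          │ str  ┆ i64   ┆ i64    ┆ i64      │
          ╞══════╪═══════╪════════╪══════════╡
          └──────┴───────┴────────┴──────────┘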
        • None
      • dict (2 items)
        • 0
        • 2
      • None
      • None
      • Experiment
        Experiment
        • EyeTracker
          EyeTracker
          • None
          • None
          • None
          • None
          • 1000
          • None
          • None
        • Screen
          Screen
          • 68
          • 30.2
          • 1024
          • 'upper left'
          • tuple (2 items)
            • 1280
            • 1024
          • tuple (2 items)
            • 38
            • 30.2
          • 38
          • 1280
          • 15.599386487782953
          • -15.599386487782953
          • 12.508044410882546
          • -12.508044410882546
    • (18 more)
  • Participants
    Participants
    • DataFrame (1 columns, 0 rows)
      shape: (0, 1)
      ┌────────────────┐
      │ participant_id │
      │ ---            │
      │ str            │
      ╞════════════════╡
      └────────────────┘
    • dict (1 items)
      • dict (1 items)
        • 'string'
  • PosixPath('data/ToyDataset')
  • DatasetPaths
    DatasetPaths
    • PosixPath('data/ToyDataset')
    • PosixPath('data/ToyDataset/downloads')
    • PosixPath('data/ToyDataset/events')
    • PosixPath('data/ToyDataset/precomputed_events')
    • PosixPath('data/ToyDataset/precomputed_reading_measures')
    • PosixPath('data/ToyDataset/preprocessed')
    • PosixPath('data/ToyDataset/raw')
    • PosixPath('data/ToyDataset')
    • PosixPath('data/ToyDataset/stimuli')
  • list (0 items)
  • list (0 items)
  • list (0 items)

We can verify that all files have been loaded in by checking the fileinfo attribute:

dataset.fileinfo
{'gaze': shape: (20, 3)
 ┌─────────┬─────────┬─────────────────────────────────┐
 │ text_id ┆ page_id ┆ filepath                        │
 │ ---     ┆ ---     ┆ ---                             │
 │ i64     ┆ i64     ┆ str                             │
 ╞═════════╪═════════╪═════════════════════════════════╡
 │ 0       ┆ 1       ┆ pymovements-toy-dataset-main/d… │
 │ 0       ┆ 2       ┆ pymovements-toy-dataset-main/d… │
 │ 0       ┆ 3       ┆ pymovements-toy-dataset-main/d… │
 │ 0       ┆ 4       ┆ pymovements-toy-dataset-main/d… │
 │ 0       ┆ 5       ┆ pymovements-toy-dataset-main/d… │
 │ …       ┆ …       ┆ …                               │
 │ 3       ┆ 1       ┆ pymovements-toy-dataset-main/d… │
 │ 3       ┆ 2       ┆ pymovements-toy-dataset-main/d… │
 │ 3       ┆ 3       ┆ pymovements-toy-dataset-main/d… │
 │ 3       ┆ 4       ┆ pymovements-toy-dataset-main/d… │
 │ 3       ┆ 5       ┆ pymovements-toy-dataset-main/d… │
 └─────────┴─────────┴─────────────────────────────────┘}

Now let’s inspect our gaze dataframe:

dataset.gaze[0]
Gaze
  • DataFrame (4 columns, 17223 rows)
    shape: (17_223, 4)
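    ┌─────────┬───────────┬───────────┬────────────────┐
    │ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel          │
    │ ---     ┆ ---       ┆ ---       ┆ ---            │
    │ i64     ┆ f64       ┆ f64       ┆ list[f64]      │
    ╞═════════╪═══════════╪═══════════╪════════════════╡
    │ 1988145 ┆ -1.0      ┆ -1.0      ┆ [206.8, 152.4] │
    │ 1988146 ┆ -1.0      ┆ -1.0      ┆ [206.9, 152.1] │
    │ 1988147 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.8] │
    │ 1988148 ┆ -1.0      ┆ -1.0      ┆ [207.1, 151.7] │
    │ 1988149 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.5] │
    │ …       ┆ …         ┆ …         ┆ …              │
    │ 2005363 ┆ -1.0      ┆ -1.0      ┆ [361.0, 415.4] │
    │ 2005364 ┆ -1.0      ┆ -1.0      ┆ [358.0, 414.5] │
    │ 2005365 ┆ -1.0      ┆ -1.0      ┆ [355.8, 413.8] │
    │ 2005366 ┆ -1.0      ┆ -1.0      ┆ [353.1, 413.2] │
    │ 2005367 ┆ -1.0      ┆ -1.0      ┆ [351.2, 412.9] │
    └─────────┴───────────┴───────────┴────────────────┘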

Apart from some trial identifier columns, the dataframe contains the columns time and pixel.

Preprocessing

We now want to transform these pixel coordinates into degrees of visual angle. A single call does this, using the screen geometry stored in the dataset’s experiment definition:

dataset.pix2deg()

dataset.gaze[0]
Gaze
  • DataFrame (5 columns, 17223 rows)
    shape: (17_223, 5)
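    ┌─────────┬───────────┬───────────┬────────────────┬─────────────────────────┐
    │ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel          ┆ position                │
    │ ---     ┆ ---       ┆ ---       ┆ ---            ┆ ---                     │
    │ i64     ┆ f64       ┆ f64       ┆ list[f64]      ┆ list[f64]               │
    ╞═════════╪═══════════╪═══════════╪════════════════╪═════════════════════════╡
    │ 1988145 ┆ -1.0      ┆ -1.0      ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] │
    │ 1988146 ┆ -1.0      ┆ -1.0      ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] │
    │ 1988147 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] │
    │ 1988148 ┆ -1.0      ┆ -1.0      ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] │
    │ 1988149 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] │
    │ …       ┆ …         ┆ …         ┆ …              ┆ …                       │
    │ 2005363 ┆ -1.0      ┆ -1.0      ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  │
    │ 2005364 ┆ -1.0      ┆ -1.0      ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  │
    │ 2005365 ┆ -1.0      ┆ -1.0      ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  │
    │ 2005366 ┆ -1.0      ┆ -1.0      ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   │
    │ 2005367 ┆ -1.0      ┆ -1.0      ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  │
    └─────────┴───────────┴───────────┴────────────────┴─────────────────────────┘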

The processed result has been added as a new column named position to our gaze dataframe.
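The geometry behind this conversion can be sketched in a few lines. The following is a minimal illustration, not the pymovements implementation: pix2deg_sketch is a hypothetical helper, and it assumes that the origin is moved to the screen center at (n_pixels - 1) / 2 before the angle is computed. The screen values are taken from the Screen entry in the output above; since that output does not show field names, reading them as 1280 × 1024 px, 38 × 30.2 cm and a viewing distance of 68 cm is also an assumption.

```python
import math


def pix2deg_sketch(pixel, n_pixels, size_cm, distance_cm):
    """Convert one pixel coordinate to degrees of visual angle (sketch)."""
    # Shift the origin from the upper left corner to the screen center.
    center = (n_pixels - 1) / 2
    # Convert the pixel offset to a physical offset on the screen.
    offset_cm = (pixel - center) * size_cm / n_pixels
    # The visual angle is the angle between the line of sight to the
    # screen center and the line of sight to the gaze point.
    return math.degrees(math.atan2(offset_cm, distance_cm))


# Assumed screen geometry, read from the Screen entry shown above.
x_deg = pix2deg_sketch(206.8, n_pixels=1280, size_cm=38.0, distance_cm=68.0)
y_deg = pix2deg_sketch(152.4, n_pixels=1024, size_cm=30.2, distance_cm=68.0)

# Compare with the first row of the position column.
print(x_deg, y_deg)
```

Under these assumptions, the sketch should reproduce the first entry of the position column, which suggests that the centering convention is at least compatible with what Dataset.pix2deg() does.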

We would also like to have velocity data available. Four methods are available:

  • preceding: uses only the single preceding sample to compute the velocity. This is the noisiest variant.

  • neighbors: uses the two neighboring samples, one on each side. This is somewhat less noisy.

  • fivepoint: extends the neighborhood to two samples on each side, which yields a smoother result.

  • savitzky_golay: uses the Savitzky-Golay differentiation filter. You can pass additional parameters such as window_length and degree. With well-chosen parameters, this usually gives the best results.
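To make the difference between the first three methods concrete, here is a small sketch of the corresponding finite-difference schemes. The function names are hypothetical and the code is an illustration under stated assumptions, not the library's implementation: velocities are scaled by the sampling rate (1000 Hz for this dataset), and samples where the stencil does not fit are set to None. The input values are the first five horizontal positions in degrees of visual angle, as produced by Dataset.pix2deg() for the first gaze file.

```python
def preceding(x, fs):
    """Backward difference: uses only the preceding sample."""
    return [None] + [(x[i] - x[i - 1]) * fs for i in range(1, len(x))]


def neighbors(x, fs):
    """Central difference over the two neighboring samples."""
    inner = [(x[i + 1] - x[i - 1]) * fs / 2 for i in range(1, len(x) - 1)]
    return [None] + inner + [None]


def fivepoint(x, fs):
    """Central difference over two samples on each side.

    Assumes at least five samples.
    """
    inner = [
        (x[i + 2] + x[i + 1] - x[i - 1] - x[i - 2]) * fs / 6
        for i in range(2, len(x) - 2)
    ]
    return [None, None] + inner + [None, None]


# First five horizontal positions (degrees) of the first gaze file.
x_deg = [-10.697598, -10.695183, -10.692768, -10.690352, -10.692768]
fs = 1000  # sampling rate in Hz

for method in (preceding, neighbors, fivepoint):
    print(method.__name__, method(x_deg, fs))
```

The wider the stencil, the more samples are averaged and the fewer edge samples receive a value, which is the trade-off between noise and coverage described in the list above.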

Let’s use the fivepoint method first:

dataset.pos2vel(method='fivepoint')

dataset.gaze[0]
Gaze
  • DataFrame (6 columns, 17223 rows)
    shape: (17_223, 6)
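    ┌─────────┬───────────┬───────────┬────────────────┬─────────────────────────┬──────────────────────────┐
    │ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel          ┆ position                ┆ velocity                 │
    │ ---     ┆ ---       ┆ ---       ┆ ---            ┆ ---                     ┆ ---                      │
    │ i64     ┆ f64       ┆ f64       ┆ list[f64]      ┆ list[f64]               ┆ list[f64]                │
    ╞═════════╪═══════════╪═══════════╪════════════════╪═════════════════════════╪══════════════════════════╡
    │ 1988145 ┆ -1.0      ┆ -1.0      ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] ┆ [null, null]             │
    │ 1988146 ┆ -1.0      ┆ -1.0      ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] ┆ [null, null]             │
    │ 1988147 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] ┆ [1.610194, -5.256267]    │
    │ 1988148 ┆ -1.0      ┆ -1.0      ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] ┆ [0.402548, -4.447465]    │
    │ 1988149 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] ┆ [0.402561, -3.234462]    │
    │ …       ┆ …         ┆ …         ┆ …              ┆ …                       ┆ …                        │
    │ 2005363 ┆ -1.0      ┆ -1.0      ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  ┆ [-63.266374, -21.085616] │
    │ 2005364 ┆ -1.0      ┆ -1.0      ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  ┆ [-63.249652, -19.431326] │
    │ 2005365 ┆ -1.0      ┆ -1.0      ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  ┆ [-60.359624, -15.710061] │
    │ 2005366 ┆ -1.0      ┆ -1.0      ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   ┆ [null, null]             │
    │ 2005367 ┆ -1.0      ┆ -1.0      ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  ┆ [null, null]             │
    └─────────┴───────────┴───────────┴────────────────┴─────────────────────────┴──────────────────────────┘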

The processed result has been added as a new column named velocity to our gaze dataframe.

We can also use the Savitzky-Golay differentiation filter with some additional parameters like this:

dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)

dataset.gaze[0]
Gaze
  • DataFrame (6 columns, 17223 rows)
    shape: (17_223, 6)
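    ┌─────────┬───────────┬───────────┬────────────────┬─────────────────────────┬──────────────────────────┐
    │ time    ┆ stimuli_x ┆ stimuli_y ┆ pixel          ┆ position                ┆ velocity                 │
    │ ---     ┆ ---       ┆ ---       ┆ ---            ┆ ---                     ┆ ---                      │
    │ i64     ┆ f64       ┆ f64       ┆ list[f64]      ┆ list[f64]               ┆ list[f64]                │
    ╞═════════╪═══════════╪═══════════╪════════════════╪═════════════════════════╪══════════════════════════╡
    │ 1988145 ┆ -1.0      ┆ -1.0      ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] ┆ [1.207641, -3.119165]    │
    │ 1988146 ┆ -1.0      ┆ -1.0      ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] ┆ [1.20764, -4.072198]     │
    │ 1988147 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] ┆ [1.035119, -4.765267]    │
    │ 1988148 ┆ -1.0      ┆ -1.0      ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] ┆ [1.207654, -4.245382]    │
    │ 1988149 ┆ -1.0      ┆ -1.0      ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] ┆ [1.552735, -2.339263]    │
    │ …       ┆ …         ┆ …         ┆ …              ┆ …                       ┆ …                        │
    │ 2005363 ┆ -1.0      ┆ -1.0      ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  ┆ [-62.062479, -20.465552] │
    │ 2005364 ┆ -1.0      ┆ -1.0      ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  ┆ [-61.343786, -18.073031] │
    │ 2005365 ┆ -1.0      ┆ -1.0      ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  ┆ [-53.501231, -14.617634] │
    │ 2005366 ┆ -1.0      ┆ -1.0      ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   ┆ [-41.879965, -10.276475] │
    │ 2005367 ┆ -1.0      ┆ -1.0      ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  ┆ [-27.710881, -6.112645]  │
    └─────────┴───────────┴───────────┴────────────────┴─────────────────────────┴──────────────────────────┘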

This has overwritten our velocity column. As we can see, the values differ slightly from the fivepoint result, and the first and last samples now hold values instead of null.
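For intuition, the core idea of a Savitzky-Golay differentiation filter can be sketched as follows: fit a low-degree polynomial to a short window of samples by least squares and take its derivative at the center sample. For a symmetric window and a polynomial of degree 1 or 2, that derivative reduces to a weighted sum of the samples. The helper savgol_velocity_center below is hypothetical and only handles a single window; it is not the pymovements implementation and leaves out how the edges of the signal are treated.

```python
def savgol_velocity_center(window, sampling_rate):
    """Velocity at the center of an odd-length window (sketch).

    Equivalent to fitting a polynomial of degree 1 or 2 to the window by
    least squares and evaluating its first derivative at the center sample.
    """
    if len(window) % 2 == 0:
        raise ValueError("window length must be odd")
    half = len(window) // 2
    offsets = range(-half, half + 1)
    # For window_length=7 the weights are (-3, -2, -1, 0, 1, 2, 3) / 28.
    norm = sum(k * k for k in offsets)
    slope_per_sample = sum(k * x for k, x in zip(offsets, window)) / norm
    # Convert from "per sample" to "per second".
    return slope_per_sample * sampling_rate


# Synthetic check: a ramp rising by 0.05 degrees per sample at 1000 Hz
# corresponds to a constant velocity of 50 degrees per second.
ramp = [0.05 * k for k in range(7)]
print(savgol_velocity_center(ramp, sampling_rate=1000))
```

A longer window averages over more samples and suppresses more noise, at the cost of smearing out fast velocity changes such as saccade onsets.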

What you have learned in this tutorial:

  • transforming pixel coordinates into degrees of visual angle by using Dataset.pix2deg()

  • transforming positional data into velocity data by using Dataset.pos2vel()

  • passing additional keyword arguments when using the Savitzky-Golay differentiation filter