Downloading Public Datasets

What you will learn in this tutorial:

  • how to get an overview of the available public datasets

  • how to download and extract one of the available public datasets

  • how to customize the default directory structure

Preparations

We import pymovements under the alias pm for convenience.

import pymovements as pm

pymovements provides a library of publicly available datasets.

You can browse through the available dataset definitions here: Dataset

To get the names of all currently available datasets, you can use the DatasetLibrary.names() method:

pm.DatasetLibrary.names()
['BSC',
 'BSCII',
 'ChineseReading',
 'CoLAGaze',
 'CodeComprehension',
 'CopCo',
 'DAEMONS',
 'DIDEC',
 'EMTeC',
 'ETDD70',
 'FakeNewsPerception',
 'Gaze4Hate',
 'GazeBase',
 'GazeBaseVR',
 'GazeGraph',
 'GazeOnFaces',
 'HBN',
 'IITB_HGC',
 'InteRead',
 'JuDo1000',
 'MECOL1W1',
 'MECOL2W1',
 'MECOL2W2',
 'MouseCursor',
 'OneStop',
 'PoTeC',
 'PotsdamBingeRemotePVT',
 'PotsdamBingeWearablePVT',
 'Provo',
 'SBSAT',
 'TECO',
 'ToyDataset',
 'ToyDatasetEyeLink',
 'UCL']

For this tutorial we will limit ourselves to the ToyDataset due to its minimal space requirements.

Other datasets can be downloaded by replacing ToyDataset with the name of any other available dataset.

If you want to get more information about a specific dataset without downloading it yet, you can use the DatasetLibrary.get() method:

pm.DatasetLibrary.get('ToyDataset')
DatasetDefinition
  • experiment: Experiment
    • eyetracker: EyeTracker
      • sampling_rate: 1000
      • (6 more)
    • sampling_rate: 1000
    • screen: Screen
      • distance_cm: 68
      • height_cm: 30.2
      • height_px: 1024
      • origin: 'upper left'
      • width_cm: 38
      • width_px: 1280
      • x_max_dva: 15.599386487782953
      • x_min_dva: -15.599386487782953
      • y_max_dva: 12.508044410882546
      • y_min_dva: -12.508044410882546
  • long_name: 'pymovements Toy Dataset'
  • name: 'ToyDataset'
  • pixel_columns: list (2 items)
    • 'x'
    • 'y'
  • resources: list (1 items)
    • ResourceDefinition
      • content: 'gaze'
      • filename: 'pymovements-toy-dataset.zip'
      • filename_pattern: 'trial_{text_id:d}_{page_id:d}.csv'
      • filename_pattern_schema_overrides: dict (2 items)
        • text_id: <class 'int'>
        • page_id: <class 'int'>
      • md5: '256901852c1c07581d375eef705855d6'
      • url: 'https://github.com/pymovements/pymovements-toy-dataset/archive/refs/heads/main.zip'
      • (3 more)
  • time_column: 'timestamp'
  • time_unit: 'ms'
  • (12 more)

First we initialize our public dataset by specifying its name and the path to its dataset directory.

All files of the dataset will then be placed in this directory:

dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')

dataset.path

PosixPath('data/ToyDataset')
      

If you only want to specify a root directory that contains all your datasets, you can pass a DatasetPaths instance.

The directory of your dataset will then have the same name as in the dataset definition.

dataset_paths = pm.DatasetPaths(root='data/')
dataset = pm.Dataset('ToyDataset', path=dataset_paths)

dataset.path

PosixPath('data/ToyDataset')
      

You can also specify an alternative dataset directory name for your downloaded dataset:

dataset_paths_alt = pm.DatasetPaths(root='data/', dataset='my_dataset')
dataset_alt = pm.Dataset('ToyDataset', path=dataset_paths_alt)

dataset_alt.path

PosixPath('data/my_dataset')
      

Downloading

The dataset will then be downloaded by calling:

dataset.download()

INFO:pymovements.dataset.dataset:
        You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
        host or distribute any dataset resources and only provides a convenient interface to
        download the public dataset resources that were published by their respective authors.

        Please cite the referenced publication if you intend to use the dataset in your research.

Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw

100%|██████████| 23/23 [00:00<00:00, 341.03it/s]

      
Dataset
• definition: DatasetDefinition
• events: ()
• fileinfo: DataFrame (0 columns, 0 rows)
• gaze: list (0 items)
• path: PosixPath('data/ToyDataset')
• paths: DatasetPaths
  • dataset: PosixPath('data/ToyDataset')
  • downloads: PosixPath('data/ToyDataset/downloads')
  • events: PosixPath('data/ToyDataset/events')
  • precomputed_events: PosixPath('data/ToyDataset/precomputed_events')
  • precomputed_reading_measures: PosixPath('data/ToyDataset/precomputed_reading_measures')
  • preprocessed: PosixPath('data/ToyDataset/preprocessed')
  • raw: PosixPath('data/ToyDataset/raw')
  • root: PosixPath('data')
• precomputed_events: list (0 items)
• precomputed_reading_measures: list (0 items)

As the download message shows, the dataset resource has been downloaded to a downloads directory.

You can get the path to the downloads directory from the paths.downloads attribute:

dataset.paths.downloads

PosixPath('data/ToyDataset/downloads')
                

You can also specify a custom name for the downloads directory during initialization:

dataset_paths_3 = pm.DatasetPaths(root='data/', downloads='new_downloads')
dataset_3 = pm.Dataset('ToyDataset', path=dataset_paths_3)

dataset_3.paths.downloads

PosixPath('data/ToyDataset/new_downloads')


By default, all archives are recursively extracted to Dataset.paths.raw:

dataset.paths.raw

PosixPath('data/ToyDataset/raw')
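
The paths shown so far follow a simple pattern: the dataset directory sits below the root directory, and the downloads and raw directories sit below the dataset directory. The following is a minimal sketch of that pattern using only pathlib. LayoutSketch and its attribute names are hypothetical and only mimic the outputs above; this is not how DatasetPaths is implemented.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LayoutSketch:
    """Hypothetical stand-in that mimics the directory layout shown above."""

    root: str
    dataset: str
    downloads: str = 'downloads'
    raw: str = 'raw'

    @property
    def dataset_dir(self) -> Path:
        # The dataset directory is a child of the root directory.
        return Path(self.root) / self.dataset

    @property
    def downloads_dir(self) -> Path:
        # Downloaded archives live below the dataset directory.
        return self.dataset_dir / self.downloads

    @property
    def raw_dir(self) -> Path:
        # Extracted files live below the dataset directory as well.
        return self.dataset_dir / self.raw


default = LayoutSketch(root='data/', dataset='ToyDataset')
custom = LayoutSketch(root='data/', dataset='ToyDataset', downloads='new_downloads')

print(default.downloads_dir.as_posix())  # data/ToyDataset/downloads
print(custom.downloads_dir.as_posix())   # data/ToyDataset/new_downloads
print(default.raw_dir.as_posix())        # data/ToyDataset/raw
```

Changing a single directory name leaves the rest of the layout untouched, which matches the behaviour of the downloads example above.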
                

If you want to remove the downloaded archives after extraction to save some space, you can set remove_finished to True:

dataset.extract(remove_finished=True)

Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw

100%|██████████| 23/23 [00:00<00:00, 341.61it/s]
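
To make the effect of remove_finished more concrete, the following sketch extracts a small zip archive with the standard library and deletes the archive afterwards. The function extract_archive is a hypothetical helper written for this example. It handles a single zip file only and does not extract nested archives, so it is a simplification of what the library does.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive: Path, destination: Path, remove_finished: bool = False) -> list[str]:
    """Extract a zip archive and optionally delete it afterwards."""
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
        names = zf.namelist()
    if remove_finished:
        # The archive is no longer needed once its content is extracted.
        archive.unlink()
    return names


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)

    # Create a tiny example archive in a downloads directory.
    archive = tmp_dir / 'downloads' / 'example.zip'
    archive.parent.mkdir()
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.writestr('trial_0_1.csv', 'timestamp\tx\ty\n')

    extracted = extract_archive(archive, tmp_dir / 'raw', remove_finished=True)
    archive_still_exists = archive.exists()
    extracted_file_exists = (tmp_dir / 'raw' / 'trial_0_1.csv').exists()

print(extracted)              # ['trial_0_1.csv']
print(archive_still_exists)   # False
print(extracted_file_exists)  # True
```

Note that once the archive is removed, extracting again requires a new download.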
                
                
                

This is also available for the Dataset.download() method:

dataset.download(remove_finished=True)

INFO:pymovements.dataset.dataset:
        You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
        host or distribute any dataset resources and only provides a convenient interface to
        download the public dataset resources that were published by their respective authors.

        Please cite the referenced publication if you intend to use the dataset in your research.

Downloading https://github.com/pymovements/pymovements-toy-dataset/archive/refs/heads/main.zip to data/ToyDataset/downloads/pymovements-toy-dataset.zip

Checking integrity of pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw

100%|██████████| 23/23 [00:00<00:00, 341.29it/s]
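
The log above reports an integrity check of the downloaded archive, and the resource definition lists a 32-character hexadecimal string that appears to be an MD5 checksum. A check of this kind can be sketched with hashlib as shown below. The helpers md5_of_file and is_intact are hypothetical names for this example, and the exact verification logic inside pymovements is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file without loading it into memory at once."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_md5: str) -> bool:
    """Return True if the file exists and its checksum matches the expected value."""
    return path.is_file() and md5_of_file(path) == expected_md5


with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / 'example.bin'
    example.write_bytes(b'hello')

    expected = hashlib.md5(b'hello').hexdigest()
    ok = is_intact(example, expected)
    bad = is_intact(example, '0' * 32)

print(ok, bad)  # True False
```

A check like this also explains why a second download call can reuse an existing file: if the checksum already matches, there is no need to fetch the archive again.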
                          
                          
                          

Inspecting the dataset

The Dataset class provides a method to scan the dataset files and create a fileinfo table. This gives you an overview of the dataset structure. You can use it, for example, to check whether all files have been downloaded correctly or to decide which subset of files to select for further processing.

                                    dataset.scan()
                                    
                                    Dataset
                                    • DatasetDefinition
                                      DatasetDefinition
                                      • None
                                        None
                                      • dict (0 items)
                                        • dict (1 items)
                                          • dict (4 items)
                                            • list (5 items)
                                              • 'timestamp'
                                              • 'x'
                                              • (3 more)
                                            • dict (5 items)
                                              • Float64
                                                Float64
                                              • Float64
                                                Float64
                                              • (3 more)
                                            • (2 more)
                                        • None
                                          None
                                        • Experiment
                                          Experiment
                                          • EyeTracker
                                            EyeTracker
                                            • None
                                              None
                                            • None
                                              None
                                            • None
                                              None
                                            • None
                                              None
                                            • 1000
                                              1000
                                            • None
                                              None
                                            • None
                                              None
                                          • 1000
                                            1000
                                          • Screen
                                            Screen
                                            • 68
                                              68
                                            • 30.2
                                              30.2
                                            • 1024
                                              1024
                                            • 'upper left'
                                              'upper left'
                                            • 38
                                              38
                                            • 1280
                                              1280
                                            • 15.599386487782953
                                              15.599386487782953
                                            • -15.599386487782953
                                              -15.599386487782953
                                            • 12.508044410882546
                                              12.508044410882546
                                            • -12.508044410882546
                                              -12.508044410882546
                                        • None
                                          None
                                        • dict (1 items)
                                          • 'trial_{text_id:d}_{page_id:d}.csv'
                                            'trial_{text_id:d}_{page_id:d}.csv'
                                        • dict (1 items)
                                          • dict (2 items)
                                            • <class 'int'>
                                              <class 'int'>
                                            • <class 'int'>
                                              <class 'int'>
                                        • True
                                          True
                                        • 'pymovements Toy Dataset'
                                          'pymovements Toy Dataset'
                                        • dict (0 items)
                                          • 'ToyDataset'
                                            'ToyDataset'
                                          • list (2 items)
                                            • 'x'
                                            • 'y'
                                          • None
                                            None
                                          • list (1 items)
                                            • ResourceDefinition
                                              • 'gaze'
                                                'gaze'
                                              • 'pymovements-toy-dataset.zip'
                                                'pymovements-toy-dataset.zip'
                                              • 'trial_{text_id:d}_{page_id:d}.csv'
                                                'trial_{text_id:d}_{page_id:d}.csv'
                                              • dict (2 items)
                                                • <class 'int'>
                                                  <class 'int'>
                                                • <class 'int'>
                                                  <class 'int'>
                                              • None
                                                None
                                              • None
                                                None
                                              • '256901852c1c07581d375eef705855d6'
                                                '256901852c1c07581d375eef705855d6'
                                              • None
                                                None
                                              • str
                                                'https://github.com/pymovements/pymovements-toy-dataset/archive/refs/heads/main.zip'
                                          • 'timestamp'
                                            'timestamp'
                                          • 'ms'
                                            'ms'
                                          • None
                                            None
                                          • None
                                            None
                                        • ()
                                          ()
                                        • dict (1 items)
                                          • DataFrame (5 columns, 20 rows)
                                            shape: (20, 5)
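                                            ┌─────────┬─────────┬─────────────────────────────────┬───────────────┬─────────────┐
                                            │ text_id ┆ page_id ┆ filepath                        ┆ load_function ┆ load_kwargs │
                                            │ ---     ┆ ---     ┆ ---                             ┆ ---           ┆ ---         │
                                            │ i64     ┆ i64     ┆ str                             ┆ null          ┆ null        │
                                            ╞═════════╪═════════╪═════════════════════════════════╪═══════════════╪═════════════╡
                                            │ 0       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 0       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 0       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 0       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 0       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ …       ┆ …       ┆ …                               ┆ …             ┆ …           │
                                            │ 3       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 3       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 3       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 3       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            │ 3       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                            └─────────┴─────────┴─────────────────────────────────┴───────────────┴─────────────┘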
                                        • list (0 items)
                                          • PosixPath('data/ToyDataset')
                                            PosixPath('data/ToyDataset')
                                          • DatasetPaths
                                            DatasetPaths
                                            • PosixPath('data/ToyDataset')
                                              PosixPath('data/ToyDataset')
                                            • PosixPath('data/ToyDataset/downloads')
                                              PosixPath('data/ToyDataset/downloads')
                                            • PosixPath('data/ToyDataset/events')
                                              PosixPath('data/ToyDataset/events')
                                            • PosixPath('data/ToyDataset/precomputed_events')
                                              PosixPath('data/ToyDataset/precomputed_events')
                                            • PosixPath
                                              PosixPath('data/ToyDataset/precomputed_reading_measures')
                                            • PosixPath('data/ToyDataset/preprocessed')
                                              PosixPath('data/ToyDataset/preprocessed')
                                            • PosixPath('data/ToyDataset/raw')
                                              PosixPath('data/ToyDataset/raw')
                                            • PosixPath('data')
                                              PosixPath('data')
                                          • list (0 items)
                                            • list (0 items)

Loading into memory#

Based on the fileinfo table, we can select a subset of the dataset to load into working memory. The subset is specified as a dictionary of type dict[str, float | int | str | list[float | int | str]]: each key is a column name of the fileinfo table, and each value is either a single value or a list of values that the column must match for a file to be loaded:

dataset.load(subset={'text_id': [1, 2], 'page_id': 1})
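
To make the subset format more concrete, the following is a minimal sketch of how such a dictionary could be interpreted as a filter over fileinfo rows. It is not the pymovements implementation: the `fileinfo_rows` list and the `matches_subset` helper are hypothetical stand-ins, and the sketch assumes that all column constraints must hold at the same time and that a single value behaves like a one-element list.

```python
# Hypothetical stand-in for a few rows of the fileinfo table
# (plain dicts, not real pymovements or polars objects).
fileinfo_rows = [
    {'text_id': 0, 'page_id': 1},
    {'text_id': 1, 'page_id': 1},
    {'text_id': 1, 'page_id': 2},
    {'text_id': 2, 'page_id': 1},
    {'text_id': 3, 'page_id': 1},
]


def matches_subset(row, subset):
    """Return True if the row satisfies every column constraint in subset."""
    for column, allowed in subset.items():
        # Assumption: a single value is treated like a one-element list.
        if not isinstance(allowed, list):
            allowed = [allowed]
        if row[column] not in allowed:
            return False
    return True


subset = {'text_id': [1, 2], 'page_id': 1}
selected = [row for row in fileinfo_rows if matches_subset(row, subset)]
print(selected)
```

Under these assumptions, only the rows with text_id 1 or 2 and page_id 1 remain selected.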

In this tutorial, however, we load the entire dataset, so no subset is needed. Calling the Dataset.load() method without any arguments loads all files into working memory:

dataset.load()
                                              
                                              Dataset
                                              • DatasetDefinition
                                                DatasetDefinition
                                                • None
                                                  None
                                                • dict (0 items)
                                                  • dict (1 items)
                                                    • dict (4 items)
                                                      • list (5 items)
                                                        • 'timestamp'
                                                        • 'x'
                                                        • (3 more)
                                                      • dict (5 items)
                                                        • Float64
                                                          Float64
                                                        • Float64
                                                          Float64
                                                        • (3 more)
                                                      • (2 more)
                                                  • None
                                                    None
                                                  • Experiment
                                                    Experiment
                                                    • EyeTracker
                                                      EyeTracker
                                                      • None
                                                        None
                                                      • None
                                                        None
                                                      • None
                                                        None
                                                      • None
                                                        None
                                                      • 1000
                                                        1000
                                                      • None
                                                        None
                                                      • None
                                                        None
                                                    • 1000
                                                      1000
                                                    • Screen
                                                      Screen
                                                      • 68
                                                        68
                                                      • 30.2
                                                        30.2
                                                      • 1024
                                                        1024
                                                      • 'upper left'
                                                        'upper left'
                                                      • 38
                                                        38
                                                      • 1280
                                                        1280
                                                      • 15.599386487782953
                                                        15.599386487782953
                                                      • -15.599386487782953
                                                        -15.599386487782953
                                                      • 12.508044410882546
                                                        12.508044410882546
                                                      • -12.508044410882546
                                                        -12.508044410882546
                                                  • None
                                                    None
                                                  • dict (1 items)
                                                    • 'trial_{text_id:d}_{page_id:d}.csv'
                                                      'trial_{text_id:d}_{page_id:d}.csv'
                                                  • dict (1 items)
                                                    • dict (2 items)
                                                      • <class 'int'>
                                                        <class 'int'>
                                                      • <class 'int'>
                                                        <class 'int'>
                                                  • True
                                                    True
                                                  • 'pymovements Toy Dataset'
                                                    'pymovements Toy Dataset'
                                                  • dict (0 items)
                                                    • 'ToyDataset'
                                                      'ToyDataset'
                                                    • list (2 items)
                                                      • 'x'
                                                      • 'y'
                                                    • None
                                                      None
                                                    • list (1 items)
                                                      • ResourceDefinition
                                                        • 'gaze'
                                                          'gaze'
                                                        • 'pymovements-toy-dataset.zip'
                                                          'pymovements-toy-dataset.zip'
                                                        • 'trial_{text_id:d}_{page_id:d}.csv'
                                                          'trial_{text_id:d}_{page_id:d}.csv'
                                                        • dict (2 items)
                                                          • <class 'int'>
                                                            <class 'int'>
                                                          • <class 'int'>
                                                            <class 'int'>
                                                        • None
                                                          None
                                                        • None
                                                          None
                                                        • '256901852c1c07581d375eef705855d6'
                                                          '256901852c1c07581d375eef705855d6'
                                                        • None
                                                          None
                                                        • str
                                                          'https://github.com/pymovements/pymovements-toy-dataset/archive/refs/heads/main.zip'
                                                    • 'timestamp'
                                                      'timestamp'
                                                    • 'ms'
                                                      'ms'
                                                    • None
                                                      None
                                                    • None
                                                      None
                                                  • tuple
                                                    (20 empty DataFrames, each of shape: (0, 6) with columns text_id (i64), page_id (i64), name (str), onset (i64), offset (i64), duration (i64))
                                                  • dict (1 items)
                                                    • DataFrame (5 columns, 20 rows)
                                                      shape: (20, 5)
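                                                      ┌─────────┬─────────┬─────────────────────────────────┬───────────────┬─────────────┐
                                                      │ text_id ┆ page_id ┆ filepath                        ┆ load_function ┆ load_kwargs │
                                                      │ ---     ┆ ---     ┆ ---                             ┆ ---           ┆ ---         │
                                                      │ i64     ┆ i64     ┆ str                             ┆ null          ┆ null        │
                                                      ╞═════════╪═════════╪═════════════════════════════════╪═══════════════╪═════════════╡
                                                      │ 0       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 0       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 0       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 0       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 0       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ …       ┆ …       ┆ …                               ┆ …             ┆ …           │
                                                      │ 3       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 3       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 3       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 3       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      │ 3       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
                                                      └─────────┴─────────┴─────────────────────────────────┴───────────────┴─────────────┘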
                                                  • list (20 items)
                                                    • Gaze
                                                      • DataFrame (6 columns, 17223 rows)
                                                        shape: (17_223, 6)
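                                                        ┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┐
                                                        │ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          │
                                                        │ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            │
                                                        │ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      │
                                                        ╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╡
                                                        │ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] │
                                                        │ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] │
                                                        │ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] │
                                                        │ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] │
                                                        │ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] │
                                                        │ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              │
                                                        │ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] │
                                                        │ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] │
                                                        │ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] │
                                                        │ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] │
                                                        │ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] │
                                                        └─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┘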
                                                      • Events
                                                        Events
                                                        • DataFrame (6 columns, 0 rows)
                                                          shape: (0, 6)
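                                                          ┌─────────┬─────────┬──────┬───────┬────────┬──────────┐
                                                          │ text_id ┆ page_id ┆ name ┆ onset ┆ offset ┆ duration │
                                                          │ ---     ┆ ---     ┆ ---  ┆ ---   ┆ ---    ┆ ---      │
                                                          │ i64     ┆ i64     ┆ str  ┆ i64   ┆ i64    ┆ i64      │
                                                          ╞═════════╪═════════╪══════╪═══════╪════════╪══════════╡
                                                          └─────────┴─────────┴──────┴───────┴────────┴──────────┘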
                                                        • list (2 items)
                                                          • 'text_id'
                                                          • 'page_id'
                                                      • list (2 items)
                                                        • 'text_id'
                                                        • 'page_id'
                                                      • Experiment
                                                        Experiment
                                                        • EyeTracker
                                                          EyeTracker
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                          • 1000
                                                            1000
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                        • 1000
                                                          1000
                                                        • Screen
                                                          Screen
                                                          • 68
                                                            68
                                                          • 30.2
                                                            30.2
                                                          • 1024
                                                            1024
                                                          • 'upper left'
                                                            'upper left'
                                                          • 38
                                                            38
                                                          • 1280
                                                            1280
                                                          • 15.599386487782953
                                                            15.599386487782953
                                                          • -15.599386487782953
                                                            -15.599386487782953
                                                          • 12.508044410882546
                                                            12.508044410882546
                                                          • -12.508044410882546
                                                            -12.508044410882546
                                                    • Gaze
                                                      • DataFrame (6 columns, 29799 rows)
                                                        shape: (29_799, 6)
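                                                        ┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┐
                                                        │ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          │
                                                        │ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            │
                                                        │ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      │
                                                        ╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╡
                                                        │ 2008305 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [141.4, 153.6] │
                                                        │ 2008306 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [141.1, 153.2] │
                                                        │ 2008307 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [140.7, 152.8] │
                                                        │ 2008308 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [140.6, 152.7] │
                                                        │ 2008309 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [140.5, 152.6] │
                                                        │ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              │
                                                        │ 2038099 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [273.8, 773.8] │
                                                        │ 2038100 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [273.8, 774.1] │
                                                        │ 2038101 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [273.9, 774.5] │
                                                        │ 2038102 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [274.0, 774.4] │
                                                        │ 2038103 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 2       ┆ [274.0, 773.9] │
                                                        └─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┘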
                                                      • Events
                                                        Events
                                                        • DataFrame (6 columns, 0 rows)
                                                          shape: (0, 6)
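                                                          ┌─────────┬─────────┬──────┬───────┬────────┬──────────┐
                                                          │ text_id ┆ page_id ┆ name ┆ onset ┆ offset ┆ duration │
                                                          │ ---     ┆ ---     ┆ ---  ┆ ---   ┆ ---    ┆ ---      │
                                                          │ i64     ┆ i64     ┆ str  ┆ i64   ┆ i64    ┆ i64      │
                                                          ╞═════════╪═════════╪══════╪═══════╪════════╪══════════╡
                                                          └─────────┴─────────┴──────┴───────┴────────┴──────────┘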
                                                        • list (2 items)
                                                          • 'text_id'
                                                          • 'page_id'
                                                      • list (2 items)
                                                        • 'text_id'
                                                        • 'page_id'
                                                      • Experiment
                                                        Experiment
                                                        • EyeTracker
                                                          EyeTracker
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                          • 1000
                                                            1000
                                                          • None
                                                            None
                                                          • None
                                                            None
                                                        • 1000
                                                          1000
                                                        • Screen
                                                          Screen
                                                          • 68
                                                            68
                                                          • 30.2
                                                            30.2
                                                          • 1024
                                                            1024
                                                          • 'upper left'
                                                            'upper left'
                                                          • 38
                                                            38
                                                          • 1280
                                                            1280
                                                          • 15.599386487782953
                                                            15.599386487782953
                                                          • -15.599386487782953
                                                            -15.599386487782953
                                                          • 12.508044410882546
                                                            12.508044410882546
                                                          • -12.508044410882546
                                                            -12.508044410882546
                                                    • (18 more)
                                                  • PosixPath('data/ToyDataset')
                                                    PosixPath('data/ToyDataset')
                                                  • DatasetPaths
                                                    DatasetPaths
                                                    • PosixPath('data/ToyDataset')
                                                      PosixPath('data/ToyDataset')
                                                    • PosixPath('data/ToyDataset/downloads')
                                                      PosixPath('data/ToyDataset/downloads')
                                                    • PosixPath('data/ToyDataset/events')
                                                      PosixPath('data/ToyDataset/events')
                                                    • PosixPath('data/ToyDataset/precomputed_events')
                                                      PosixPath('data/ToyDataset/precomputed_events')
                                                    • PosixPath
                                                      PosixPath('data/ToyDataset/precomputed_reading_measures')
                                                    • PosixPath('data/ToyDataset/preprocessed')
                                                      PosixPath('data/ToyDataset/preprocessed')
                                                    • PosixPath('data/ToyDataset/raw')
                                                      PosixPath('data/ToyDataset/raw')
                                                    • PosixPath('data')
                                                      PosixPath('data')
                                                  • list (0 items)
                                                    • list (0 items)
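
The DatasetPaths entries at the end of the output above all follow the same pattern: a root directory (data), a dataset directory named after the dataset, and one subdirectory per kind of artifact. The following is a minimal sketch of that layout using only pathlib. The helper default_layout is hypothetical, is not part of pymovements, and only mirrors the paths printed above.

```python
from pathlib import Path

# Subdirectory names copied from the DatasetPaths output shown above.
SUBDIRS = (
    'downloads',
    'events',
    'precomputed_events',
    'precomputed_reading_measures',
    'preprocessed',
    'raw',
)


def default_layout(root: str, dataset_name: str) -> dict[str, Path]:
    """Return the directory layout seen above (hypothetical helper, not pymovements API)."""
    dataset_dir = Path(root) / dataset_name
    layout = {'root': Path(root), 'dataset': dataset_dir}
    for name in SUBDIRS:
        layout[name] = dataset_dir / name
    return layout


layout = default_layout('data', 'ToyDataset')
for key, path in layout.items():
    print(f'{key:30} {path}')
```

Nothing is created on disk here; the sketch only computes the paths, which makes it easy to see what changes when you pick a different root or dataset name.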

Let’s verify that we have correctly scanned the dataset files:

dataset.fileinfo

{'gaze': shape: (20, 5)
 ┌─────────┬─────────┬─────────────────────────────────┬───────────────┬─────────────┐
 │ text_id ┆ page_id ┆ filepath                        ┆ load_function ┆ load_kwargs │
 │ ---     ┆ ---     ┆ ---                             ┆ ---           ┆ ---         │
 │ i64     ┆ i64     ┆ str                             ┆ null          ┆ null        │
 ╞═════════╪═════════╪═════════════════════════════════╪═══════════════╪═════════════╡
 │ 0       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 0       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 0       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 0       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 0       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ …       ┆ …       ┆ …                               ┆ …             ┆ …           │
 │ 3       ┆ 1       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 3       ┆ 2       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 3       ┆ 3       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 3       ┆ 4       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 │ 3       ┆ 5       ┆ pymovements-toy-dataset-main/d… ┆ null          ┆ null        │
 └─────────┴─────────┴─────────────────────────────────┴───────────────┴─────────────┘}
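
The filepath column above lists one entry per recording. As an extra sanity check you may want to confirm that each listed file actually exists on disk. The sketch below uses only the standard library. The helper missing_files is hypothetical, and the demo runs on a temporary directory with made-up file names rather than on the real dataset.

```python
import tempfile
from pathlib import Path


def missing_files(base_dir, relative_paths):
    """Return the relative paths that do not exist below base_dir (hypothetical helper)."""
    base = Path(base_dir)
    return [rel for rel in relative_paths if not (base / rel).is_file()]


# Demo on a throwaway directory; the file names are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'trial_a.csv').write_text('timestamp,x,y\n')
    missing = missing_files(tmp, ['trial_a.csv', 'trial_b.csv'])

print(missing)  # ['trial_b.csv']
```

With a loaded dataset, the relative paths would presumably come from dataset.fileinfo['gaze']['filepath'] and the base directory from dataset.paths.raw. Both are assumptions based on the output above, so check them against your installed pymovements version.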
                                                      

Wonderful, all 20 gaze files have been downloaded and loaded successfully!

What you have learned in this tutorial:#

  • how to initialize a public dataset

  • how to download and extract dataset resources

  • how to customize the default directory structure

  • how to load the dataset into your working memory