Downloading Public Datasets
What you will learn in this tutorial:
how to get an overview of the available public datasets
how to download and extract one of the available public datasets
how to customize the default directory structure
Preparations
We import pymovements as the alias pm for convenience.
[1]:
import pymovements as pm
pymovements provides a library of publicly available datasets.
You can browse through the available dataset definitions here: Datasets
To get the names of all currently available datasets, you can use the DatasetLibrary.names() method:
[2]:
pm.DatasetLibrary.names()
[2]:
['BSC',
'BSCII',
'ChineseReading',
'CoLAGaze',
'CodeComprehension',
'CopCo',
'DAEMONS',
'DIDEC',
'EMTeC',
'ETDD70',
'FakeNewsPerception',
'Gaze4Hate',
'GazeBase',
'GazeBaseVR',
'GazeGraph',
'GazeOnFaces',
'HBN',
'IITB_HGC',
'InteRead',
'JuDo1000',
'MECOL1W1',
'MECOL2W1',
'MECOL2W2',
'MouseCursor',
'OneStop',
'PoTeC',
'PotsdamBingeRemotePVT',
'PotsdamBingeWearablePVT',
'Provo',
'SBSAT',
'TECO',
'ToyDataset',
'ToyDatasetEyeLink',
'UCL']
For this tutorial we will limit ourselves to the ToyDataset due to its minimal space requirements.
Other datasets can be downloaded by simply replacing ToyDataset with one of the other available datasets.
If you want to get more information about a specific dataset without downloading it yet, you can use the DatasetLibrary.get() method:
[3]:
pm.DatasetLibrary.get('ToyDataset')
[3]:
DatasetDefinition for 'ToyDataset' (output abridged):
name: 'ToyDataset'
long name: 'pymovements Toy Dataset'
experiment: sampling rate 1000; screen 1280 x 1024 px, 38 x 30.2 cm, viewing distance 68 cm, origin 'upper left'
resource: 'gaze' content in 'pymovements-toy-dataset.zip', checksum '4da622457637a8181d86601fe17f3aa8'
resource URL: 'http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/'
filename pattern: 'trial_{text_id:d}_{page_id:d}.csv', with text_id and page_id parsed as int
time column: 'timestamp', time unit: 'ms'
pixel columns: 'x', 'y'
First we initialize our public dataset by specifying its name and the path of the directory in which the dataset will be placed:
[4]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.path
[4]:
PosixPath('data/ToyDataset')
If you only want to specify a root directory which contains all your datasets, you can pass a DatasetPaths instance.
The directory of your dataset will have the same name as in the dataset definition.
[5]:
dataset_paths = pm.DatasetPaths(root='data/')
dataset = pm.Dataset('ToyDataset', path=dataset_paths)
dataset.path
[5]:
PosixPath('data/ToyDataset')
You can also specify an alternative dataset directory for your downloaded dataset:
[6]:
dataset_paths_alt = pm.DatasetPaths(root='data/', dataset='my_dataset')
dataset_alt = pm.Dataset('ToyDataset', path=dataset_paths_alt)
dataset_alt.path
[6]:
PosixPath('data/my_dataset')
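To make the resulting layout concrete, the following standard-library sketch mimics how a root directory, a dataset directory name, and the names of the downloads and raw subdirectories combine into paths. The function build_layout and its parameters are hypothetical and only mirror the behaviour shown above; this is not how pymovements implements DatasetPaths.

```python
from pathlib import PurePosixPath


def build_layout(root, dataset, downloads='downloads', raw='raw'):
    """Hypothetical helper: combine root and dataset directory names into paths."""
    dataset_dir = PurePosixPath(root) / dataset
    return {
        'root': PurePosixPath(root),
        'dataset': dataset_dir,
        'downloads': dataset_dir / downloads,
        'raw': dataset_dir / raw,
    }


default_layout = build_layout('data/', 'ToyDataset')
custom_layout = build_layout('data/', 'my_dataset')

print(default_layout['dataset'])    # data/ToyDataset
print(custom_layout['dataset'])     # data/my_dataset
print(default_layout['downloads'])  # data/ToyDataset/downloads
```

Only the dataset directory name changes between the two layouts; the subdirectories keep their default names unless they are overridden as well.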
Downloading
The dataset will then be downloaded by calling:
[7]:
dataset.download()
INFO:pymovements.dataset.dataset:
You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.
Please cite the referenced publication if you intend to use the dataset in your research.
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 200.84it/s]
As we see from the download message, the dataset resource has been downloaded to a downloads directory.
You can get the path to this directory from the Dataset.paths.downloads attribute:
[8]:
dataset.paths.downloads
[8]:
PosixPath('data/ToyDataset/downloads')
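The download log reports that the archive in this directory is verified ("Using already downloaded and verified file", "Checking integrity of ..."), and the dataset definition lists a 32-character hexadecimal checksum for the resource. Assuming this checksum is an MD5 digest (the tutorial does not name the algorithm), a comparable check can be sketched with the standard library. The helper file_md5 and the temporary demo file are illustrative only and not part of pymovements.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file instead of a real archive.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / 'demo.bin'
    demo_file.write_bytes(b'hello')
    checksum = file_md5(demo_file)

print(checksum)
```

To check a real archive, one would call file_md5 on the file inside dataset.paths.downloads and compare the result with the checksum from the dataset definition.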
You can also specify a custom directory name during initialization:
[9]:
dataset_paths_3 = pm.DatasetPaths(root='data/', downloads='new_downloads')
dataset_3 = pm.Dataset('ToyDataset', path=dataset_paths_3)
dataset_3.paths.downloads
[9]:
PosixPath('data/ToyDataset/new_downloads')
By default, all archives are recursively extracted to Dataset.paths.raw:
[10]:
dataset.paths.raw
[10]:
PosixPath('data/ToyDataset/raw')
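What "recursively extracted" means can be illustrated with a small standard-library sketch: an archive is unpacked, and any archive found inside it is unpacked in turn. The function extract_recursive is hypothetical, handles only zip files, and is not the pymovements implementation; its remove_finished flag merely imitates the idea of deleting an archive once it has been extracted.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_recursive(archive, target, remove_finished=False):
    """Hypothetical sketch: unpack a zip file and any zip files nested inside it."""
    archive, target = Path(archive), Path(target)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    if remove_finished:
        archive.unlink()
    # Unpack nested archives into a sibling directory of the same name and
    # always delete them afterwards, since they are only intermediate files.
    for nested in sorted(target.rglob('*.zip')):
        extract_recursive(nested, nested.with_suffix(''), remove_finished=True)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Build inner.zip containing one CSV file, then wrap it in outer.zip.
    inner = tmp / 'inner.zip'
    with zipfile.ZipFile(inner, 'w') as zf:
        zf.writestr('trial_0_1.csv', 'timestamp,x,y\n')
    outer = tmp / 'downloads' / 'outer.zip'
    outer.parent.mkdir()
    with zipfile.ZipFile(outer, 'w') as zf:
        zf.write(inner, arcname='inner.zip')

    extract_recursive(outer, tmp / 'raw', remove_finished=True)
    extracted = sorted(
        p.relative_to(tmp).as_posix() for p in (tmp / 'raw').rglob('*.csv')
    )
    archive_still_exists = outer.exists()

print(extracted)             # ['raw/inner/trial_0_1.csv']
print(archive_still_exists)  # False
```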
If you want to remove the downloaded archives after extraction to save some space, you can set remove_finished to True:
[11]:
dataset.extract(remove_finished=True)
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 201.94it/s]
The remove_finished option is also available for the Dataset.download() method:
[12]:
dataset.download(remove_finished=True)
INFO:pymovements.dataset.dataset:
You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.
Please cite the referenced publication if you intend to use the dataset in your research.
Downloading http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/ to data/ToyDataset/downloads/pymovements-toy-dataset.zip
Checking integrity of pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 201.93it/s]
Inspecting the dataset
The Dataset class provides a method to scan the dataset files and create a fileinfo table. This table gives an overview of the dataset structure. You can use it, for example, to check whether all files have been downloaded correctly and to decide which subset of files to select for further processing.
[13]:
dataset.scan()
[13]:
Dataset 'ToyDataset' (output abridged): the fileinfo table for the 'gaze' resource now has 20 rows and 4 columns (text_id, page_id, filepath, load_function).
Loading into memory
Based on the fileinfo table, we can define a subset of the dataset that we want to load into our working memory. We can do this by specifying a dictionary of the format dict[str, float | int | str | list[float | int | str]] where the keys are the column names of the fileinfo table and the values are the specifications of the files to load:
dataset.load(subset={'text_id': [1, 2], 'page_id': 1})
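How such a subset dictionary narrows down the fileinfo table can be sketched in plain Python. The sketch assumes that a scalar value must match exactly, that a list matches any of its elements, and that all keys must match at once; the helper select_files and the toy rows are illustrative and not part of pymovements.

```python
def select_files(rows, subset):
    """Hypothetical helper: keep rows whose columns match every subset entry."""
    selected = []
    for row in rows:
        keep = True
        for column, allowed in subset.items():
            # Treat a scalar as a one-element list of allowed values.
            if not isinstance(allowed, list):
                allowed = [allowed]
            if row[column] not in allowed:
                keep = False
                break
        if keep:
            selected.append(row)
    return selected


# Toy stand-in for the fileinfo table: text_id 0..3, page_id 1..5.
rows = [
    {'text_id': text_id, 'page_id': page_id}
    for text_id in range(4)
    for page_id in range(1, 6)
]

selected = select_files(rows, {'text_id': [1, 2], 'page_id': 1})
print(selected)  # [{'text_id': 1, 'page_id': 1}, {'text_id': 2, 'page_id': 1}]
```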
However, in this case we will load the entire dataset, so we do not need to specify a subset. We simply load the data into our working memory by using the load() method without any additional arguments:
[14]:
dataset.load()
[14]:
Dataset 'ToyDataset' (output abridged): 20 Gaze objects, one per file in the fileinfo table, each with an empty Events frame of shape (0, 6) and the columns text_id, page_id, name, onset, offset, duration. The first Gaze object (text_id 0, page_id 1) holds the samples shown below; the second (text_id 0, page_id 2) has shape (29_799, 6).
shape: (17_223, 6)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┘
Let’s verify that we have correctly scanned the dataset files:
[15]:
dataset.fileinfo
[15]:
{'gaze': shape: (20, 4)
┌─────────┬─────────┬─────────────────────────────────┬───────────────┐
│ text_id ┆ page_id ┆ filepath ┆ load_function │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ null │
╞═════════╪═════════╪═════════════════════════════════╪═══════════════╡
│ 0 ┆ 1 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 0 ┆ 2 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 0 ┆ 3 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 0 ┆ 4 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 0 ┆ 5 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ … ┆ … ┆ … ┆ … │
│ 3 ┆ 1 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 3 ┆ 2 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 3 ┆ 3 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 3 ┆ 4 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
│ 3 ┆ 5 ┆ aeye-lab-pymovements-toy-datas… ┆ null │
└─────────┴─────────┴─────────────────────────────────┴───────────────┘}
Wonderful, all of our data has been downloaded and loaded in successfully!
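The text_id and page_id columns in the fileinfo table correspond to the placeholders in the filename pattern 'trial_{text_id:d}_{page_id:d}.csv' from the dataset definition. As a rough illustration of how such a pattern can yield typed metadata, the following sketch translates it into a regular expression by hand. The regular expression, the helper parse_trial_filename, and the example filenames are assumptions made for this example, not the parsing code used by pymovements.

```python
import re

# Hand-written regex equivalent of 'trial_{text_id:d}_{page_id:d}.csv'.
TRIAL_PATTERN = re.compile(r'trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.csv')


def parse_trial_filename(filename):
    """Hypothetical helper: extract integer ids from a trial filename."""
    match = TRIAL_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


print(parse_trial_filename('trial_0_1.csv'))  # {'text_id': 0, 'page_id': 1}
print(parse_trial_filename('notes.txt'))      # None
```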
What you have learned in this tutorial:
how to initialize a public dataset
how to download and extract dataset resources
how to customize the default directory structure
how to load the dataset into your working memory