pymovements.datasets.Provo

class pymovements.datasets.Provo(name: str = 'Provo', has_files: dict[str, bool] = <factory>, mirrors: dict[str, list[str]] = <factory>, resources: dict[str, list[dict[str, str]]] = <factory>, experiment: Experiment | None = <factory>, extract: dict[str, bool] = <factory>, filename_format: dict[str, str] = <factory>, filename_format_schema_overrides: dict[str, dict[str, type]] = <factory>, custom_read_kwargs: dict[str, dict[str, Any]] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] | None = None, time_column: str | None = None, time_unit: str | None = 'ms', pixel_columns: list[str] | None = None, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None)

Provo dataset [Luke and Christianson, 2018].

The Provo Corpus is a corpus of eye-tracking data with accompanying predictability norms. These norms differ from those of other corpora: in addition to traditional cloze scores that estimate the predictability of the full orthographic form of each word, the Provo Corpus also includes measures of the predictability of the morpho-syntactic and semantic information for each word. This makes the Provo Corpus ideal for studying predictive processes in reading.

Check the respective paper for details [Luke and Christianson, 2018].

name

The name of the dataset.

Type:

str

has_files

Indicates whether the dataset contains ‘gaze’, ‘precomputed_events’, and ‘precomputed_reading_measures’ files.

Type:

dict[str, bool]

mirrors

The mirrors of the dataset, given as lists of URLs. Each mirror must be of type str and end with a ‘/’.

Type:

dict[str, list[str]]

resources

The dataset resources. Each list entry must be a dictionary with the following keys:

- resource: The URL suffix of the resource. This will be concatenated with the mirror.
- filename: The filename under which the file is saved.
- md5: The MD5 checksum of the respective file.

Type:

dict[str, list[dict[str, str]]]
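
As a rough illustration of how a mirror and a resource entry fit together, the sketch below joins a mirror with a resource suffix and checks an MD5 digest. The helper names (`build_url`, `md5_matches`), the mirror URL and the resource values are hypothetical and are not taken from pymovements; the library's own download logic may differ.

```python
import hashlib


def build_url(mirror: str, resource: dict[str, str]) -> str:
    """Join a mirror with the resource's URL suffix (hypothetical helper)."""
    if not mirror.endswith('/'):
        raise ValueError("mirror must end with '/'")
    return mirror + resource['resource']


def md5_matches(content: bytes, resource: dict[str, str]) -> bool:
    """Compare the MD5 digest of downloaded bytes with the expected checksum."""
    return hashlib.md5(content).hexdigest() == resource['md5']


# Made-up values for illustration only; these are not the real Provo resources.
payload = b'example file content'
resource = {
    'resource': 'files/example.csv',
    'filename': 'example.csv',
    'md5': hashlib.md5(payload).hexdigest(),
}

url = build_url('https://example.org/mirror/', resource)
checksum_ok = md5_matches(payload, resource)
```

Requiring the trailing ‘/’ on the mirror keeps the concatenation a plain string join, with no need to normalize separators.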

extract

Whether to extract the downloaded data.

Type:

dict[str, bool]

filename_format

Regular expression which will be matched before trying to load the file. Named groups will appear in the fileinfo dataframe.

Type:

dict[str, str]
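
To make the role of named groups concrete, here is a minimal sketch using Python's `re` module. The pattern and the filenames are hypothetical and are not the ones defined for Provo; the sketch only shows how named groups in a pattern turn into per-file metadata.

```python
import re

# Hypothetical pattern with two named groups; not the actual Provo pattern.
filename_pattern = r'(?P<subject_id>\d+)_(?P<session>\w+)\.csv'

# A matching filename yields one metadata entry per named group.
match = re.fullmatch(filename_pattern, '012_reading.csv')
fileinfo = match.groupdict() if match else None

# A filename that does not match the pattern yields no metadata.
skipped = re.fullmatch(filename_pattern, 'notes.txt')
```

Note that every captured value is a string, including numeric identifiers such as the subject ID in this example.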

filename_format_schema_overrides

If named groups are present in the filename_format, this makes it possible to cast specific named groups to a particular datatype.

Type:

dict[str, dict[str, type]]
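
The casting step can be pictured with plain dictionaries. In this sketch the captured values and the override mapping are hypothetical, and the dictionary comprehension only stands in for whatever casting the library performs internally.

```python
# Hypothetical named groups as captured from a filename (always strings).
captured = {'subject_id': '012', 'session': 'reading'}

# Hypothetical override mapping from group name to target type.
schema_overrides = {'subject_id': int}

# Cast groups that have an override; leave all others as strings.
fileinfo = {
    key: schema_overrides.get(key, str)(value)
    for key, value in captured.items()
}
```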

column_map

The keys are the columns to read, the values are the names to which they should be renamed.

Type:

dict[str, str]
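
A small sketch of the select-and-rename behaviour described above, using plain dictionaries in place of a dataframe. The column names and the mapping are hypothetical and do not reflect the actual Provo files.

```python
# Hypothetical raw columns as they might appear in a source file.
raw = {
    'SUBJ': [1, 1],
    'FIX_X': [512.0, 530.5],
    'COMMENT': ['a', 'b'],
}

# Hypothetical mapping: keys are the columns to read, values are the new names.
column_map = {'SUBJ': 'subject_id', 'FIX_X': 'x'}

# Only the mapped columns are kept, each under its new name.
renamed = {new: raw[old] for old, new in column_map.items()}
```

Columns that are not listed as keys (here `COMMENT`) are not read in this sketch.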

custom_read_kwargs

If specified, these keyword arguments will be passed to the file reading function.

Type:

dict[str, dict[str, Any]]
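
The general pattern of forwarding keyword arguments to a reader can be sketched as follows. Here the standard-library `csv` module stands in for the library's actual file reading function, and both the `'gaze'` key and the `delimiter` option are assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical configuration: one dict of reader options per file type.
custom_read_kwargs = {'gaze': {'delimiter': '\t'}}


def read_rows(text: str, **read_kwargs) -> list[list[str]]:
    """Parse delimited text, forwarding any extra options to csv.reader."""
    return list(csv.reader(io.StringIO(text), **read_kwargs))


# The options for the 'gaze' entry are unpacked into the reader call.
rows = read_rows('x\ty\n1\t2\n', **custom_read_kwargs['gaze'])
```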

Examples

Initialize your Dataset object with the Provo definition:

>>> import pymovements as pm
>>>
>>> dataset = pm.Dataset("Provo", path='data/Provo')

Download the dataset resources:

>>> dataset.download()

Load the data into memory:

>>> dataset.load()
__init__(name: str = 'Provo', has_files: dict[str, bool] = <factory>, mirrors: dict[str, list[str]] = <factory>, resources: dict[str, list[dict[str, str]]] = <factory>, experiment: Experiment | None = <factory>, extract: dict[str, bool] = <factory>, filename_format: dict[str, str] = <factory>, filename_format_schema_overrides: dict[str, dict[str, type]] = <factory>, custom_read_kwargs: dict[str, dict[str, Any]] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] | None = None, time_column: str | None = None, time_unit: str | None = 'ms', pixel_columns: list[str] | None = None, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None) -> None

Methods

__init__([name, has_files, mirrors, ...])

from_yaml(path)

Load a dataset definition from a YAML file.

to_yaml(path)

Save a dataset definition to a YAML file.

Attributes

acceleration_columns

distance_column

name

pixel_columns

position_columns

time_column

time_unit

trial_columns

velocity_columns

has_files

mirrors

resources

extract

filename_format

filename_format_schema_overrides

column_map

custom_read_kwargs

experiment