pymovements.datasets.PoTeC

class pymovements.datasets.PoTeC(name: str = 'PoTeC', mirrors: tuple[str, ...] = ('https://osf.io/download/',), resources: tuple[dict[str, str], ...] = ({'filename': 'PoTeC.zip', 'md5': '7780904bf7b18ba7d30a811174750db3', 'resource': 'tgd9q/'},), experiment: Experiment = <pymovements.gaze.experiment.Experiment object>, filename_format: str = 'reader{subject_id:d}_{text_id}_raw_data.tsv', filename_format_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] = <factory>, time_column: str = 'time', pixel_columns: list[str] = <factory>, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None)

PoTeC dataset [Jäger et al., 2021].

The Potsdam Textbook Corpus (PoTeC) is a corpus of eye-tracking-while-reading data in which participants (N=75) read a series of short German texts taken from college-level physics and biology textbooks. The experiments followed a 2×2 fully crossed factorial design with the reader’s expertise (advanced vs. beginner) and major (physics vs. biology) as factors. Reading comprehension was assessed with text comprehension questions, and general domain knowledge was tested with background questions that required knowledge beyond the presented text. The repository contains the eye-movement data (1000 Hz, monocular, right eye) as well as the stimulus texts with extensive linguistic feature annotations at the sub-lexical, lexical and supra-lexical levels. This makes PoTeC well suited for studying cognitive processes related to sentence comprehension at all linguistic levels (e.g., lexical, syntactic, discourse) as well as higher-level text comprehension.

See the PoTeC repository for further details.

name

The name of the dataset.

Type: str

mirrors

A tuple of mirror base URLs for the dataset. Each entry must be a str ending with '/'.

Type: tuple[str, ...]
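The trailing '/' matters because, as described under resources, a resource's URL suffix is concatenated directly with the mirror. A minimal sketch, assuming plain string concatenation and trying mirrors in order; the helper candidate_urls is hypothetical and not part of pymovements:

```python
from __future__ import annotations


def candidate_urls(mirrors: tuple[str, ...], resource: str) -> list[str]:
    """Hypothetical helper: list the URLs to try for one resource, in mirror order."""
    urls = []
    for mirror in mirrors:
        # Plain concatenation only yields a valid URL if the mirror ends with '/'.
        if not mirror.endswith("/"):
            raise ValueError(f"mirror must end with '/': {mirror!r}")
        urls.append(mirror + resource)
    return urls


# Values taken from the PoTeC defaults in the signature above.
print(candidate_urls(("https://osf.io/download/",), "tgd9q/"))
# ['https://osf.io/download/tgd9q/']
```

Rejecting a mirror without the slash up front gives a clear error instead of a silently malformed URL such as https://osf.io/downloadtgd9q/.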

resources

A tuple of dataset resources. Each entry must be a dictionary with the following keys:

- resource: The URL suffix of the resource. This will be concatenated with the mirror.
- filename: The filename under which the file is saved.
- md5: The MD5 checksum of the respective file.

Type: tuple[dict[str, str], ...]
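The md5 key lets a downloaded file be checked for corruption. A minimal sketch of such a check, assuming the checksum is stored as a hex digest; md5_matches is a hypothetical helper, not the pymovements implementation:

```python
import hashlib
from pathlib import Path


def md5_matches(path: Path, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Hypothetical helper: compare a file's MD5 checksum with the expected hex digest."""
    digest = hashlib.md5()
    with path.open("rb") as file:
        # Read in chunks so a large archive such as PoTeC.zip need not fit in memory.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Resource entry copied from the PoTeC defaults in the signature above.
resource = {
    "resource": "tgd9q/",
    "filename": "PoTeC.zip",
    "md5": "7780904bf7b18ba7d30a811174750db3",
}
url = "https://osf.io/download/" + resource["resource"]  # mirror + URL suffix
print(url)  # https://osf.io/download/tgd9q/
```

After the archive has been saved under resource["filename"], calling md5_matches on that path with resource["md5"] would confirm that it arrived intact.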

experiment

The experiment definition.

Type: Experiment

filename_format

Filename pattern that will be matched before trying to load a file. Named groups (subject_id and text_id in the default pattern) will appear in the fileinfo dataframe.

Type: str
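The default pattern uses curly-brace placeholders rather than regex syntax. The sketch below shows one way such a pattern can be turned into a regular expression with named groups; pattern_to_regex is hypothetical, and the mapping of ':d' to digits and of other placeholders to a lazy match is an assumption, not necessarily what pymovements does internally:

```python
import re

# Matches placeholders such as '{text_id}' or '{subject_id:d}'.
PLACEHOLDER = re.compile(r"\{(?P<name>\w+)(?::(?P<spec>\w+))?\}")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: convert curly-brace placeholders into named regex groups."""
    parts = []
    position = 0
    for placeholder in PLACEHOLDER.finditer(pattern):
        # Literal text between placeholders is escaped, so '.' in '.tsv' matches only a dot.
        parts.append(re.escape(pattern[position:placeholder.start()]))
        # ':d' accepts digits only; any other placeholder matches lazily up to the next literal.
        group = r"\d+" if placeholder["spec"] == "d" else r".+?"
        parts.append(f"(?P<{placeholder['name']}>{group})")
        position = placeholder.end()
    parts.append(re.escape(pattern[position:]))
    return re.compile("".join(parts))


regex = pattern_to_regex("reader{subject_id:d}_{text_id}_raw_data.tsv")
match = regex.fullmatch("reader12_b0_raw_data.tsv")  # hypothetical filename
print(match.groupdict())  # {'subject_id': '12', 'text_id': 'b0'}
print(regex.fullmatch("notes.txt"))  # None: this file would not be loaded
```

Using fullmatch rather than search ensures that only files whose whole name fits the pattern are picked up.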

filename_format_dtypes

If filename_format contains named groups, this mapping casts specific groups to a particular datatype.

Type: dict[str, type], optional
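Regex groups are always captured as strings, so a dtype mapping is what turns a subject ID into a number. A minimal sketch, assuming a hand-written regex equivalent to the default filename_format and a hypothetical mapping; the actual PoTeC default comes from a factory that is not shown on this page:

```python
from __future__ import annotations

import re

# Hand-written regex mirroring the default filename_format (an assumption for this sketch).
FILENAME_REGEX = re.compile(r"reader(?P<subject_id>\d+)_(?P<text_id>.+?)_raw_data\.tsv")

# Hypothetical mapping: cast subject_id to int, leave other groups as str.
filename_format_dtypes = {"subject_id": int}


def parse_fileinfo(filename: str, dtypes: dict[str, type]) -> dict[str, object] | None:
    """Hypothetical helper: extract named groups and cast those listed in dtypes."""
    match = FILENAME_REGEX.fullmatch(filename)
    if match is None:
        return None  # the file does not match and would be skipped
    # Groups without an entry in dtypes stay strings.
    return {name: dtypes.get(name, str)(value) for name, value in match.groupdict().items()}


print(parse_fileinfo("reader7_b1_raw_data.tsv", filename_format_dtypes))
# {'subject_id': 7, 'text_id': 'b1'}
```

Casting to int also affects ordering: as strings, "10" sorts between "1" and "2", whereas as integers reader 10 comes after reader 9.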

column_map

The keys are the columns to read; the values are the names to which they should be renamed.

Type: dict[str, str]
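To make the key/value roles concrete, here is a minimal sketch using the standard csv module on a tab-separated sample. The raw column names and the mapping are hypothetical, since the real PoTeC column_map is created by a factory not shown here, and read_with_column_map is not a pymovements function:

```python
from __future__ import annotations

import csv
import io

# Hypothetical raw TSV content and mapping, for illustration only.
raw_tsv = "timestamp\tx_px\ty_px\tpupil\n0\t512.0\t384.0\t1200\n1\t513.5\t383.0\t1201\n"
column_map = {"timestamp": "time", "x_px": "x", "y_px": "y"}


def read_with_column_map(text: str, column_map: dict[str, str]) -> list[dict[str, str]]:
    """Hypothetical helper: keep only the mapped columns and rename them."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Columns that are not keys of column_map (here 'pupil') are dropped.
    return [{new: row[old] for old, new in column_map.items()} for row in reader]


rows = read_with_column_map(raw_tsv, column_map)
print(rows[0])  # {'time': '0', 'x': '512.0', 'y': '384.0'}
```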

custom_read_kwargs

If specified, these keyword arguments will be passed to the file reading function.

Type: dict[str, Any], optional
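Forwarding a stored dictionary with ** is the usual Python pattern for this kind of option. The sketch below uses csv.reader as a stand-in for the reading function; read_table is hypothetical, and the 'delimiter' keyword belongs to csv.reader, not necessarily to the function pymovements actually calls:

```python
from __future__ import annotations

import csv
import io
from typing import Any


def read_table(text: str, **read_kwargs: Any) -> list[list[str]]:
    """Hypothetical stand-in for the file reading function: forwards kwargs unchanged."""
    return list(csv.reader(io.StringIO(text), **read_kwargs))


# Hypothetical kwargs for tab-separated input.
custom_read_kwargs: dict[str, Any] = {"delimiter": "\t"}
print(read_table("time\tx\n0\t512.0\n", **custom_read_kwargs))
# [['time', 'x'], ['0', '512.0']]
```

Keeping reader options in a dictionary lets a dataset definition adjust parsing (for example, the separator of its TSV files) without changing the loading code.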

Examples

Initialize your Dataset object with the PoTeC definition:

>>> import pymovements as pm
>>>
>>> dataset = pm.Dataset("PoTeC", path='data/PoTeC')

Download the dataset resources:

>>> dataset.download()

Load the data into memory:

>>> dataset.load()

__init__(name: str = 'PoTeC', mirrors: tuple[str, ...] = ('https://osf.io/download/',), resources: tuple[dict[str, str], ...] = ({'filename': 'PoTeC.zip', 'md5': '7780904bf7b18ba7d30a811174750db3', 'resource': 'tgd9q/'},), experiment: Experiment = <pymovements.gaze.experiment.Experiment object>, filename_format: str = 'reader{subject_id:d}_{text_id}_raw_data.tsv', filename_format_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] = <factory>, time_column: str = 'time', pixel_columns: list[str] = <factory>, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None) → None

Methods

__init__([name, mirrors, resources, ...])

Attributes

`acceleration_columns`
`distance_column`
`experiment`
`filename_format`
`mirrors`
`name`
`pixel_columns`
`position_columns`
`resources`
`time_column`
`trial_columns`
`velocity_columns`
`filename_format_dtypes`
`custom_read_kwargs`
`column_map`