pymovements.datasets.PoTeC
- class pymovements.datasets.PoTeC(name: str = 'PoTeC', mirrors: tuple[str, ...] = ('https://osf.io/download/',), resources: tuple[dict[str, str], ...] = ({'filename': 'PoTeC.zip', 'md5': 'cffd45039757c3777e2fd130e5d8a2ad', 'resource': 'tgd9q/'},), experiment: Experiment = <pymovements.gaze.experiment.Experiment object>, filename_format: str = 'reader{subject_id:d}_{text_id}_raw_data.tsv', filename_format_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] = <factory>, time_column: str = 'time', time_unit: str = 'ms', pixel_columns: list[str] = <factory>, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None)
PoTeC dataset [Jakobi et al., 2024].
The Potsdam Textbook Corpus (PoTeC) is a naturalistic eye-tracking-while-reading corpus containing data from 75 participants reading 12 scientific texts. PoTeC is the first naturalistic eye-tracking-while-reading corpus that contains eye movements from domain experts as well as novices in a within-participant manipulation. It is based on a 2×2×2 fully crossed factorial design, with the participants’ level of study and discipline of study as between-subject factors and the text domain as a within-subject factor. The participants’ reading comprehension was assessed by a series of text comprehension questions, and their domain knowledge was tested by text-independent background questions for each of the texts. The materials are annotated for a variety of linguistic features at different levels. We envision PoTeC to be used for a wide range of studies including but not limited to analyses of expert and non-expert reading strategies.
The corpus and all the accompanying data at all stages of the preprocessing pipeline and all code used to preprocess the data are made available via GitHub.
- name
The name of the dataset.
- Type:
str
- mirrors
A tuple of mirrors of the dataset. Each entry must be of type str and end with a '/'.
- Type:
tuple[str, ...]
- resources
A tuple of dataset resources. Each entry must be a dictionary with the following keys:
  - resource: The URL suffix of the resource. This will be concatenated with the mirror.
  - filename: The filename under which the file is saved.
  - md5: The MD5 checksum of the respective file.
- Type:
tuple[dict[str, str], ...]
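As a rough illustration of how a mirror and a resource entry fit together, the sketch below joins the default mirror with the `resource` suffix and compares a payload against the `md5` value. The helper names `resource_url` and `md5_matches` are hypothetical and not part of pymovements; the library's own download logic may work differently.

```python
import hashlib

# Default values copied from the class signature.
MIRROR = 'https://osf.io/download/'
RESOURCE = {
    'resource': 'tgd9q/',
    'filename': 'PoTeC.zip',
    'md5': 'cffd45039757c3777e2fd130e5d8a2ad',
}


def resource_url(mirror: str, resource: dict) -> str:
    """Concatenate a mirror (which must end with '/') and the resource suffix."""
    if not mirror.endswith('/'):
        raise ValueError("mirror must end with '/'")
    return mirror + resource['resource']


def md5_matches(data: bytes, resource: dict) -> bool:
    """Compare the MD5 hex digest of the given bytes with the expected checksum."""
    return hashlib.md5(data).hexdigest() == resource['md5']


url = resource_url(MIRROR, RESOURCE)
print(url)
```

A checksum mismatch would typically mean the archive is corrupted or has changed upstream, so a caller would want to refuse to unpack it.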
- experiment
The experiment definition.
- Type:
Experiment
- filename_format
Regular expression which will be matched before trying to load the file. Named groups will appear in the fileinfo dataframe.
- Type:
str
- filename_format_dtypes
If named groups are present in the filename_format, this makes it possible to cast specific named groups to a particular datatype.
- Type:
dict[str, type], optional
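The default `filename_format` is written with curly-brace placeholders rather than raw regex syntax. The following is a minimal sketch of how such a format could be turned into a regular expression with named groups, and how `filename_format_dtypes` could then cast the captured values. The helpers `format_to_regex` and `parse_filename`, the dtype mapping, and the example filename are assumptions made for illustration; they are not pymovements API and may not match its internal parsing.

```python
import re

# Default format string copied from the class signature.
FILENAME_FORMAT = 'reader{subject_id:d}_{text_id}_raw_data.tsv'
# Assumed casts for illustration; the real default comes from a factory.
FILENAME_FORMAT_DTYPES = {'subject_id': int, 'text_id': str}


def format_to_regex(fmt: str) -> re.Pattern:
    """Turn '{name}' and '{name:d}' fields into named regex groups."""
    pattern = ''
    pos = 0
    for field in re.finditer(r'\{(\w+)(?::(\w+))?\}', fmt):
        # Escape the literal text between two placeholders.
        pattern += re.escape(fmt[pos:field.start()])
        name, spec = field.group(1), field.group(2)
        # ':d' fields match digits only; other fields match lazily.
        pattern += f'(?P<{name}>[0-9]+)' if spec == 'd' else f'(?P<{name}>.+?)'
        pos = field.end()
    pattern += re.escape(fmt[pos:])
    return re.compile(pattern)


def parse_filename(filename: str, fmt: str, dtypes: dict):
    """Return the cast named groups, or None if the filename does not match."""
    match = format_to_regex(fmt).fullmatch(filename)
    if match is None:
        return None
    return {key: dtypes.get(key, str)(value) for key, value in match.groupdict().items()}


# Hypothetical filename that follows the default format.
info = parse_filename('reader12_b0_raw_data.tsv', FILENAME_FORMAT, FILENAME_FORMAT_DTYPES)
print(info)
```

Files whose names do not match the format yield `None` in this sketch, which mirrors the idea that the pattern is matched before any attempt to load the file.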
- column_map
The keys are the columns to read, the values are the names to which they should be renamed.
- Type:
dict[str, str]
- custom_read_kwargs
If specified, these keyword arguments will be passed to the file reading function.
- Type:
dict[str, Any], optional
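To make the roles of `column_map` and `custom_read_kwargs` more concrete, here is a small sketch that uses only the standard library `csv` module as a stand-in for the actual file reading function. The column names, the mapping, the keyword arguments, and the sample data are all hypothetical; PoTeC's real defaults come from factories that are not shown in the signature, and pymovements' reader may accept different keyword arguments.

```python
import csv
import io

# Hypothetical values for illustration only.
custom_read_kwargs = {'delimiter': '\t'}
column_map = {'time': 'time', 'x': 'x_pix', 'y': 'y_pix'}

# Made-up tab-separated sample standing in for a raw data file.
raw = 'time\tx\ty\n0\t512.0\t384.0\n1\t513.5\t383.0\n'

# The extra keyword arguments are forwarded unchanged to the reader.
reader = csv.DictReader(io.StringIO(raw), **custom_read_kwargs)

# Rename columns according to column_map; in this sketch, columns
# without an entry would keep their original name.
rows = [
    {column_map.get(name, name): value for name, value in row.items()}
    for row in reader
]
print(rows[0])
```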
Examples
Initialize your PublicDataset object with the PoTeC definition:
>>> import pymovements as pm
>>>
>>> dataset = pm.Dataset("PoTeC", path='data/PoTeC')
Download the dataset resources:
>>> dataset.download()
Load the data into memory:
>>> dataset.load()
- __init__(name: str = 'PoTeC', mirrors: tuple[str, ...] = ('https://osf.io/download/',), resources: tuple[dict[str, str], ...] = ({'filename': 'PoTeC.zip', 'md5': 'cffd45039757c3777e2fd130e5d8a2ad', 'resource': 'tgd9q/'},), experiment: Experiment = <pymovements.gaze.experiment.Experiment object>, filename_format: str = 'reader{subject_id:d}_{text_id}_raw_data.tsv', filename_format_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] = <factory>, time_column: str = 'time', time_unit: str = 'ms', pixel_columns: list[str] = <factory>, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None) -> None
Methods
__init__([name, mirrors, resources, ...])
Attributes
acceleration_columns
distance_column
pixel_columns
position_columns
time_column
time_unit
trial_columns
velocity_columns