pymovements.dataset.DatasetDefinition

class pymovements.dataset.DatasetDefinition(name: str = '.', mirrors: tuple[str, ...] = <factory>, resources: tuple[dict[str, str], ...] = <factory>, experiment: Experiment | None = None, filename_format: str = '.*', filename_format_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] | None = None, time_column: str | None = None, time_unit: str | None = 'ms', pixel_columns: list[str] | None = None, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None)

Definition to initialize a Dataset.

name

The name of the dataset. (default: ‘.’)

Type:

str

mirrors

A tuple of mirrors of the dataset. Each entry must be of type str and end with a ‘/’. (default: field(default_factory=tuple))

Type:

tuple[str, …]

resources

A tuple of dataset resources. Each entry must be a dictionary with the following keys:

- resource: The URL suffix of the resource. This will be concatenated with the mirror.
- filename: The filename under which the file is saved.
- md5: The MD5 checksum of the respective file.

(default: field(default_factory=tuple))

Type:

tuple[dict[str, str], …]
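As an illustration of how the mirrors and resources fields fit together, the following sketch joins a mirror with a resource suffix and checks an MD5 digest using only the standard library. The mirror URL, file name, file content, and helper functions are hypothetical and not part of pymovements; the library's own download logic may differ.

```python
import hashlib

# Hypothetical values shaped like the mirrors and resources fields.
mirrors = ("https://example.com/datasets/",)  # each mirror ends with '/'
payload = b"x,y\n1,2\n"  # stand-in for the content of a downloaded file
resources = (
    {
        "resource": "toy/data.csv",  # URL suffix appended to the mirror
        "filename": "data.csv",  # local name for the saved file
        "md5": hashlib.md5(payload).hexdigest(),  # expected checksum
    },
)


def resource_url(mirror: str, resource: dict[str, str]) -> str:
    """Concatenate a mirror with the URL suffix of a resource."""
    return mirror + resource["resource"]


def md5_matches(content: bytes, resource: dict[str, str]) -> bool:
    """Compare the MD5 digest of the content with the expected checksum."""
    return hashlib.md5(content).hexdigest() == resource["md5"]


url = resource_url(mirrors[0], resources[0])
ok = md5_matches(payload, resources[0])
```

Because every mirror ends with ‘/’, plain string concatenation is enough to build the full URL in this sketch.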

experiment

The experiment definition. (default: None)

Type:

Experiment

filename_format

Regular expression which will be matched before trying to load the file. Named groups will appear in the fileinfo dataframe. (default: ‘.*’)

Type:

str

filename_format_dtypes

If named groups are present in the filename_format, this makes it possible to cast specific named groups to a particular datatype. (default: field(default_factory=dict))

Type:

dict[str, type]
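The interplay of filename_format and filename_format_dtypes can be sketched with the standard re module. The pattern, the file names, and the parse_filename helper below are hypothetical, and the use of re.fullmatch is an assumption; this is not the library's internal implementation.

```python
import re
from typing import Optional

# Hypothetical pattern and dtypes, shaped like the two fields.
filename_format = r"subject_(?P<subject_id>\d+)_session_(?P<session>[a-z]+)\.csv"
filename_format_dtypes = {"subject_id": int}


def parse_filename(filename: str) -> Optional[dict]:
    """Extract named groups from a filename and cast the selected ones."""
    match = re.fullmatch(filename_format, filename)
    if match is None:
        return None  # the file does not match and would be skipped
    fields = match.groupdict()  # all values are strings at this point
    for key, dtype in filename_format_dtypes.items():
        fields[key] = dtype(fields[key])  # cast only the listed groups
    return fields


info = parse_filename("subject_007_session_a.csv")
```

Groups without an entry in filename_format_dtypes, such as session here, stay strings.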

custom_read_kwargs

If specified, these keyword arguments will be passed to the file reading function. The behavior of this argument depends on the file extension of the dataset files. If the file extension is .csv, the keyword arguments will be passed to polars.read_csv(). If the file extension is .asc, the keyword arguments will be passed to pymovements.utils.parsing.parse_eyelink(). See Notes for more details on how to use this argument. (default: field(default_factory=dict))

Type:

dict[str, Any]

column_map

The keys are the columns to read, the values are the names to which they should be renamed. (default: field(default_factory=dict))

Type:

dict[str, str]

trial_columns

The names of the trial columns in the input data frame. If the list is empty or None, the input data frame is assumed to contain only one trial. If the list is not empty, the input data frame is assumed to contain multiple trials and the transformation methods will be applied to each trial separately. (default: None)

Type:

list[str] | None

time_column

The name of the timestamp column in the input data frame. This column will be renamed to time. (default: None)

Type:

str | None

time_unit

The unit of the timestamps in the timestamp column in the input data frame. Supported units are ‘s’ for seconds, ‘ms’ for milliseconds and ‘step’ for steps. If the unit is ‘step’ the experiment definition must be specified. All timestamps will be converted to milliseconds. (default: ‘ms’)

Type:

str | None
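A minimal sketch of the conversion to milliseconds for the three supported units follows. The to_milliseconds helper is hypothetical, and the ‘step’ branch assumes that the experiment definition supplies a sampling rate in Hz, so that one step lasts 1000 / sampling_rate milliseconds; the library's actual conversion may differ.

```python
def to_milliseconds(timestamps, time_unit="ms", sampling_rate=None):
    """Convert timestamps given in 's', 'ms' or 'step' to milliseconds."""
    if time_unit == "ms":
        return list(timestamps)  # already in milliseconds
    if time_unit == "s":
        return [t * 1000 for t in timestamps]
    if time_unit == "step":
        # Assumption: the sampling rate (Hz) comes from the experiment.
        if sampling_rate is None:
            raise ValueError("time_unit 'step' requires a sampling rate")
        return [t * 1000 / sampling_rate for t in timestamps]
    raise ValueError(f"unsupported time unit: {time_unit!r}")


from_seconds = to_milliseconds([0, 0.5, 1.5], time_unit="s")
from_steps = to_milliseconds([0, 1, 2], time_unit="step", sampling_rate=500)
```

The explicit error for a missing sampling rate mirrors the requirement that the experiment definition must be specified when the unit is ‘step’.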

pixel_columns

The names of the pixel position columns in the input data frame. These columns will be nested into the column pixel. If the list is empty or None, the nested pixel column will not be created. (default: None)

Type:

list[str] | None

position_columns

The names of the position columns, in degrees of visual angle (dva), in the input data frame. These columns will be nested into the column position. If the list is empty or None, the nested position column will not be created. (default: None)

Type:

list[str] | None

velocity_columns

The names of the velocity columns in the input data frame. These columns will be nested into the column velocity. If the list is empty or None, the nested velocity column will not be created. (default: None)

Type:

list[str] | None

acceleration_columns

The names of the acceleration columns in the input data frame. These columns will be nested into the column acceleration. If the list is empty or None, the nested acceleration column will not be created. (default: None)

Type:

list[str] | None

distance_column

The name of the column containing the eye-to-screen distance in millimeters for each sample in the input data frame. If specified, the column will be used for pixel-to-dva transformations and will be renamed to distance. If not specified, the constant eye-to-screen distance will be taken from the experiment definition. (default: None)

Type:

str | None

Notes

When working with the custom_read_kwargs attribute there are specific use cases and considerations to keep in mind, especially for reading csv files:

1. Custom separator: To read a csv file with a custom separator, pass the separator keyword argument in custom_read_kwargs. For example, pass custom_read_kwargs={'separator': ';'} to read a semicolon-separated csv file.

2. Reading a subset of columns: To read only specific columns, specify them in custom_read_kwargs. For example: custom_read_kwargs={'columns': ['col1', 'col2']}

3. Specifying column datatypes: polars.read_csv infers data types from a fixed number of rows, which might not be accurate for the entire dataset. To ensure correct data types, pass a dictionary to the dtypes keyword argument in custom_read_kwargs. Use data types from the polars library. For instance: custom_read_kwargs={'dtypes': {'col1': polars.Int64, 'col2': polars.Float64}}

__init__(name: str = '.', mirrors: tuple[str, ...] = <factory>, resources: tuple[dict[str, str], ...] = <factory>, experiment: Experiment | None = None, filename_format: str = '.*', filename_format_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>, trial_columns: list[str] | None = None, time_column: str | None = None, time_unit: str | None = 'ms', pixel_columns: list[str] | None = None, position_columns: list[str] | None = None, velocity_columns: list[str] | None = None, acceleration_columns: list[str] | None = None, distance_column: str | None = None) → None

Methods

__init__([name, mirrors, resources, ...])

Attributes

acceleration_columns

distance_column

experiment

filename_format

name

pixel_columns

position_columns

time_column

time_unit

trial_columns

velocity_columns

mirrors

resources

filename_format_dtypes

custom_read_kwargs

column_map