pymovements.dataset.DatasetDefinition

class pymovements.dataset.DatasetDefinition(name: str = '.', mirrors: tuple[str, ...] = <factory>, resources: tuple[dict[str, str], ...] = <factory>, experiment: Experiment | None = None, filename_regex: str = '.*', filename_regex_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>)

Definition to initialize a Dataset.

name

The name of the dataset.

Type: str

mirrors

A tuple of mirrors of the dataset. Each entry must be a str that ends with a ‘/’.

Type: tuple[str, ...]

resources

A tuple of dataset resources. Each tuple entry must be a dictionary with the following keys:

- resource: The URL suffix of the resource. This will be concatenated with the mirror.
- filename: The filename under which the file is saved.
- md5: The MD5 checksum of the respective file.

Type: tuple[dict[str, str], ...]
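How a mirror and a resource suffix combine can be sketched as follows. This is a minimal illustration, assuming plain string concatenation as the description above suggests; the mirror URL, the resource entry, and the helpers `build_urls` and `md5_matches` are hypothetical and not part of pymovements.

```python
import hashlib

# Hypothetical values for illustration only; this is not a real dataset.
mirrors = ("https://example.com/datasets/",)
resources = (
    {
        "resource": "toy/data.zip",
        "filename": "data.zip",
        "md5": hashlib.md5(b"toy payload").hexdigest(),
    },
)


def build_urls(mirrors, resources):
    """Concatenate every mirror with every resource suffix (hypothetical helper)."""
    for mirror in mirrors:
        if not mirror.endswith("/"):
            raise ValueError(f"mirror must end with '/': {mirror!r}")
    return [mirror + res["resource"] for mirror in mirrors for res in resources]


def md5_matches(payload: bytes, resource: dict) -> bool:
    """Compare the MD5 digest of downloaded bytes with the expected checksum."""
    return hashlib.md5(payload).hexdigest() == resource["md5"]


urls = build_urls(mirrors, resources)
```

The trailing ‘/’ requirement on each mirror is what makes the simple concatenation produce a well-formed URL.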

experiment

The experiment definition.

Type: Experiment | None

filename_regex

Regular expression that a filename must match before the file is loaded. Named groups will appear in the fileinfo dataframe.

Type: str

filename_regex_dtypes

If named groups are present in the filename_regex, this makes it possible to cast specific named groups to a particular datatype.

Type: dict[str, type], optional
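The interplay of filename_regex and filename_regex_dtypes can be illustrated with the standard `re` module. This is only a sketch: the pattern, the filenames, and the helper `parse_filename` are hypothetical, and the use of `re.match` is an assumption about how the matching is done.

```python
import re

# Hypothetical pattern with two named groups.
filename_regex = r"(?P<subject_id>\d+)_(?P<session>[a-z]+)\.csv"
# Cast only subject_id; session stays a string.
filename_regex_dtypes = {"subject_id": int}


def parse_filename(filename, regex, dtypes):
    """Return the named groups of a matching filename, cast per dtypes.

    Returns None if the filename does not match (hypothetical helper).
    """
    match = re.match(regex, filename)
    if match is None:
        return None
    info = match.groupdict()
    for group, dtype in dtypes.items():
        info[group] = dtype(info[group])
    return info


info = parse_filename("012_reading.csv", filename_regex, filename_regex_dtypes)
skipped = parse_filename("notes.txt", filename_regex, filename_regex_dtypes)
```

Here `info` holds one entry per named group, which corresponds to the columns that would show up in the fileinfo dataframe, while `skipped` is None because the filename does not match.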

column_map

The keys are the columns to read; the values are the names to which they are renamed.

Type: dict[str, str]
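The effect of a column map on a single record can be sketched with plain dictionaries. The column names and the helper `apply_column_map` are hypothetical, and the sketch assumes that columns absent from the map are not read at all, which is one reading of the description above.

```python
# Hypothetical mapping: source column name -> new column name.
column_map = {"x_left": "x", "y_left": "y"}

# Hypothetical raw record with more columns than the map mentions.
raw_row = {"time": 0, "x_left": 1.5, "y_left": 2.5, "pupil": 3.0}


def apply_column_map(row, column_map):
    """Keep only the mapped columns and rename them (hypothetical helper)."""
    return {new: row[old] for old, new in column_map.items()}


renamed = apply_column_map(raw_row, column_map)
```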

custom_read_kwargs

If specified, these keyword arguments will be passed to the file reading function.

Type: dict[str, Any], optional
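The forwarding pattern behind custom_read_kwargs can be shown with a stand-in reader. This page does not name the file reading function, so the sketch uses `csv.DictReader` from the standard library purely as a substitute; the helper `read_rows` and the `delimiter` keyword are assumptions for illustration.

```python
import csv
import io

# Hypothetical keyword arguments for the stand-in reader.
custom_read_kwargs = {"delimiter": "\t"}


def read_rows(text, **read_kwargs):
    """Parse delimited text, forwarding extra keyword arguments to the reader."""
    return list(csv.DictReader(io.StringIO(text), **read_kwargs))


# Tab-separated sample data; the kwargs tell the reader how to split it.
rows = read_rows("time\tx\n0\t1.5\n", **custom_read_kwargs)
```

Which keys are valid depends entirely on the reading function that receives them, so they should be checked against that function's documentation.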

__init__(name: str = '.', mirrors: tuple[str, ...] = <factory>, resources: tuple[dict[str, str], ...] = <factory>, experiment: Experiment | None = None, filename_regex: str = '.*', filename_regex_dtypes: dict[str, type] = <factory>, custom_read_kwargs: dict[str, Any] = <factory>, column_map: dict[str, str] = <factory>) → None
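The `<factory>` defaults in the signature are how Python's `dataclasses` module renders a field declared with `default_factory`: every instance gets its own fresh default object. The stand-in class below is hypothetical and mirrors only a few of the parameters above; it is not the pymovements implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MiniDefinition:
    """Hypothetical stand-in that mimics a few DatasetDefinition fields."""

    name: str = "."
    mirrors: tuple[str, ...] = field(default_factory=tuple)
    custom_read_kwargs: dict[str, Any] = field(default_factory=dict)


a = MiniDefinition()
b = MiniDefinition(name="toy")

# Mutating one instance's default dict does not leak into the other.
a.custom_read_kwargs["delimiter"] = "\t"
```

After this runs, `b.custom_read_kwargs` is still empty, which is the reason mutable defaults are declared through a factory instead of a shared literal.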

Methods

__init__([name, mirrors, resources, ...])

Attributes

`experiment`
`filename_regex`
`name`
`mirrors`
`resources`
`filename_regex_dtypes`
`custom_read_kwargs`
`column_map`