{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "011263bd",
   "metadata": {},
   "source": [
    "# Downloading Public Datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14bfa755",
   "metadata": {},
   "source": [
    "## What you will learn in this tutorial:\n",
    "\n",
    "* how to initialize a public dataset\n",
    "* how to download and extract dataset resources\n",
    "* how to customize the default directory structure\n",
    "* how to load the dataset into your working memory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c31a9917",
   "metadata": {},
   "source": [
    "## Preparations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12af367f",
   "metadata": {},
   "source": [
    "We import `pymovements` under the alias `pm` for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bef4ae0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pymovements as pm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9096f56b",
   "metadata": {},
   "source": [
    "pymovements provides a library of publicly available datasets.\n",
    "\n",
    "You can browse through the available dataset definitions here:\n",
    "[Datasets](https://pymovements.readthedocs.io/en/latest/reference/pymovements.datasets.html#module-pymovements.datasets)\n",
    "\n",
    "For this tutorial, we will limit ourselves to the `ToyDataset` because of its minimal space requirements.\n",
    "\n",
    "Other datasets can be downloaded by replacing `ToyDataset` with one of the other available datasets."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7dfd61b9",
   "metadata": {},
   "source": [
    "## Initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c891cb45",
   "metadata": {},
   "source": [
    "First, we initialize the dataset by specifying the root data directory.\n",
    "By default, the dataset is placed in a subdirectory of the root that is named after the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "375b5f97",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = pm.datasets.ToyDataset(root='data/')\n",
    "\n",
    "dataset.path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6d791e5",
   "metadata": {},
   "source": [
    "If you would rather use the root path itself as the dataset path instead of creating this additional directory, set `dataset_dirname` explicitly to `.`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1cdaaa6",
   "metadata": {},
   "outputs": [],
   "source": [
    "pm.datasets.ToyDataset(root='data/', dataset_dirname='.').path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f644b5b",
   "metadata": {},
   "source": [
    "## Downloading"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f4c2dca",
   "metadata": {},
   "source": [
    "We can now download the dataset by calling:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80b2e7c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.download()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5884f578",
   "metadata": {},
   "source": [
    "As the download message shows, the dataset resource has been saved to a downloads directory.\n",
    "\n",
    "You can get the path to this directory from the `downloads_rootpath` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77990896",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.downloads_rootpath"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a4486e2",
   "metadata": {},
   "source": [
    "You can also specify a custom name for the downloads directory during initialization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c2054cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "pm.datasets.ToyDataset(root='data/', downloads_dirname='my_downloads').downloads_rootpath"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b732b465",
   "metadata": {},
   "source": [
    "## Extracting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e91345c0",
   "metadata": {},
   "source": [
    "You can then extract your downloaded data by calling:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "244859b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.extract()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b573a28d",
   "metadata": {},
   "source": [
    "Your data is now extracted to the following directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "782311cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.raw_rootpath"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5226af41",
   "metadata": {},
   "source": [
    "## Loading into memory"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f07d791",
   "metadata": {},
   "source": [
    "Finally, we can load the data into working memory by using the common `load()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b38485fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71eedf37",
   "metadata": {},
   "source": [
    "Let's verify that we have correctly scanned the dataset files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a7e7d82",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.fileinfo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f8d46dd",
   "metadata": {},
   "source": [
    "Wonderful, all of our data has been downloaded, extracted, and loaded successfully!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "252493f7",
   "metadata": {},
   "source": [
    "## What you have learned in this tutorial:\n",
    "\n",
    "* how to initialize a public dataset\n",
    "* how to download and extract dataset resources\n",
    "* how to customize the default directory structure\n",
    "* how to load the dataset into your working memory"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}