{ "cells": [ { "cell_type": "markdown", "id": "011263bd", "metadata": {}, "source": [ "# Downloading Public Datasets" ] }, { "cell_type": "markdown", "id": "14bfa755", "metadata": {}, "source": [ "## What you will learn in this tutorial:\n", "\n", "* how to download and extract one of the available public datasets\n", "* how to customize the default directory structure" ] }, { "cell_type": "markdown", "id": "c31a9917", "metadata": {}, "source": [ "## Preparations" ] }, { "cell_type": "markdown", "id": "12af367f", "metadata": {}, "source": [ "We import `pymovements` under the alias `pm` for convenience." ] }, { "cell_type": "code", "execution_count": null, "id": "bef4ae0b", "metadata": {}, "outputs": [], "source": [ "import pymovements as pm" ] }, { "cell_type": "markdown", "id": "9096f56b", "metadata": {}, "source": [ "pymovements provides a library of publicly available datasets.\n", "\n", "You can browse through the available dataset definitions here:\n", "[Datasets](https://pymovements.readthedocs.io/en/latest/reference/pymovements.datasets.html#module-pymovements.datasets)\n", "\n", "For this tutorial, we will limit ourselves to the `ToyDataset` due to its minimal space requirements.\n", "\n", "Other datasets can be downloaded by simply replacing `ToyDataset` with one of the other available datasets." 
] }, { "cell_type": "markdown", "id": "7dfd61b9", "metadata": {}, "source": [ "## Initialization" ] }, { "cell_type": "markdown", "id": "c891cb45", "metadata": {}, "source": [ "First, we initialize the dataset by specifying the root data directory.\n", "The dataset will then be placed in a directory named after the dataset:" ] }, { "cell_type": "code", "execution_count": null, "id": "375b5f97", "metadata": {}, "outputs": [], "source": [ "dataset = pm.datasets.ToyDataset(root='data/')\n", "\n", "dataset.path" ] }, { "cell_type": "markdown", "id": "b6d791e5", "metadata": {}, "source": [ "If you don't want to create this additional directory and would rather use the root path itself as your dataset path, you can explicitly set `dataset_dirname` to `.`:" ] }, { "cell_type": "code", "execution_count": null, "id": "e1cdaaa6", "metadata": {}, "outputs": [], "source": [ "pm.datasets.ToyDataset(root='data/', dataset_dirname='.').path" ] }, { "cell_type": "markdown", "id": "3f644b5b", "metadata": {}, "source": [ "## Downloading" ] }, { "cell_type": "markdown", "id": "3f4c2dca", "metadata": {}, "source": [ "We can now download the dataset by calling:" ] }, { "cell_type": "code", "execution_count": null, "id": "80b2e7c7", "metadata": {}, "outputs": [], "source": [ "dataset.download()" ] }, { "cell_type": "markdown", "id": "5884f578", "metadata": {}, "source": [ "As we can see from the download message, the dataset resource has been downloaded to a downloads directory.\n", "\n", "You can get the path to this directory from the `downloads_rootpath` attribute:" ] }, { "cell_type": "code", "execution_count": null, "id": "77990896", "metadata": {}, "outputs": [], "source": [ "dataset.downloads_rootpath" ] }, { "cell_type": "markdown", "id": "7a4486e2", "metadata": {}, "source": [ "You can also specify a custom name for this directory during initialization:" ] }, { "cell_type": "code", "execution_count": null, "id": "7c2054cb", "metadata": {}, "outputs": [], "source": [ 
"pm.datasets.ToyDataset(root='data/', downloads_dirname='my_downloads').downloads_rootpath" ] }, { "cell_type": "markdown", "id": "b732b465", "metadata": {}, "source": [ "## Extracting" ] }, { "cell_type": "markdown", "id": "e91345c0", "metadata": {}, "source": [ "You can then extract you downloaded data by calling:" ] }, { "cell_type": "code", "execution_count": null, "id": "244859b2", "metadata": {}, "outputs": [], "source": [ "dataset.extract()" ] }, { "cell_type": "markdown", "id": "b573a28d", "metadata": {}, "source": [ "Your data is now extracted to the following directory:" ] }, { "cell_type": "code", "execution_count": null, "id": "782311cf", "metadata": {}, "outputs": [], "source": [ "dataset.raw_rootpath" ] }, { "cell_type": "markdown", "id": "5226af41", "metadata": {}, "source": [ "## Loading into memory" ] }, { "cell_type": "markdown", "id": "1f07d791", "metadata": {}, "source": [ "Finally we can load the data into our working memory by using the common `load()` method:" ] }, { "cell_type": "code", "execution_count": null, "id": "b38485fb", "metadata": {}, "outputs": [], "source": [ "dataset.load()" ] }, { "cell_type": "markdown", "id": "71eedf37", "metadata": {}, "source": [ "Let's verify that we have correctly scanned the dataset files:" ] }, { "cell_type": "code", "execution_count": null, "id": "7a7e7d82", "metadata": {}, "outputs": [], "source": [ "dataset.fileinfo" ] }, { "cell_type": "markdown", "id": "8f8d46dd", "metadata": {}, "source": [ "Wonderful, all of our data has been downloaded successfully!" 
] }, { "cell_type": "markdown", "id": "252493f7", "metadata": {}, "source": [ "## What you have learned in this tutorial:\n", "\n", "* how to initialize a public dataset\n", "* how to download and extract dataset resources\n", "* how to customize the default directory structure\n", "* how to load the dataset into your working memory" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }