{ "cells": [ { "cell_type": "markdown", "id": "a0", "metadata": {}, "source": [ "# Cleaning Gaze Data During Blinks\n", "\n", "During blinks, the eyelid partially or fully covers the pupil, producing gaze samples that\n", "do not reflect actual eye position. These **blink artifacts** corrupt downstream analyses\n", "such as fixation detection, velocity computation, and saccade classification.\n", "\n", "This notebook demonstrates how to:\n", "\n", "1. Load a real EyeLink dataset with blink events\n", "2. Visualize the raw gaze signal with blink regions highlighted\n", "3. Use `nullify_event_samples()` to remove blink artifacts (with optional padding)\n", "4. Visualize the cleaned result, showing which samples were nullified" ] }, { "cell_type": "code", "execution_count": null, "id": "a1", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import polars as pl\n", "\n", "import pymovements as pm\n", "from pymovements.gaze.io import from_asc" ] }, { "cell_type": "markdown", "id": "b0", "metadata": {}, "source": [ "## 1. Load Real EyeLink Data\n", "\n", "We use the `ToyDatasetEyeLink` dataset, which contains monocular eye tracking data\n", "recorded at 1000 Hz using an EyeLink Portable Duo.\n", "\n", "We first use `Dataset.download()` to fetch the data, then load the `.asc` file\n", "directly with `events=True` so that blink events from `SBLINK`/`EBLINK` markers\n", "are parsed." 
] }, { "cell_type": "code", "execution_count": null, "id": "b1", "metadata": {}, "outputs": [], "source": [ "# Download the dataset\n", "dataset = pm.Dataset('ToyDatasetEyeLink', path='data/ToyDataset')\n", "dataset.download()\n", "\n", "# Load the first ASC file with events=True to parse blink events\n", "raw_dir = dataset.paths.raw / 'pymovements-toy-dataset-eyelink-main'\n", "asc_file = raw_dir / 'raw' / 'subject_1_session_1.asc'\n", "\n", "gaze = from_asc(\n", "    asc_file,\n", "    patterns='eyelink',\n", "    encoding='ascii',\n", "    events=True,\n", ")\n", "\n", "print('Samples shape:', gaze.samples.shape)\n", "print('Columns:', gaze.samples.columns)\n", "gaze.samples.head()" ] }, { "cell_type": "markdown", "id": "c0", "metadata": {}, "source": [ "## 2. Inspect Blink Events\n", "\n", "EyeLink blink events are stored with the name `blink_eyelink`. Let's look at the\n", "detected blinks and their durations." ] }, { "cell_type": "code", "execution_count": null, "id": "c1", "metadata": {}, "outputs": [], "source": [ "# Show all event types in the data\n", "print('Event types:', gaze.events.frame['name'].unique().to_list())\n", "\n", "# Filter to blink events only\n", "blink_events = gaze.events.frame.filter(pl.col('name') == 'blink_eyelink')\n", "print(f'\\nFound {len(blink_events)} blink events:')\n", "blink_events" ] }, { "cell_type": "markdown", "id": "d0", "metadata": {}, "source": [ "## 3. Visualize Raw Signal with Blink Regions\n", "\n", "We pick a time window that contains a few blinks and plot the raw gaze signal with\n", "blink intervals shaded in gray." 
] }, { "cell_type": "code", "execution_count": null, "id": "d1", "metadata": {}, "outputs": [], "source": [ "# Extract time, pixel coordinates, and pupil as arrays (before cleaning)\n", "time_arr = gaze.samples['time'].to_numpy()\n", "pixel_data = gaze.samples['pixel'].to_list()\n", "x_raw = np.array([p[0] if p is not None else np.nan for p in pixel_data])\n", "y_raw = np.array([p[1] if p is not None else np.nan for p in pixel_data])\n", "pupil_raw = gaze.samples['pupil'].to_numpy().copy()\n", "\n", "# Get blink onset/offset pairs\n", "blink_onsets = blink_events['onset'].to_list()\n", "blink_offsets = blink_events['offset'].to_list()\n", "blink_regions = list(zip(blink_onsets, blink_offsets))\n", "\n", "# Focus on a window around the first few blinks\n", "window_start = blink_onsets[0] - 500\n", "window_end = blink_offsets[2] + 500 if len(blink_onsets) > 2 else blink_offsets[-1] + 500\n", "mask = (time_arr >= window_start) & (time_arr <= window_end)\n", "\n", "fig, axes = plt.subplots(2, 1, figsize=(14, 6), sharex=True)\n", "\n", "for ax, data, label, color in [\n", "    (axes[0], x_raw, 'Gaze X (px)', 'steelblue'),\n", "    (axes[1], y_raw, 'Gaze Y (px)', 'darkorange'),\n", "]:\n", "    ax.plot(time_arr[mask], data[mask], color=color, linewidth=0.8)\n", "\n", "    for onset, offset in blink_regions:\n", "        if onset >= window_start and onset <= window_end:\n", "            ax.axvspan(onset, offset, alpha=0.2, color='gray')\n", "\n", "    ax.set_ylabel(label)\n", "    ax.grid(True, alpha=0.3)\n", "\n", "axes[1].set_xlabel('Time (ms)')\n", "fig.suptitle('Raw Gaze Signal with Blink Regions (gray)', fontsize=13, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "e0", "metadata": {}, "source": [ "## 4. Apply `nullify_event_samples()`\n", "\n", "We nullify gaze samples during blink events. 
The `padding` parameter extends the\n", "cleaning window so that the unreliable samples immediately before and after each\n", "blink are removed as well:\n", "\n", "- **`padding=10`** applies 10 ms of symmetric padding (the same before and after)\n", "- **`padding=(20, 10)`** applies 20 ms before and 10 ms after (asymmetric)\n", "\n", "Asymmetric padding is useful because the onset of a blink (eyelid closing) often\n", "produces artifacts slightly before the detected blink start, while the offset\n", "(eyelid opening) artifacts resolve more quickly.\n", "\n", "The cell below omits `padding` and therefore uses the default, which this notebook\n", "takes to be 25 ms on each side." ] }, { "cell_type": "code", "execution_count": null, "id": "e1", "metadata": {}, "outputs": [], "source": [ "# Apply blink cleaning with the default symmetric padding of 25 ms\n", "gaze.nullify_event_samples('blink_eyelink')\n", "\n", "# Count null pixel samples after cleaning (this also includes any samples that\n", "# were already null before cleaning)\n", "null_count = gaze.samples['pixel'].null_count()\n", "total = gaze.samples.height\n", "\n", "print(f'Nullified {null_count} / {total} samples ({100 * null_count / total:.1f}%)')\n", "print('Using default padding: (25, 25) ms')" ] }, { "cell_type": "markdown", "id": "f0", "metadata": {}, "source": [ "## 5. Visualize Before vs. After\n", "\n", "We plot the same time window again, now showing which samples were nullified (red)\n", "and the cleaned signal with gaps where blink data was removed." 
] }, { "cell_type": "code", "execution_count": null, "id": "f1", "metadata": {}, "outputs": [], "source": [ "# Build null mask\n", "null_mask = gaze.samples['pixel'].is_null().to_numpy()\n", "\n", "# Extract cleaned coordinates\n", "cleaned_pixels = gaze.samples['pixel'].to_list()\n", "x_cleaned = np.array([p[0] if p is not None else np.nan for p in cleaned_pixels])\n", "y_cleaned = np.array([p[1] if p is not None else np.nan for p in cleaned_pixels])\n", "\n", "# Default padding used\n", "padding = (25, 25)\n", "\n", "# Compute padded blink regions for shading\n", "padded_regions = [\n", "    (onset - padding[0], offset + padding[1])\n", "    for onset, offset in blink_regions\n", "]\n", "\n", "# Plot before vs. after in the same time window\n", "fig, axes = plt.subplots(2, 2, figsize=(16, 7), sharex=True)\n", "\n", "for col, (x_data, y_data, label) in enumerate([\n", "    (x_raw, y_raw, 'Before Cleaning'),\n", "    (x_cleaned, y_cleaned, 'After Cleaning'),\n", "]):\n", "    for row, (data, ylabel, color) in enumerate([\n", "        (x_data, 'Gaze X (px)', 'steelblue'),\n", "        (y_data, 'Gaze Y (px)', 'darkorange'),\n", "    ]):\n", "        ax = axes[row, col]\n", "        ax.plot(time_arr[mask], data[mask], color=color, linewidth=0.8)\n", "\n", "        for onset, offset in padded_regions:\n", "            if onset >= window_start and onset <= window_end:\n", "                ax.axvspan(onset, offset, alpha=0.12, color='red')\n", "\n", "        # On the 'before' panel, mark nullified samples in red\n", "        if col == 0:\n", "            null_in_window = mask & null_mask\n", "            ax.scatter(\n", "                time_arr[null_in_window], data[null_in_window],\n", "                color='red', s=8, zorder=5, label='Nullified',\n", "            )\n", "            ax.legend(loc='upper right', fontsize=8)\n", "\n", "        ax.set_ylabel(ylabel)\n", "        ax.set_title(label if row == 0 else '', fontsize=11)\n", "        ax.grid(True, alpha=0.3)\n", "\n", "axes[1, 0].set_xlabel('Time (ms)')\n", "axes[1, 1].set_xlabel('Time (ms)')\n", "fig.suptitle('Before vs. 
After Blink Cleaning', fontsize=13, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "g0", "metadata": {}, "source": [ "## 6. Pupil Signal During Blinks\n", "\n", "The pupil size signal also shows characteristic artifacts during blinks. Let's\n", "visualize the pupil trace alongside the blink regions." ] }, { "cell_type": "code", "execution_count": null, "id": "g1", "metadata": {}, "outputs": [], "source": [ "pupil_cleaned = gaze.samples['pupil'].to_numpy()\n", "null_in_window = mask & null_mask\n", "\n", "fig, axes = plt.subplots(1, 2, figsize=(16, 3.5), sharex=True, sharey=True)\n", "\n", "# Before: original pupil signal with nullified samples marked in red\n", "axes[0].plot(time_arr[mask], pupil_raw[mask], color='mediumpurple', linewidth=0.8)\n", "axes[0].scatter(\n", "    time_arr[null_in_window], pupil_raw[null_in_window],\n", "    color='red', s=8, zorder=5, label='Nullified',\n", ")\n", "for onset, offset in padded_regions:\n", "    if onset >= window_start and onset <= window_end:\n", "        axes[0].axvspan(onset, offset, alpha=0.12, color='red')\n", "axes[0].set_title('Before Cleaning', fontsize=11)\n", "axes[0].set_ylabel('Pupil Size')\n", "axes[0].set_xlabel('Time (ms)')\n", "axes[0].legend(loc='upper right', fontsize=8)\n", "axes[0].grid(True, alpha=0.3)\n", "\n", "# After: cleaned pupil signal with gaps\n", "axes[1].plot(time_arr[mask], pupil_cleaned[mask], color='mediumpurple', linewidth=0.8)\n", "for onset, offset in padded_regions:\n", "    if onset >= window_start and onset <= window_end:\n", "        axes[1].axvspan(onset, offset, alpha=0.12, color='red')\n", "axes[1].set_title('After Cleaning', fontsize=11)\n", "axes[1].set_xlabel('Time (ms)')\n", "axes[1].grid(True, alpha=0.3)\n", "\n", "fig.suptitle('Pupil Signal: Before vs. After Blink Cleaning', fontsize=13, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "h0", "metadata": {}, "source": [ "## 7. 
Summary Statistics\n", "\n", "The table below lists, for each blink, the detected interval and the padded interval\n", "that was nullified, followed by the overall sample counts." ] }, { "cell_type": "code", "execution_count": null, "id": "h1", "metadata": {}, "outputs": [], "source": [ "summary_rows = []\n", "for row in blink_events.to_dicts():\n", "    onset = row['onset']\n", "    offset = row['offset']\n", "    summary_rows.append({\n", "        'onset': onset,\n", "        'offset': offset,\n", "        'blink_ms': offset - onset,\n", "        'padded_onset': onset - padding[0],\n", "        'padded_offset': offset + padding[1],\n", "        'padded_ms': (offset + padding[1]) - (onset - padding[0]),\n", "    })\n", "\n", "summary_df = pl.DataFrame(summary_rows)\n", "print('Blink Cleaning Summary')\n", "print('=' * 60)\n", "print(summary_df)\n", "print(f'\\nTotal samples: {total}')\n", "print(f'Total nullified: {null_count} ({100 * null_count / total:.1f}%)')\n", "print(f'Remaining usable: {total - null_count} ({100 * (total - null_count) / total:.1f}%)')" ] }, { "cell_type": "markdown", "id": "i0", "metadata": {}, "source": [ "## 8. Apply to All Recordings and Inspect Blink Instances\n", "\n", "We clean all recordings, then plot every blink instance (with a window of context\n", "around each) so you can visually verify the cleaning." 
] }, { "cell_type": "code", "execution_count": null, "id": "i1", "metadata": {}, "outputs": [], "source": [ "padding_all = (25, 25)  # default\n", "context_ms = 100  # extra ms of context before/after the padded region\n", "\n", "# Collect all blink instances across recordings\n", "all_blinks = []\n", "\n", "asc_dir = dataset.paths.raw / 'pymovements-toy-dataset-eyelink-main' / 'raw'\n", "\n", "for asc_path in sorted(asc_dir.glob('*.asc')):\n", "    gaze_obj = from_asc(\n", "        asc_path,\n", "        patterns='eyelink',\n", "        encoding='ascii',\n", "        events=True,\n", "    )\n", "\n", "    blinks = gaze_obj.events.frame.filter(pl.col('name') == 'blink_eyelink')\n", "    n_blinks = len(blinks)\n", "\n", "    # Save raw data before cleaning\n", "    t = gaze_obj.samples['time'].to_numpy()\n", "    px = gaze_obj.samples['pixel'].to_list()\n", "    x_before = np.array([p[0] if p is not None else np.nan for p in px])\n", "    y_before = np.array([p[1] if p is not None else np.nan for p in px])\n", "    pupil_before = gaze_obj.samples['pupil'].to_numpy().copy()\n", "\n", "    # Apply cleaning with default padding\n", "    gaze_obj.nullify_event_samples('blink_eyelink')\n", "    null_mask_all = gaze_obj.samples['pixel'].is_null().to_numpy()\n", "\n", "    null_count = null_mask_all.sum()\n", "    total = gaze_obj.samples.height\n", "    print(\n", "        f'{asc_path.name}: {n_blinks} blinks, '\n", "        f'{null_count}/{total} samples nullified ({100 * null_count / total:.1f}%)'\n", "    )\n", "\n", "    # Store each blink instance\n", "    for row in blinks.to_dicts():\n", "        onset, offset = row['onset'], row['offset']\n", "        win_start = onset - padding_all[0] - context_ms\n", "        win_end = offset + padding_all[1] + context_ms\n", "        win = (t >= win_start) & (t <= win_end)\n", "\n", "        all_blinks.append({\n", "            'file': asc_path.stem,\n", "            'onset': onset,\n", "            'offset': offset,\n", "            'duration': offset - onset,\n", "            'time': t[win],\n", "            'x_raw': x_before[win],\n", "            'y_raw': y_before[win],\n", "            'pupil_raw': pupil_before[win],\n", "            'null_mask': 
null_mask_all[win],\n", "        })\n", "\n", "print(f'\\nTotal blink instances collected: {len(all_blinks)}')" ] }, { "cell_type": "code", "execution_count": null, "id": "s0wxm0f5kq", "metadata": {}, "outputs": [], "source": [ "# Plot all blink instances in a grid: pupil signal with nullified samples in red\n", "n = len(all_blinks)\n", "ncols = 5\n", "nrows = int(np.ceil(n / ncols))\n", "\n", "fig, axes = plt.subplots(nrows, ncols, figsize=(ncols * 3, nrows * 2.2), squeeze=False)\n", "\n", "for idx, blink in enumerate(all_blinks):\n", "    row, col = divmod(idx, ncols)\n", "    ax = axes[row, col]\n", "\n", "    t_blink = blink['time']\n", "    pupil = blink['pupil_raw']\n", "    nmask = blink['null_mask']\n", "\n", "    # Plot full raw pupil trace\n", "    ax.plot(t_blink, pupil, color='mediumpurple', linewidth=0.8)\n", "\n", "    # Overlay nullified samples in red\n", "    if nmask.any():\n", "        ax.scatter(t_blink[nmask], pupil[nmask], color='red', s=6, zorder=5)\n", "\n", "    # Shade the original blink interval in gray\n", "    ax.axvspan(blink['onset'], blink['offset'], alpha=0.2, color='gray')\n", "\n", "    # Shade the padded region in light red\n", "    ax.axvspan(\n", "        blink['onset'] - padding_all[0], blink['offset'] + padding_all[1],\n", "        alpha=0.08, color='red',\n", "    )\n", "\n", "    ax.set_title(f\"#{idx + 1} ({blink['duration']}ms)\", fontsize=8)\n", "    ax.tick_params(labelsize=6)\n", "    ax.set_yticks([])\n", "\n", "# Hide unused subplots\n", "for idx in range(n, nrows * ncols):\n", "    row, col = divmod(idx, ncols)\n", "    axes[row, col].set_visible(False)\n", "\n", "fig.suptitle(\n", "    f'All {n} Blink Instances: Pupil Signal (gray=blink, red=nullified with padding)',\n", "    fontsize=12, fontweight='bold',\n", ")\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "j0", "metadata": {}, "source": [ "## Key Considerations\n", "\n", "- **Load with `events=True`**: When using `from_asc()`, pass `events=True` to parse\n", "  blink events from the EyeLink `SBLINK`/`EBLINK` markers. 
Without this flag, blink\n", "  events are not loaded.\n", "- **Padding values** depend on your sampling rate and how your eye tracker reports\n", "  blink boundaries. At 1000 Hz, one sample spans 1 ms, so 20 ms of padding covers\n", "  20 samples.\n", "- **Clean before computing derived signals** (velocity, acceleration) to prevent\n", "  blink artifacts from propagating.\n", "- **Asymmetric padding** `(before, after)` is recommended because blink onset\n", "  artifacts typically extend further than offset artifacts. Note that the cells\n", "  above use the symmetric default.\n", "- The `time` and trial columns are **never** nullified, preserving temporal alignment.\n", "- EyeLink blink events are named `blink_eyelink`. Other eye trackers may use\n", "  different naming conventions." ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 5 }