BSCII

Beijing Sentence Corpus II

BSCII dataset [Yan et al., 2025].

The Beijing Sentence Corpus II (BSCII) is a Traditional Chinese sentence corpus of eye-tracking data, based on the original Beijing Sentence Corpus (BSC) in Simplified Chinese. Data was collected from 60 native Traditional Chinese readers. The corpus enables analyses of the effects of word frequency, visual complexity, and predictability on fixation location and duration.

Since the BSCII sentences are nearly identical to those in the BSC, the two corpora together provide a valuable resource for studying cross-script similarities and differences between Simplified and Traditional Chinese.

Eye movements were recorded with an EyeLink 1000 system at a sampling rate of 1000 Hz.
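
At a sampling rate of 1000 Hz, each gaze sample covers 1 ms, so a count of consecutive samples maps directly to a duration in milliseconds. The following is a minimal sketch of that conversion. The helper `samples_to_ms` and the 215-sample fixation are hypothetical illustrations; they are not part of pymovements or of the BSCII data.

```python
SAMPLING_RATE_HZ = 1000  # recording rate stated for BSCII


def samples_to_ms(n_samples: int, sampling_rate_hz: float = SAMPLING_RATE_HZ) -> float:
    """Convert a number of gaze samples to a duration in milliseconds."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling_rate_hz must be positive")
    return n_samples * 1000.0 / sampling_rate_hz


# Hypothetical fixation spanning 215 consecutive samples (not taken from the corpus).
fixation_ms = samples_to_ms(215)
print(f"{fixation_ms} ms")
```

The rate is kept as a parameter so the same helper can be reused for recordings made at other sampling rates.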

See the corresponding paper for details [Yan et al., 2025].

How to Download

import pymovements as pm

# Initialize the dataset object with its name
# Specify your local directory for saving and loading data
dataset = pm.Dataset(name='BSCII', path='path/to/your/data/directory')

# Download the dataset and extract all archives.
dataset.download()

# Load the dataset into memory for processing
dataset.load()