The childes-db project aims to make CHILDES transcripts more accessible by reducing the preprocessing users must do themselves (e.g., with CLAN or specialized preprocessing libraries) and by making individual tokens, utterances, transcripts, and corpora available in a tidy, tabular format that can be used from any programming language. We periodically release new versions of the dataset to facilitate reproducibility. We also provide an R package (childesr) and a Python package (childespy) that let users access the database without writing complex SQL queries.
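As a rough illustration of what a tidy, tabular format makes possible, the sketch below builds a tiny in-memory SQLite database and counts tokens per speaker role. The table names, column names, and sample rows are hypothetical and chosen only for this example; they are not taken from the childes-db schema, and the childesr and childespy packages are not used here.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; the real
# childes-db tables and column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE utterance (
    id INTEGER PRIMARY KEY,
    corpus TEXT,
    speaker_role TEXT
);
CREATE TABLE token (
    id INTEGER PRIMARY KEY,
    utterance_id INTEGER REFERENCES utterance(id),
    gloss TEXT
);
""")

# Made-up sample data: one row per utterance, one row per token.
conn.executemany(
    "INSERT INTO utterance VALUES (?, ?, ?)",
    [(1, "ExampleCorpus", "Target_Child"), (2, "ExampleCorpus", "Mother")],
)
conn.executemany(
    "INSERT INTO token (utterance_id, gloss) VALUES (?, ?)",
    [(1, "more"), (1, "juice"),
     (2, "you"), (2, "want"), (2, "more"), (2, "juice")],
)


def token_counts_by_role(connection):
    """Return a dict mapping each speaker role to its number of tokens."""
    rows = connection.execute("""
        SELECT u.speaker_role, COUNT(*)
        FROM token AS t
        JOIN utterance AS u ON t.utterance_id = u.id
        GROUP BY u.speaker_role
        ORDER BY u.speaker_role
    """).fetchall()
    return dict(rows)


counts = token_counts_by_role(conn)
print(counts)
```

Because every token is a row linked to its utterance, a question like "how many tokens did each speaker role produce?" reduces to a single join and group-by, with no transcript parsing involved.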

Citation policy

If you use childes-db to access CHILDES in your research, please note the database version you used (e.g., 2018.1) and cite:

  1. The childes-db paper in Behavior Research Methods:
    *Sanchez, A., *Meylan, S.C., Braginsky, M., MacDonald, K. E., Yurovsky, D., & Frank, M. C. (2019). "childes-db: a flexible and reproducible interface to the Child Language Data Exchange System." Behavior Research Methods 51 (4), 1928–1941.
    * indicates co-first authorship.
  2. CHILDES itself – both the database and the corpora you use – following the TalkBank policy.

Meet the childes-db team

Stephan Meylan

MIT & Duke University

Michael C. Frank

Stanford University

Sathvik Nair

UC Berkeley (now at Amazon)

Jess Mankewitz

Stanford University

Sarp Uner

Duke University

Daniel Yurovsky

Carnegie Mellon University

childes-db alumni

Alessandro Sanchez

Stanford University