Get tokens

get_tokens(collection = NULL, language = NULL, corpus = NULL,
  target_child = NULL, role = NULL, role_exclude = NULL, age = NULL,
  sex = NULL, token, stem = NULL, part_of_speech = NULL, replace = TRUE,
  connection = NULL, db_version = "current", db_args = NULL)

Arguments

collection

A character vector of one or more names of collections

language

A character vector of one or more languages

corpus

A character vector of one or more names of corpora

target_child

A character vector of one or more names of children

role

A character vector of one or more roles to include

role_exclude

A character vector of one or more roles to exclude

age

A numeric vector of either a single age or a minimum age (inclusive) and a maximum age (exclusive), in months

sex

A character vector of values "male" and/or "female"

token

A character vector of one or more token patterns (`%` matches any number of characters, including none; `_` matches exactly one character)

stem

A character vector of one or more stems

part_of_speech

A character vector of one or more parts of speech

replace

A boolean indicating whether to replace "gloss" with "replacement" (i.e. phonologically assimilated form), when available (defaults to TRUE)

connection

A connection to the CHILDES database

db_version

A string naming the database version to use (defaults to "current")

db_args

List with host, user, and password defined
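
The wildcard rules for `token` and the bounds for `age` can be illustrated locally, without a database. The sketch below is only an approximation of those rules: `pattern_to_regex()` and `in_age_range()` are hypothetical helpers, not part of childesr, and the case-insensitive matching is an assumption based on the example on this page, where `token = "dog"` returns both "Dog" and "dog".

```r
# Hypothetical helpers for illustration only; they are not part of childesr.

# Translate a token pattern (`%` = any run of characters, `_` = exactly one
# character) into an anchored regular expression via utils::glob2rx().
# Caveat: a literal `*` or `?` in the pattern would also act as a wildcard here.
pattern_to_regex <- function(pattern) {
  utils::glob2rx(chartr("%_", "*?", pattern))
}

# TRUE for ages (in months) inside the half-open range [age[1], age[2]),
# i.e. minimum inclusive, maximum exclusive.
in_age_range <- function(ages, age) {
  stopifnot(length(age) == 2)
  ages >= age[1] & ages < age[2]
}

tokens <- c("dog", "Dog", "dogs", "doggy", "hotdog", "dig")

# ignore.case = TRUE is an assumption (see the note above).
tokens[grepl(pattern_to_regex("dog%"), tokens, ignore.case = TRUE)]
#> [1] "dog"   "Dog"   "dogs"  "doggy"
tokens[grepl(pattern_to_regex("d_g"), tokens, ignore.case = TRUE)]
#> [1] "dog" "Dog" "dig"

ages <- c(23.9, 24, 30, 36)
ages[in_age_range(ages, c(24, 36))]
#> [1] 24 30
```

Under these assumptions, `token = "dog%"` would select tokens beginning with "dog", and `age = c(24, 36)` would keep children from 24 months up to, but not including, 36 months.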

Value

A `tbl` of Token data, filtered down by supplied arguments. If `connection` is supplied, the result remains a remote query; otherwise, it is retrieved into a local tibble.
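
The difference between a remote query and a local tibble matters when only a summary is needed. The following is a minimal sketch, assuming network access to the database and that the childesr, dplyr, and DBI packages are installed. `count_glosses()` is a hypothetical wrapper name; `connect_to_childes()` is used with its default arguments.

```r
# Sketch only: requires network access plus the childesr, dplyr, and DBI packages.
# `count_glosses` is a hypothetical helper, not part of childesr.
count_glosses <- function(pattern) {
  con <- childesr::connect_to_childes()
  # Close the connection when the function exits, even on error.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # With `connection` supplied, the result is a remote query, not local data.
  remote_tokens <- childesr::get_tokens(token = pattern, connection = con)

  # count() is applied to the remote query; collect() then retrieves only
  # the summarised rows into a local tibble.
  dplyr::collect(dplyr::count(remote_tokens, gloss))
}
```

Calling `count_glosses("dog")` would return one row per distinct `gloss` value rather than pulling every matching token into memory first.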

Examples

get_tokens(token = "dog")
#> Using current database version: '2018.1'.
#> Getting data from 4853 children in 287 corpora...
#> # A tibble: 22,148 x 26
#>         id gloss stem  part_of_speech speaker_id utterance_id token_order
#>      <int> <chr> <chr> <chr>               <int>        <int>       <int>
#>  1  184379 Dog   ""    ""                     76        54153           6
#>  2  207989 Dog   ""    ""                    115        58787           8
#>  3  251131 Dog   ""    ""                    156        65689          11
#>  4  251244 Dog   ""    ""                    156        65722           1
#>  5  251664 Dog   ""    ""                    156        65781           6
#>  6  252184 Dog   ""    ""                    156        65874           1
#>  7  252691 Dog   ""    ""                    156        65945           8
#>  8  253011 Dog   ""    ""                    157        66011           3
#>  9 1968143 dog   ""    ""                   1025       506100           2
#> 10 1968193 dog   ""    ""                   1296       506122           2
#> # ... with 22,138 more rows, and 19 more variables: corpus_id <int>,
#> #   transcript_id <int>, speaker_code <chr>, speaker_name <chr>,
#> #   speaker_role <chr>, target_child_id <int>, target_child_age <dbl>,
#> #   target_child_name <chr>, target_child_sex <chr>, utterance_type <chr>,
#> #   collection_id <int>, collection_name <chr>, english <chr>, prefix <chr>,
#> #   suffix <chr>, num_morphemes <int>, language <chr>, corpus_name <chr>,
#> #   clitic <chr>