Get tokens

get_tokens(
  collection = NULL,
  language = NULL,
  corpus = NULL,
  target_child = NULL,
  role = NULL,
  role_exclude = NULL,
  age = NULL,
  sex = NULL,
  token,
  stem = NULL,
  part_of_speech = NULL,
  replace = TRUE,
  connection = NULL,
  db_version = "current",
  db_args = NULL
)

Arguments

collection

A character vector of one or more names of collections

language

A character vector of one or more languages

corpus

A character vector of one or more names of corpora

target_child

A character vector of one or more names of children

role

A character vector of one or more roles to include

role_exclude

A character vector of one or more roles to exclude

age

A numeric vector of either a single age value or a minimum age (inclusive) and a maximum age (exclusive), in months. For a single age value, participants are returned whose age range contains that age; for two ages, participants are returned whose age range overlaps the interval between those two ages.

sex

A character vector of values "male" and/or "female"

token

A character vector of one or more token patterns (`%` matches any number of characters, including none; `_` matches exactly one character)

stem

A character vector of one or more stems

part_of_speech

A character vector of one or more parts of speech

replace

A boolean indicating whether to replace "gloss" with "replacement" (i.e. phonologically assimilated form), when available (defaults to TRUE)

connection

A connection to the CHILDES database

db_version

A string naming the database version to use (defaults to "current")

db_args

List with host, user, and password defined
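The `age` semantics described above can be sketched in a few lines of base R. This is only an illustration of the wording on this page, not the package's actual filter: `age_matches()` and the sample data frame are hypothetical, and the exact boundary handling is an assumption.

```r
# Hypothetical helper (not part of childesr): does a child's recorded age
# range [min_age, max_age], in months, match the `age` argument?
age_matches <- function(min_age, max_age, age) {
  stopifnot(length(age) %in% c(1, 2))
  if (length(age) == 1) {
    # Single value: the age must fall within the child's age range.
    min_age <= age & age <= max_age
  } else {
    # Two values: the child's range must overlap [age[1], age[2]),
    # i.e. minimum inclusive, maximum exclusive.
    max_age >= age[1] & min_age < age[2]
  }
}

# Made-up illustration data, ages in months.
children <- data.frame(
  name = c("child_a", "child_b", "child_c"),
  min_age = c(12, 24, 36),
  max_age = c(20, 30, 48),
  stringsAsFactors = FALSE
)

single_hit <- children$name[age_matches(children$min_age, children$max_age, 24)]
range_hit <- children$name[age_matches(children$min_age, children$max_age, c(18, 36))]
print(single_hit)  # "child_b"
print(range_hit)   # "child_a" "child_b"
```

Here `child_c` is excluded from the two-value query because its range starts at 36 months, which is the exclusive upper bound.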

Value

A `tbl` of Token data, filtered down by the supplied arguments. If `connection` is supplied, the result remains a remote query; otherwise, it is retrieved into a local tibble.
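The difference between the remote and local result could be used as in the sketch below. It assumes the childesr, dplyr and DBI packages are installed, that `connect_to_childes()` is available to open the connection, and that the database is reachable. `count_dog_glosses()` is a hypothetical wrapper, written as a function so that nothing touches the network until it is called.

```r
# Sketch only: relies on childesr, dplyr and DBI being installed and on
# network access to the CHILDES database when the function is called.
count_dog_glosses <- function() {
  con <- childesr::connect_to_childes()
  # Close the connection when the function exits, even on error.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # With `connection` supplied, the result stays a remote query ...
  remote_tokens <- childesr::get_tokens(token = "dog", connection = con)

  # ... so the aggregation is expressed on the remote table, and only the
  # summarised rows are pulled into a local tibble by collect().
  counts <- dplyr::count(remote_tokens, gloss)
  dplyr::collect(counts)
}
```

Keeping the query remote until `collect()` avoids downloading every matching token when only a summary is needed.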

Examples

# \donttest{
get_tokens(token = "dog")
#> Using current database version: '2018.1'.
#> Getting data from 4853 children in 287 corpora...
#> # A tibble: 22,148 x 26
#>        id gloss stem  part_of_speech speaker_id utterance_id token_order
#>     <int> <chr> <chr> <chr>               <int>        <int>       <int>
#>  1 1.84e5 Dog   ""    ""                     76        54153           6
#>  2 2.08e5 Dog   ""    ""                    115        58787           8
#>  3 2.51e5 Dog   ""    ""                    156        65689          11
#>  4 2.51e5 Dog   ""    ""                    156        65722           1
#>  5 2.52e5 Dog   ""    ""                    156        65781           6
#>  6 2.52e5 Dog   ""    ""                    156        65874           1
#>  7 2.53e5 Dog   ""    ""                    156        65945           8
#>  8 2.53e5 Dog   ""    ""                    157        66011           3
#>  9 1.97e6 dog   ""    ""                   1025       506100           2
#> 10 1.97e6 dog   ""    ""                   1296       506122           2
#> # … with 22,138 more rows, and 19 more variables: corpus_id <int>,
#> #   transcript_id <int>, speaker_code <chr>, speaker_name <chr>,
#> #   speaker_role <chr>, target_child_id <int>, target_child_age <dbl>,
#> #   target_child_name <chr>, target_child_sex <chr>, utterance_type <chr>,
#> #   collection_id <int>, collection_name <chr>, english <chr>, prefix <chr>,
#> #   suffix <chr>, num_morphemes <int>, language <chr>, corpus_name <chr>,
#> #   clitic <chr>
# }
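The `%` and `_` wildcards accepted by `token` can be illustrated without a database connection by translating a pattern into a regular expression. This is a minimal sketch in base R: `like_to_regex()` and `like_match()` are hypothetical helpers, not part of the package, and the case-insensitive matching is an inference from the example output on this page, where `token = "dog"` also returns "Dog".

```r
# Hypothetical helper: turn a `%` / `_` pattern into an anchored regex.
like_to_regex <- function(pattern) {
  # Regex metacharacters that must be escaped to match literally.
  meta <- strsplit(".\\|()[]{}^$*+?", "", fixed = TRUE)[[1]]
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  pieces <- vapply(chars, function(ch) {
    if (ch == "%") return(".*")   # any number of characters, including none
    if (ch == "_") return(".")    # exactly one character
    if (ch %in% meta) return(paste0("\\", ch))
    ch
  }, character(1), USE.NAMES = FALSE)
  paste0("^", paste(pieces, collapse = ""), "$")
}

# Hypothetical helper: which elements of x match the pattern as a whole?
like_match <- function(x, pattern) {
  # ignore.case = TRUE is an assumption about the database's behaviour.
  grepl(like_to_regex(pattern), x, perl = TRUE, ignore.case = TRUE)
}

words <- c("dog", "dogs", "doggy", "dig", "hotdog")
print(words[like_match(words, "dog%")])  # "dog" "dogs" "doggy"
print(words[like_match(words, "d_g")])   # "dog" "dig"
print(words[like_match(words, "%dog")])  # "dog" "hotdog"
```

A pattern without wildcards, such as `"dog"`, matches only the whole token, which is why `"dogs"` is absent unless `"dog%"` is used.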