Get tokens — get_tokens • childesr

Get tokens

get_tokens(
  collection = NULL,
  language = NULL,
  corpus = NULL,
  target_child = NULL,
  role = NULL,
  role_exclude = NULL,
  age = NULL,
  sex = NULL,
  token,
  stem = NULL,
  part_of_speech = NULL,
  replace = TRUE,
  connection = NULL,
  db_version = "current",
  db_args = NULL
)

Arguments

collection	A character vector of one or more names of collections
language	A character vector of one or more languages
corpus	A character vector of one or more names of corpora
target_child	A character vector of one or more names of children
role	A character vector of one or more roles to include
role_exclude	A character vector of one or more roles to exclude
age	A numeric vector of a single age value or a min age value and max age value (inclusive) in months. For a single age value, participants are returned for which that age is within their age range; for two ages, participants are returned whose age range overlaps with the interval between those two ages.
sex	A character vector of values "male" and/or "female"
token	A character vector of one or more token patterns (`%` is a wildcard matching any number of characters, `_` is a wildcard matching exactly one character)
stem	A character vector of one or more stems
part_of_speech	A character vector of one or more parts of speech
replace	A boolean indicating whether to replace "gloss" with "replacement" (i.e. phonologically assimilated form), when available (defaults to `TRUE`)
connection	A connection to the CHILDES database
db_version	A string naming the database version to use
db_args	List with host, user, and password defined
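
The wildcard rules for `token` can be tried out locally without a database connection. The sketch below is only an illustration of the `%` and `_` semantics described above: `like_to_regex()` and `match_like()` are hypothetical helpers, not part of childesr, and the actual matching is done by the database, so details such as case sensitivity may differ.

```r
# Hypothetical helper (not part of childesr): translate a token pattern
# into an anchored regular expression, where `%` stands for any number of
# characters and `_` stands for exactly one character.
like_to_regex <- function(pattern) {
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"
    } else if (ch == "_") {
      "."
    } else if (grepl("^[A-Za-z0-9]$", ch)) {
      ch
    } else {
      # Escape anything else so it is matched literally
      paste0("\\", ch)
    }
  }, character(1))
  paste0("^", paste(parts, collapse = ""), "$")
}

# Hypothetical helper: test which strings in `x` match the pattern
match_like <- function(pattern, x) {
  grepl(like_to_regex(pattern), x, perl = TRUE)
}

# Made-up tokens for illustration only
tokens <- c("dog", "dogs", "doggy", "hotdog", "dig", "do")

tokens[match_like("dog%", tokens)]  # "dog" "dogs" "doggy"
tokens[match_like("d_g", tokens)]   # "dog" "dig"
tokens[match_like("%dog", tokens)]  # "dog" "hotdog"
```

Under these assumptions, a call such as `get_tokens(token = "dog%")` would be expected to cover forms like "dogs" and "doggy" as well as "dog", whereas `token = "dog"` matches only the exact form.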

Value

A `tbl` of token data, filtered by the supplied arguments. If `connection` is supplied, the result remains a remote query; otherwise it is retrieved into a local tibble.

Examples

# \donttest{
get_tokens(token = "dog")
#> Using current database version: '2020.1'.
#> Getting data from 7085 children in 375 corpora...
#> # A tibble: 26,032 x 28
#>       id gloss language token_order prefix part_of_speech stem  actual_phonology
#>    <int> <chr> <chr>          <int> <chr>  <chr>          <chr> <chr>           
#>  1     7 dog   eng                7 ""     n              dog   ""              
#>  2    92 dog   eng                8 ""     n              dog   ""              
#>  3   219 dog   eng                3 ""     n              dog   ""              
#>  4   289 dog   eng                6 ""     n              dog   ""              
#>  5   352 dog   eng                7 ""     n              dog   ""              
#>  6   381 dog   eng                5 ""     n              dog   ""              
#>  7   510 dog   eng                8 ""     n              dog   ""              
#>  8   587 dog   eng                3 ""     n              dog   ""              
#>  9   687 dog   eng               11 ""     n              dog   ""              
#> 10   716 dog   eng                6 ""     n              dog   ""              
#> # … with 26,022 more rows, and 20 more variables: model_phonology <chr>,
#> #   suffix <chr>, num_morphemes <int>, english <chr>, clitic <chr>,
#> #   utterance_type <chr>, corpus_name <chr>, speaker_code <chr>,
#> #   speaker_name <chr>, speaker_role <chr>, target_child_name <chr>,
#> #   target_child_age <dbl>, target_child_sex <chr>, collection_name <chr>,
#> #   collection_id <int>, corpus_id <int>, speaker_id <int>,
#> #   target_child_id <int>, transcript_id <int>, utterance_id <int>
# }