Overview

The peekbankr package allows you to access data in the peekbank-db from R. This removes the need to write complex SQL queries in order to get the information you want from the database. This vignette shows some examples of how to use the data loading functions and what the resulting data look like.

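The get_ functions covered in this vignette each extract a different type of data from the peekbank-db:

  • get_datasets()
  • get_subjects()
  • get_administrations()
  • get_trials()
  • get_stimuli()
  • get_aoi_region_sets()
  • get_aoi_timepoints()
  • get_xy_timepoints()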

Technical note 1: You do not have to establish a connection to the peekbank-db explicitly, since the peekbankr functions manage connections for you. If you would rather manage your own connection, create one with connect_to_peekbank() and pass it as an argument to any of the get_ functions.
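As a rough sketch of managing your own connection, the helper below opens one connection, reuses it for two queries, and closes it afterwards. It assumes that the get_ functions accept the connection through an argument named connection and that the returned object can be closed with DBI::dbDisconnect(). Neither detail is stated in this vignette, so check the package documentation before relying on it. The helper name fetch_with_shared_connection is hypothetical.

```r
# Hypothetical helper: reuse one connection for several queries.
# Assumes the get_ functions take a `connection` argument and that the
# connection is a DBI connection (both are assumptions of this sketch).
fetch_with_shared_connection <- function() {
  con <- peekbankr::connect_to_peekbank()
  # Close the connection even if one of the queries fails.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  list(
    datasets = peekbankr::get_datasets(connection = con),
    subjects = peekbankr::get_subjects(connection = con)
  )
}

# Usage (requires network access to the peekbank-db):
# tables <- fetch_with_shared_connection()
# head(tables$datasets)
```

Wrapping the queries in a function keeps the connection's lifetime tied to a single call, so it is not left open by accident.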

Technical note 2: We have tried to optimize the time it takes to get data from the database. Even so, querying all of the timepoint tables at once will take a long time, because you are downloading hundreds of megabytes of data.

# load the library
library(peekbankr)

Get datasets

The get_datasets function returns a table describing each dataset in the database: its database and lab identifiers, its name, and its short and full citations.

For example, you can run get_datasets without any arguments to return all of the datasets in the database.

d_datasets <- get_datasets()
## Using current database version: '2021.1'.
head(d_datasets)
## # A tibble: 6 × 5
##   dataset_id lab_dataset_id        dataset_name          shortcite   cite       
##        <int> <chr>                 <chr>                 <chr>       <chr>      
## 1          0 casillas_tseltal_2015 casillas_tseltal_2015 Casillas e… "Casillas,…
## 2          1 perry_cowpig          perry_cowpig          Perry & Sa… "Perry, L.…
## 3          2 SwitchingCues         pomper_saffran_2016   Pomper & S… "Pomper, R…
## 4          3 adams_marchman_2018   adams_marchman_2018   Adams et a… "Adams, K.…
## 5          4 pomper_salientme      pomper_salientme      Pomper & S… "Pomper, R…
## 6          5 mon2                  swingley_aslin_2002   Swingley &… "Swingley,…

Get subjects

The get_subjects function returns information about persistent subject identifiers, which make it possible to note when a subject has participated in multiple experiments. The table also includes demographic information (currently sex and native language) and the lab-specific subject ID.

d_subjects <- get_subjects()
## Using current database version: '2021.1'.
head(d_subjects)
## # A tibble: 6 × 4
##   subject_id sex    native_language lab_subject_id
##        <int> <chr>  <chr>           <chr>         
## 1          0 male   other           P3-14moM      
## 2          1 male   other           P14-22moM     
## 3          2 male   other           P14-45moM     
## 4          3 female other           P16-22moF     
## 5          4 female other           P16-45moF     
## 6          5 female other           P19-27moF
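Because subject_id persists across experiments, counting how often each subject appears among the administrations (described in the next section) reveals repeat participants. The sketch below illustrates the idea on a small made-up table, so it runs without a database connection. The values are hypothetical, not real peekbank data.

```r
# Hypothetical administrations table (made-up values, not real peekbank data).
administrations <- data.frame(
  administration_id = c(1L, 2L, 3L, 4L, 5L),
  subject_id        = c(10L, 11L, 10L, 12L, 11L),
  age               = c(18, 20, 24, 19, 26)
)

# Count how many administrations each subject contributed.
n_per_subject <- table(administrations$subject_id)

# Keep the subject ids that appear more than once.
repeat_subjects <- as.integer(names(n_per_subject)[n_per_subject > 1])

# Show the administrations belonging to those subjects.
repeat_admins <- administrations[administrations$subject_id %in% repeat_subjects, ]
print(repeat_admins)
```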

Get administrations

The get_administrations function returns information about the specific experimental administrations to subjects in the database. This includes information about:

  • age
  • monitor size
  • tracker

As before, running the function with no arguments returns information for every administration in the database. You can also filter by dataset name or dataset ID.

d_administrations <- get_administrations(dataset_name = "pomper_saffran_2016")
## Using current database version: '2021.1'.
head(d_administrations)
## # A tibble: 6 × 12
##   administration_id   age lab_age lab_age_units monitor_size_x monitor_size_y
##               <int> <dbl>   <dbl> <chr>                  <int>          <int>
## 1                68    46      46 months                    NA             NA
## 2                69    43      43 months                    NA             NA
## 3                70    47      47 months                    NA             NA
## 4                71    42      42 months                    NA             NA
## 5                72    43      43 months                    NA             NA
## 6                73    42      42 months                    NA             NA
## # … with 6 more variables: sample_rate <dbl>, tracker <chr>,
## #   coding_method <chr>, dataset_id <int>, subject_id <int>, dataset_name <chr>

The age argument takes the age(s), in months, of the children that you want to analyze. You can use this argument in two ways:

  1. Pass a single number to get information about all participants who were tested at that particular age.
  2. Pass a range of ages to get information about all participants who were tested within a certain age range.

For example, you can get the participant information for all of the children who were tested between the ages of 24 and 36 months.

d_age_range <- get_administrations(age = c(24, 36))
## Using current database version: '2021.1'.
head(d_age_range)
## # A tibble: 6 × 12
##   administration_id   age lab_age lab_age_units monitor_size_x monitor_size_y
##               <int> <dbl>   <dbl> <chr>                  <int>          <int>
## 1                 5    27      27 months                  1366            768
## 2                 7    25      25 months                  1366            768
## 3                14    29      29 months                  1366            768
## 4                17    32      32 months                  1366            768
## 5                18    32      32 months                  1366            768
## 6                19    35      35 months                  1366            768
## # … with 6 more variables: sample_rate <dbl>, tracker <chr>,
## #   coding_method <chr>, dataset_id <int>, subject_id <int>, dataset_name <chr>
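To make the two uses of the age argument concrete, the sketch below reproduces the same selection logic on a small made-up table. It assumes that a range includes both endpoints and that a single number matches ages exactly. The vignette does not state either detail, and the table values are hypothetical.

```r
# Hypothetical administrations table (made-up values).
administrations <- data.frame(
  administration_id = 1:6,
  age = c(18, 24, 27, 32, 36, 41)
)

# Mimic `age = c(24, 36)`: keep rows whose age falls in the range.
# Inclusive bounds are an assumption of this sketch.
age_range <- c(24, 36)
in_range <- administrations$age >= age_range[1] &
  administrations$age <= age_range[2]
d_age_range_local <- administrations[in_range, ]

# Mimic `age = 24`: keep rows tested at exactly that age
# (exact matching is also an assumption).
d_single_age_local <- administrations[administrations$age == 24, ]

print(d_age_range_local)
```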

Get trials

The get_trials function returns a table with information about the trials in the experiments in the database. This includes the following information:

  • Phrase
  • Language
  • Point of disambiguation
  • IDs to link to other tables

d_trials <- get_trials()
## Using current database version: '2021.1'.
head(d_trials)
## # A tibble: 6 × 5
##   trial_id trial_order trial_type_id dataset_id dataset_name       
##      <int>       <int>         <int>      <int> <chr>              
## 1      458           1           179          3 adams_marchman_2018
## 2      459           2           180          3 adams_marchman_2018
## 3      460           3           181          3 adams_marchman_2018
## 4      461           4           182          3 adams_marchman_2018
## 5      462           5           183          3 adams_marchman_2018
## 6      463           6           184          3 adams_marchman_2018

This function also takes dataset name and id filters.
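The linking IDs let you combine tables. As a minimal sketch, the example below joins a few trial rows to their dataset through dataset_id using base R's merge(). The rows are copied from the outputs shown in this vignette and typed in by hand, so the example runs without a database connection.

```r
# Rows copied from the outputs shown in this vignette.
trials <- data.frame(
  trial_id      = c(458L, 459L, 460L),
  trial_order   = c(1L, 2L, 3L),
  trial_type_id = c(179L, 180L, 181L),
  dataset_id    = c(3L, 3L, 3L)
)
datasets <- data.frame(
  dataset_id   = c(2L, 3L),
  dataset_name = c("pomper_saffran_2016", "adams_marchman_2018")
)

# Attach dataset-level columns to each trial through the shared ID.
trials_with_dataset <- merge(trials, datasets, by = "dataset_id")
print(trials_with_dataset)
```

The same pattern applies to any pair of tables that share an ID column, for example trial_id or administration_id.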

Get stimuli

The get_stimuli function returns a table with information about the stimuli in the experiments in the database. This includes the following information:

  • Label
  • Image
  • Novelty status

d_stimuli <- get_stimuli()
## Using current database version: '2021.1'.
head(d_stimuli)
## # A tibble: 6 × 10
##   stimulus_id stimulus_novelty original_stimulus_label english_stimulus_label
##         <int> <chr>            <chr>                   <chr>                 
## 1         118 familiar         baby                    baby                  
## 2         119 familiar         car                     car                   
## 3         120 familiar         ball                    ball                  
## 4         121 familiar         doggy                   doggy                 
## 5         122 familiar         shoe                    shoe                  
## 6         123 familiar         birdie                  birdie                
## # … with 6 more variables: stimulus_image_path <chr>, lab_stimulus_id <chr>,
## #   dataset_id <int>, image_description <chr>, image_description_source <chr>,
## #   dataset_name <chr>

This function also takes dataset name and id filters.

Get AOI region sets

The get_aoi_region_sets function returns a table describing the area of interest (AOI) regions for experiments that used eye-trackers. Each row gives the minimum and maximum x and y coordinates of the left (l_) and right (r_) AOIs.

d_aoi_region_sets <- get_aoi_region_sets()
## Using current database version: '2021.1'.
head(d_aoi_region_sets)
## # A tibble: 6 × 9
##   aoi_region_set_id l_x_max l_x_min l_y_max l_y_min r_x_max r_x_min r_y_max
##               <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
## 1                 0     395     359     754     359    1366     971     754
## 2                 1     715       3     917     142    1679     967     921
## 3                 2     710       0     911     146    1676     982     906
## 4                 3     701       3     918     121    1676     978     923
## 5                 4     706       0     927     138    1681     982     925
## 6                 5     699       2     923     133    1680     982     923
## # … with 1 more variable: r_y_min <int>

This function is not expected to be used often; the information is retained because it is part of the process of calculating AOIs from XY points.
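To illustrate how such bounds can turn XY points into AOIs, the sketch below labels gaze samples as falling in the left region, the right region, or neither. It is not the pipeline's actual procedure: the pixel bounds are made up, inclusive boundaries are assumed, and the function name classify_side and the "other" label are hypothetical. Mapping sides to target or distractor would also need trial information, which this sketch leaves out.

```r
# Hypothetical AOI region set (made-up pixel bounds, not from the database).
region <- list(
  l_x_min = 0,   l_x_max = 700,  l_y_min = 150, l_y_max = 900,
  r_x_min = 980, r_x_max = 1680, r_y_min = 150, r_y_max = 900
)

# Label each (x, y) sample as "left", "right", or "other".
# Inclusive bounds and the "other" label are assumptions of this sketch.
classify_side <- function(x, y, region) {
  in_left <- x >= region$l_x_min & x <= region$l_x_max &
    y >= region$l_y_min & y <= region$l_y_max
  in_right <- x >= region$r_x_min & x <= region$r_x_max &
    y >= region$r_y_min & y <= region$r_y_max
  ifelse(in_left, "left", ifelse(in_right, "right", "other"))
}

sides <- classify_side(
  x = c(350, 1300, 840, 350),
  y = c(500, 500, 500, 1000),
  region = region
)
print(sides)
```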

Get AOI timepoints

The get_aoi_timepoints function returns a table with information about the subject’s looking behavior in each trial. For example, you can find out which area the subject was looking at in a particular trial (e.g., the target, the distractor, or away).

The t_norm field provides a trial-normalized time variable (in milliseconds) whose zero point is the point of disambiguation on that trial (the onset of the first utterance of the target label).

d_aoi_timepoints <- get_aoi_timepoints(dataset_name = "pomper_saffran_2016")
## Using current database version: '2021.1'.
head(d_aoi_timepoints)
## # A tibble: 6 × 4
##   administration_id trial_id aoi    t_norm
##               <int>    <int> <chr>   <dbl>
## 1                68      202 target  -1000
## 2                68      202 target   -975
## 3                68      202 target   -950
## 4                68      202 target   -925
## 5                68      202 target   -900
## 6                68      202 target   -875
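A common use of this table is to summarize looking after the point of disambiguation. The sketch below computes the proportion of target looking among samples with t_norm at or above 0 that landed on either the target or the distractor. The data are made up, and the "missing" label for samples on neither picture is an assumption, since only "target" appears in the output above.

```r
# Hypothetical AOI timepoints for one trial (made-up values).
aoi_timepoints <- data.frame(
  administration_id = 68L,
  trial_id = 202L,
  t_norm = c(-50, -25, 0, 25, 50, 75, 100, 125),
  aoi = c("distractor", "distractor", "distractor", "target",
          "target", "missing", "target", "target")
)

# Keep samples at or after the point of disambiguation (t_norm >= 0)
# that fall on the target or the distractor.
post <- aoi_timepoints[
  aoi_timepoints$t_norm >= 0 &
    aoi_timepoints$aoi %in% c("target", "distractor"),
]

# Proportion of those samples spent on the target.
prop_target <- mean(post$aoi == "target")
print(prop_target)
```

Excluding samples on neither picture from the denominator is one convention among several; whether it suits your analysis depends on the question you are asking.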

Get XY timepoints

For experiments using eye-trackers (as opposed to hand coding from video), the get_xy_timepoints function returns a table including the x and y position across time.

d_xy_timepoints <- get_xy_timepoints()